CUMFREQ
Overview
The CUMFREQ
function computes the cumulative frequency histogram of the input data by dividing it into a specified number of bins and counting the cumulative number of observations up to each bin. This is useful for understanding the distribution of values in a dataset and for identifying percentiles or cumulative counts across ranges. It is a wrapper around SciPy’s scipy.stats.cumfreq
function.
This example function is provided as-is without any representation of accuracy.
Usage
To use the function in Excel:
=CUMFREQ(data, numbins, [lowerlimit], [upperlimit])
data
(2D list or scalar, required): Input data to analyze, as a column vector or a single value.numbins
(integer, required): Number of bins to use for the histogram.lowerlimit
(float, optional): Lower bound for the histogram range.upperlimit
(float, optional): Upper bound for the histogram range.
The function returns a single column of cumulative counts as a 2D list, where each row corresponds to a bin.
Examples
Example 1: Default range
Inputs:
data | numbins |
---|---|
1 | 4 |
2 | |
3 | |
4 |
Excel formula:
=CUMFREQ({1;2;3;4}, 4)
Expected output:
Result |
---|
1 |
2 |
3 |
4 |
Example 2: Duplicate values
Inputs:
data | numbins |
---|---|
1 | 2 |
1 | |
2 | |
2 |
Excel formula:
=CUMFREQ({1;1;2;2}, 2)
Expected output:
Result |
---|
2 |
4 |
Example 3: More bins
Inputs:
data | numbins |
---|---|
1 | 5 |
2 | |
3 | |
4 | |
5 |
Excel formula:
=CUMFREQ({1;2;3;4;5}, 5)
Expected output:
Result |
---|
1 |
2 |
3 |
4 |
5 |
Example 4: Custom range
Inputs:
data | numbins | lowerlimit | upperlimit |
---|---|---|---|
1 | 2 | 0 | 4 |
2 | |||
3 |
Excel formula:
=CUMFREQ({1;2;3}, 2, 0, 4)
Expected output:
Result |
---|
1 |
3 |
Python Code
from scipy.stats import cumfreq as _cumfreq
def cumfreq(data, numbins=10, lowerlimit=None, upperlimit=None):
"""
Returns the cumulative frequency histogram for the input data.
Args:
data: 2D list or scalar of float values. Input data to analyze.
numbins: Number of bins to use for the histogram (default 10).
lowerlimit: Lower bound for the histogram range (optional).
upperlimit: Upper bound for the histogram range (optional).
Returns:
2D list of cumulative counts for each bin, or an error message as a string.
This example function is provided as-is without any representation of accuracy.
"""
# Validate data
try:
if not isinstance(data, list):
flat = [float(data)]
else:
if all(isinstance(row, list) and len(row) >= 1 for row in data):
flat = [float(row[0]) for row in data]
else:
return "Invalid input: data must be a 2D list or scalar."
except Exception:
return "Invalid input: data must be numeric."
# Require at least two data points
if len(flat) < 2:
return "Invalid input: data must contain at least two values."
# Validate numbins
try:
nb = int(numbins)
if nb <= 0:
return "Invalid input: numbins must be a positive integer."
except Exception:
return "Invalid input: numbins must be an integer."
# Validate limits
drl = None
if lowerlimit is not None or upperlimit is not None:
if lowerlimit is None or upperlimit is None:
return "Invalid input: both lowerlimit and upperlimit must be provided."
try:
low = float(lowerlimit)
up = float(upperlimit)
except Exception:
return "Invalid input: limits must be numeric."
drl = (low, up)
# Compute cumulative frequency
try:
if drl:
res = _cumfreq(flat, numbins=nb, defaultreallimits=drl)
else:
res = _cumfreq(flat, numbins=nb)
counts = res.cumcount.tolist()
except Exception as e:
return f"scipy.stats.cumfreq error: {e}"
# Convert to 2D list
return [[c] for c in counts]