RELFREQ
Overview
The RELFREQ function computes a relative frequency histogram for a dataset, returning the proportion of observations that fall within each bin rather than raw counts. This is particularly useful for comparing distributions across datasets of different sizes or for visualizing probability-like representations of data.
A relative frequency histogram maps data into bins and calculates the frequency of each bin relative to the total number of observations. For each bin i, the relative frequency is computed as:
f_i = \frac{n_i}{N}
where n_i is the count of observations in bin i and N is the total number of observations, including any that fall outside custom limits. When every observation falls inside the histogram range, the relative frequencies sum to 1:
\sum_{i=1}^{k} f_i = 1
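As a minimal sketch of the two formulas above, the following pure-Python snippet counts observations in equal-width bins and divides by N. The helper name `relative_frequencies` and the bin convention (half-open bins, closed last bin) are assumptions made for illustration; this is not the SciPy implementation.

```python
def relative_frequencies(data, edges):
    """Return f_i = n_i / N for bins defined by consecutive edges.

    Illustrative helper (not part of SciPy): bins are half-open
    [edges[i], edges[i+1]) except the last, which also includes its right edge.
    """
    k = len(edges) - 1
    counts = [0] * k
    for x in data:
        for i in range(k):
            is_last = i == k - 1
            if edges[i] <= x < edges[i + 1] or (is_last and x == edges[-1]):
                counts[i] += 1
                break
    n_total = len(data)  # N counts every observation, binned or not
    return [c / n_total for c in counts]


data = [2, 4, 1, 2, 3, 2]
edges = [0.5, 1.5, 2.5, 3.5, 4.5]
freqs = relative_frequencies(data, edges)
print(freqs)       # roughly [0.1667, 0.5, 0.1667, 0.1667]
print(sum(freqs))  # 1 up to floating-point rounding
```

In this sketch, an observation outside the edges is left out of every bin count but still contributes to N, so the frequencies would then sum to less than 1.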
This implementation wraps the scipy.stats.relfreq function from the SciPy library. The function divides the histogram range into a specified number of equal-width bins (default 10) and returns the relative frequency for each bin as a column vector. Users can optionally specify lower and upper limits to control the histogram range; this wrapper requires both to be given together. If neither is provided, SciPy uses a range slightly larger than the data range, extending the minimum and maximum by half a bin width on each side.
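The default range can be reproduced by hand. The SciPy documentation gives the default limits as (min - s, max + s) with s = (max - min) / (2 (numbins - 1)). The helper name `default_limits` below is hypothetical, and the snippet is only a sketch of that documented formula, not SciPy's own code.

```python
def default_limits(data, numbins):
    """Sketch of the default histogram range described in the SciPy docs.

    Hypothetical helper: widens the data range by half a bin on each side.
    Requires numbins >= 2, since the formula divides by (numbins - 1).
    """
    lo, hi = min(data), max(data)
    s = (hi - lo) / (2.0 * (numbins - 1))
    return lo - s, hi + s


lower, upper = default_limits([10, 20, 30, 40, 50], 3)
binsize = (upper - lower) / 3
edges = [lower + i * binsize for i in range(4)]
print(lower, upper)  # 0.0 60.0
print(edges)         # [0.0, 20.0, 40.0, 60.0]
```

With these edges, 10 falls in the first bin, 20 and 30 in the second, and 40 and 50 in the third, which is consistent with the 0.2, 0.4, 0.4 result shown in Example 3.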
Relative frequency histograms are foundational in descriptive statistics and data visualization, enabling analysts to understand the shape and spread of distributions. Each relative frequency estimates the probability that an observation falls in that bin; dividing it by the bin width gives an empirical approximation of the probability density function. They are commonly used in exploratory data analysis, quality control, and reporting. For more details on the underlying algorithm, see the SciPy GitHub repository.
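The link between relative frequencies and a density estimate can be made concrete: dividing each relative frequency by the bin width gives bar heights whose total area is 1. The sketch below assumes NumPy and SciPy are installed; the sample data, seed, and bin count are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import relfreq

# Arbitrary illustrative sample: 1000 draws from a standard normal.
rng = np.random.default_rng(0)
sample = rng.normal(size=1000)

res = relfreq(sample, numbins=20)

# Relative frequency per unit width, i.e. a histogram-based density estimate.
density = res.frequency / res.binsize

print(res.frequency.sum())            # close to 1 (default range covers all data)
print((density * res.binsize).sum())  # area under the density bars, close to 1
```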
This example function is provided as-is without any representation of accuracy.
Excel Usage
=RELFREQ(data, numbins, lowerlimit, upperlimit)
data (list[list], required): Input data as a column or row vector of float values.
numbins (int, optional, default: 10): Number of bins to use for the histogram.
lowerlimit (float, optional, default: null): Lower bound for the histogram range.
upperlimit (float, optional, default: null): Upper bound for the histogram range.
Returns (list[list]): 2D list of relative frequencies, or error message string.
Examples
Example 1: Simple list with 4 bins
Inputs:
| data | numbins |
|---|---|
| 2 | 4 |
| 4 | |
| 1 | |
| 2 | |
| 3 | |
| 2 | |
Excel formula:
=RELFREQ({2;4;1;2;3;2}, 4)
Expected output:
| Result |
|---|
| 0.1667 |
| 0.5 |
| 0.1667 |
| 0.1667 |
Example 2: 5 bins with custom range
Inputs:
| data | numbins | lowerlimit | upperlimit |
|---|---|---|---|
| 1 | 5 | 1 | 5 |
| 2 | | | |
| 3 | | | |
| 4 | | | |
| 5 | | | |
Excel formula:
=RELFREQ({1;2;3;4;5}, 5, 1, 5)
Expected output:
| Result |
|---|
| 0.2 |
| 0.2 |
| 0.2 |
| 0.2 |
| 0.2 |
Example 3: 3 bins with default range
Inputs:
| data | numbins |
|---|---|
| 10 | 3 |
| 20 | |
| 30 | |
| 40 | |
| 50 | |
Excel formula:
=RELFREQ({10;20;30;40;50}, 3)
Expected output:
| Result |
|---|
| 0.2 |
| 0.4 |
| 0.4 |
Example 4: 2 bins with custom range
Inputs:
| data | numbins | lowerlimit | upperlimit |
|---|---|---|---|
| 5 | 2 | 5 | 15 |
| 10 | | | |
| 15 | | | |
Excel formula:
=RELFREQ({5;10;15}, 2, 5, 15)
Expected output:
| Result |
|---|
| 0.3333 |
| 0.6667 |
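The four expected outputs above can be checked directly against scipy.stats.relfreq. This is a small verification sketch that assumes NumPy and SciPy are installed; it calls SciPy directly rather than the RELFREQ wrapper, and rounds to four decimals to match the tables.

```python
import numpy as np
from scipy.stats import relfreq

examples = [
    # (data, numbins, limits or None for the default range)
    ([2, 4, 1, 2, 3, 2], 4, None),
    ([1, 2, 3, 4, 5], 5, (1, 5)),
    ([10, 20, 30, 40, 50], 3, None),
    ([5, 10, 15], 2, (5, 15)),
]

results = []
for data, numbins, limits in examples:
    res = relfreq(data, numbins=numbins, defaultreallimits=limits)
    # Round to four decimals, as in the expected-output tables.
    results.append(np.round(res.frequency, 4).tolist())
    print(results[-1])
```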
Python Code
from scipy.stats import relfreq as scipy_relfreq


def relfreq(data, numbins=10, lowerlimit=None, upperlimit=None):
    """
    Returns the relative frequency histogram for the input data.

    See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.relfreq.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        data (list[list]): Input data as a column or row vector of float values.
        numbins (int, optional): Number of bins to use for the histogram. Default is 10.
        lowerlimit (float, optional): Lower bound for the histogram range. Default is None.
        upperlimit (float, optional): Upper bound for the histogram range. Default is None.

    Returns:
        list[list]: 2D list of relative frequencies, or error message string.
    """
    def to2d(x):
        return [[x]] if not isinstance(x, list) else x

    data = to2d(data)

    # Flatten 2D list to 1D
    try:
        flat_data = [float(item) for row in data for item in (row if isinstance(row, list) else [row])]
    except Exception:
        return "Invalid input: data must be a 2D list or scalar of numbers."
    if len(flat_data) == 0:
        return "Invalid input: data must not be empty."

    try:
        nbins = int(numbins)
    except Exception:
        return "Invalid input: numbins must be an integer."
    if nbins < 1:
        return "Invalid input: numbins must be >= 1."

    # Prepare limits
    limits = None
    if lowerlimit is not None or upperlimit is not None:
        if lowerlimit is None or upperlimit is None:
            return "Invalid input: both lowerlimit and upperlimit must be provided."
        try:
            limits = (float(lowerlimit), float(upperlimit))
        except Exception:
            return "Invalid input: lowerlimit and upperlimit must be numbers."

    try:
        if limits:
            res = scipy_relfreq(flat_data, numbins=nbins, defaultreallimits=limits)
        else:
            res = scipy_relfreq(flat_data, numbins=nbins)
        freq = res.frequency
    except Exception as e:
        return f"scipy.stats.relfreq error: {e}"

    # Return as 2D list (column vector)
    return [[float(x)] for x in freq]