BINNED_STATISTIC

Overview

The BINNED_STATISTIC function computes summary statistics for data grouped into bins along a numeric axis. This is a generalization of a histogram: while a histogram counts the number of observations in each bin, BINNED_STATISTIC allows computation of mean, sum, median, standard deviation, minimum, maximum, or count for values falling within each bin.

This implementation wraps scipy.stats.binned_statistic from the SciPy library. The function takes two parallel arrays: an independent variable x that determines bin assignment, and a dependent variable values on which the statistic is computed. Data points are assigned to bins based on their x values, and the selected statistic is calculated from the corresponding values.

The binning process divides the range of x into equal-width intervals. By default, the range spans from the minimum to maximum values in x, but custom bounds can be specified using the lowerlimit and upperlimit parameters. Values of x outside the specified range are ignored. All bins except the rightmost are half-open intervals [a, b), meaning they include the left edge but exclude the right edge. The rightmost bin [a, b] includes both edges.

Supported statistics include:

  • mean: arithmetic mean of values in each bin (empty bins return NaN)
  • sum: sum of values, equivalent to a weighted histogram
  • median: median value in each bin
  • count: number of observations, equivalent to an unweighted histogram
  • std: standard deviation (calculated with \text{ddof}=0)
  • min / max: minimum or maximum value in each bin

For more information, see the SciPy binned_statistic documentation and the SciPy GitHub repository.

This example function is provided as-is without any representation of accuracy.

Excel Usage

=BINNED_STATISTIC(x, values, bs_statistic, bins, lowerlimit, upperlimit)
  • x (list[list], required): Data to bin (independent variable).
  • values (list[list], required): Data to compute statistics on (dependent variable).
  • bs_statistic (str, optional, default: “mean”): Statistic to compute.
  • bins (int, optional, default: 10): Number of bins.
  • lowerlimit (float, optional, default: null): Lower bound for the bin range.
  • upperlimit (float, optional, default: null): Upper bound for the bin range.

Returns (list[list]): 2D list of bin statistics, or error message string.

Example 1: Mean statistic in 2 bins

Inputs:

x values bs_statistic bins
1 1 mean 2
1 1
2 2
5 1.5
7 3

Excel formula:

=BINNED_STATISTIC({1;1;2;5;7}, {1;1;2;1.5;3}, "mean", 2)

Expected output:

Result
1.33333
2.25
Example 2: Sum statistic in 2 bins

Inputs:

x values bs_statistic bins
1 1 sum 2
1 1
2 2
5 1.5
7 3

Excel formula:

=BINNED_STATISTIC({1;1;2;5;7}, {1;1;2;1.5;3}, "sum", 2)

Expected output:

Result
4
4.5
Example 3: Median with custom range

Inputs:

x values bs_statistic bins lowerlimit upperlimit
1 10 median 3 1 7
2 20
3 30
4 40
5 50
6 60
7 70

Excel formula:

=BINNED_STATISTIC({1;2;3;4;5;6;7}, {10;20;30;40;50;60;70}, "median", 3, 1, 7)

Expected output:

Result
15
35
60
Example 4: Count statistic in 2 bins

Inputs:

x values bs_statistic bins
1 10 count 2
2 20
3 30
4 40
5 50

Excel formula:

=BINNED_STATISTIC({1;2;3;4;5}, {10;20;30;40;50}, "count", 2)

Expected output:

Result
2
3

Python Code

Show Code
from scipy.stats import binned_statistic as scipy_binned_statistic
import math

def binned_statistic(x, values, bs_statistic='mean', bins=10, lowerlimit=None, upperlimit=None):
    """
    Computes a binned statistic (mean, sum, median, etc.) for the input data.

    See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.binned_statistic.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        x (list[list]): Data to bin (independent variable).
        values (list[list]): Data to compute statistics on (dependent variable).
        bs_statistic (str, optional): Statistic to compute. Valid options: Mean, Sum, Median, Count, Std, Min, Max. Default is 'mean'.
        bins (int, optional): Number of bins. Default is 10.
        lowerlimit (float, optional): Lower bound for the bin range. Default is None.
        upperlimit (float, optional): Upper bound for the bin range. Default is None.

    Returns:
        list[list]: 2D list of bin statistics, or error message string.
    """
    def to2d(val):
        return [[val]] if not isinstance(val, list) else val

    try:
      # Convert scalars to 2D lists if needed
      x = to2d(x)
      values = to2d(values)
      # Flatten 2D lists to 1D
      try:
        flat_x = [float(item) for row in x for item in (row if isinstance(row, list) else [row])]
        flat_values = [float(item) for row in values for item in (row if isinstance(row, list) else [row])]
      except Exception:
        return "Error: x and values must be 2D lists or scalars of numbers."
      if len(flat_x) == 0 or len(flat_values) == 0:
        return "Error: x and values must not be empty."
      if len(flat_x) != len(flat_values):
        return "Error: x and values must have the same length."
      if bs_statistic not in ('mean', 'sum', 'median', 'count', 'std', 'min', 'max'):
        return "Error: bs_statistic must be one of 'mean', 'sum', 'median', 'count', 'std', 'min', 'max'."
      try:
        nbins = int(bins)
      except Exception:
        return "Error: bins must be an integer."
      if nbins < 1:
        return "Error: bins must be >= 1."
      # Prepare range
      range_arg = None
      if lowerlimit is not None and upperlimit is not None:
        try:
          range_arg = (float(lowerlimit), float(upperlimit))
        except Exception:
          return "Error: lowerlimit and upperlimit must be numbers."
      try:
        if range_arg:
          res = scipy_binned_statistic(flat_x, flat_values, statistic=bs_statistic, bins=nbins, range=range_arg)
        else:
          res = scipy_binned_statistic(flat_x, flat_values, statistic=bs_statistic, bins=nbins)
        stat = res.statistic
      except Exception as e:
        return f"Error: scipy.stats.binned_statistic error: {e}"
      # Return as 2D list (column vector), converting NaN to None for Excel compatibility
      result = []
      for val in stat:
        if val is None or (isinstance(val, float) and math.isnan(val)):
          result.append([""])
        else:
          result.append([float(val)])
      return result
    except Exception as e:
      return f"Error: {str(e)}"

Online Calculator

Data to bin (independent variable).
Data to compute statistics on (dependent variable).
Statistic to compute.
Number of bins.
Lower bound for the bin range.
Upper bound for the bin range.