DESCRIBE

Overview

The DESCRIBE function computes several descriptive statistics for a dataset in a single operation, providing a comprehensive summary of the data’s distribution. It returns seven key metrics: the number of observations, minimum value, maximum value, arithmetic mean, variance, skewness, and kurtosis.

This implementation uses the scipy.stats.describe function from the SciPy library, a fundamental tool for scientific computing in Python. The underlying source code is available in the SciPy GitHub repository.

Skewness measures the asymmetry of a distribution. A value of zero indicates a symmetric distribution, positive values indicate a longer right tail, and negative values indicate a longer left tail. Kurtosis (Fisher’s definition) measures the “tailedness” of a distribution relative to a normal distribution. The function normalizes kurtosis so that a normal distribution has a kurtosis of zero; positive values indicate heavier tails (leptokurtic), while negative values indicate lighter tails (platykurtic).
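As an illustration of the sign conventions described above, the following minimal sketch calls scipy.stats.skew and scipy.stats.kurtosis directly. The three sample lists are made up for this illustration and are not part of the DESCRIBE function.

```python
from scipy.stats import kurtosis, skew

# Illustrative samples (assumed data, chosen only to show the sign conventions).
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 10]
left_tailed = [-x for x in right_tailed]  # mirror image of right_tailed

print(skew(symmetric))     # close to 0: symmetric data
print(skew(right_tailed))  # positive: longer right tail
print(skew(left_tailed))   # negative: longer left tail

# Fisher's definition subtracts 3, so a normal distribution scores 0.
fisher = kurtosis(symmetric, fisher=True)
pearson = kurtosis(symmetric, fisher=False)
print(fisher, pearson)     # evenly spread data is platykurtic (fisher < 0)
```

The Fisher and Pearson values differ by exactly 3, which is the normalization that makes a normal distribution score zero.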

The ddof parameter (delta degrees of freedom) adjusts the divisor used in the variance calculation. With ddof=0, the function computes the population variance (dividing by n); with ddof=1, it computes the sample variance (dividing by n-1), which is an unbiased estimate of the population variance when the data are a sample. The bias parameter controls whether the skewness and kurtosis calculations are corrected for statistical bias: bias=False applies the correction, and bias=True returns the uncorrected (biased) estimates.
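The effect of both parameters can be checked by calling scipy.stats.describe directly on the values 1 through 6, the same data used in the examples on this page. This is a minimal sketch rather than the wrapper itself, and the variable names are chosen for illustration.

```python
from scipy.stats import describe

data = [1, 2, 3, 4, 5, 6]

# ddof changes only the variance divisor.
population = describe(data, ddof=0)  # divides by n
sample = describe(data, ddof=1)      # divides by n - 1
print(population.variance)  # about 2.9167
print(sample.variance)      # 3.5

# bias changes only skewness and kurtosis.
biased = describe(data, bias=True)      # no correction applied
corrected = describe(data, bias=False)  # correction applied
print(biased.kurtosis)     # about -1.2686
print(corrected.kurtosis)  # about -1.2
```

Skewness is zero in both cases here because the data are symmetric, so only the kurtosis shows the effect of the correction.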

The function flattens the input data and filters out non-numeric or non-finite values before computing statistics, requiring at least two valid numeric values to produce results.
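The flattening and filtering step can be sketched on its own as follows. The helper name flatten_finite and the sample table are hypothetical and used only for illustration; the logic mirrors the loop in the Python code on this page, which converts each cell with float() and therefore also accepts numeric strings.

```python
import math

def flatten_finite(table):
    """Collect the finite numeric cells of a 2D list into a flat list of floats."""
    values = []
    for row in table:
        for cell in row:
            try:
                value = float(cell)  # numeric strings such as "2" convert too
            except (TypeError, ValueError):
                continue  # skip None, empty strings, and other non-numeric cells
            if math.isfinite(value):  # drop NaN and +/- infinity
                values.append(value)
    return values

table = [[1, "2", None], ["abc", float("nan"), 3.5], [float("inf"), 4, ""]]
print(flatten_finite(table))  # [1.0, 2.0, 3.5, 4.0]
```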

This example function is provided as-is without any representation of accuracy.

Excel Usage

=DESCRIBE(data, ddof, bias)
  • data (list[list], required): Table of numeric values to analyze.
  • ddof (int, optional, default: 0): Delta degrees of freedom for variance calculation.
  • bias (bool, optional, default: false): If false, the skewness and kurtosis calculations are corrected for statistical bias.

Returns (list[list]): 2D list [[nobs, min, max, mean, var, skew, kurt]], or error string.

Examples

Example 1: Basic statistics with default parameters

Inputs:

data       ddof   bias
1  2  3    0      false
4  5  6

Excel formula:

=DESCRIBE({1,2,3;4,5,6}, 0, FALSE)

Expected output:

Result
6 1 6 3.5 2.9167 0 -1.2

Example 2: Statistics with ddof=1 for sample variance

Inputs:

data       ddof   bias
1  2  3    1      false
4  5  6

Excel formula:

=DESCRIBE({1,2,3;4,5,6}, 1, FALSE)

Expected output:

Result
6 1 6 3.5 3.5 0 -1.2

Example 3: Statistics without bias correction (bias=true)

Inputs:

data       ddof   bias
1  2  3    0      true
4  5  6

Excel formula:

=DESCRIBE({1,2,3;4,5,6}, 0, TRUE)

Expected output:

Result
6 1 6 3.5 2.9167 0 -1.2686

Example 4: Statistics for larger values dataset

Inputs:

data          ddof   bias
10  20  30    0      false
40  50  60

Excel formula:

=DESCRIBE({10,20,30;40,50,60}, 0, FALSE)

Expected output:

Result
6 10 60 35 291.6667 0 -1.2

Python Code

import math
from scipy.stats import describe as scipy_describe

def describe(data, ddof=0, bias=False):
    """
    Compute descriptive statistics using the scipy.stats.describe function.

    See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.describe.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        data (list[list]): Table of numeric values to analyze.
        ddof (int, optional): Delta degrees of freedom for variance calculation. Default is 0.
        bias (bool, optional): If false, the skewness and kurtosis calculations are corrected for statistical bias. Default is False.

    Returns:
        list[list]: 2D list [[nobs, min, max, mean, var, skew, kurt]], or error string.
    """
    def to2d(x):
        return [[x]] if not isinstance(x, list) else x

    data = to2d(data)

    if not isinstance(data, list) or not all(isinstance(row, list) for row in data):
        return "Invalid input: data must be a 2D list."

    flat = []
    for row in data:
        for x in row:
            try:
                val = float(x)
                if math.isfinite(val):
                    flat.append(val)
            except (TypeError, ValueError):
                continue

    if len(flat) < 2:
        return "Invalid input: data must contain at least two numeric values."

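    # Non-finite floats (NaN, inf) would make int(ddof) raise, so reject them first.
    if (
        not isinstance(ddof, (int, float))
        or (isinstance(ddof, float) and not math.isfinite(ddof))
        or int(ddof) != ddof
        or ddof < 0
    ):
        return "Invalid input: ddof must be a non-negative integer."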

    if not isinstance(bias, bool):
        return "Invalid input: bias must be a boolean."

    try:
        res = scipy_describe(flat, ddof=int(ddof), bias=bias)
    except Exception as e:
        return f"scipy.stats.describe error: {e}"

    out = [
        int(res.nobs),
        float(res.minmax[0]),
        float(res.minmax[1]),
        float(res.mean),
        float(res.variance),
        float(res.skewness),
        float(res.kurtosis),
    ]
    return [out]
