STAT_DISTRIB

Overview

The STAT_DISTRIB function fits statistical distribution models to observed data using non-linear least squares optimization. This function leverages scipy.optimize.curve_fit to estimate the optimal parameters that minimize the sum of squared residuals between the model predictions and the observed data.

Statistical distributions are mathematical functions that describe the probability of occurrence of different possible outcomes. Fitting these distributions to empirical data is fundamental in reliability engineering, survival analysis, extreme value theory, and quality control. The function supports ten commonly used distributions:

  • Gumbel Extreme Value PDF: Models the distribution of extreme values (maxima or minima), commonly used in hydrology and finance for modeling rare events.
  • Laplace Double Exponential PDF: A symmetric distribution with heavier tails than the normal distribution, useful for modeling data with sharp peaks.
  • Lognormal PDF/CDF: Models variables that are the product of many independent positive factors, widely used in finance and biology.
  • Gamma CDF: A flexible family of distributions for modeling waiting times and reliability data.
  • Poisson PMF: Models the count of discrete events occurring in a fixed interval, essential for queuing theory and epidemiology.
  • Rayleigh CDF: Describes the magnitude of a two-dimensional vector with normally distributed components, used in signal processing.
  • Weibull PDF/CDF: The standard distribution for failure analysis and reliability engineering, capable of modeling increasing, decreasing, or constant failure rates.

The optimization process minimizes the objective function:

\chi^2 = \sum_{i=1}^{n} \left( y_i - f(x_i; \boldsymbol{\theta}) \right)^2

where y_i represents the observed values, f(x_i; \boldsymbol{\theta}) is the model prediction at point x_i, and \boldsymbol{\theta} is the vector of parameters to be estimated. The function uses the Levenberg-Marquardt algorithm for unconstrained problems or Trust Region Reflective (TRF) method when parameter bounds are specified. For more details, see the SciPy optimization documentation and the SciPy GitHub repository.

The function returns fitted parameter values along with their estimated standard errors derived from the covariance matrix, enabling uncertainty quantification. Standard errors are computed as \sigma_i = \sqrt{\text{diag}(\mathbf{C})_i} where \mathbf{C} is the parameter covariance matrix.

This example function is provided as-is without any representation of accuracy.

Excel Usage

=STAT_DISTRIB(xdata, ydata, stat_distrib_model)
  • xdata (list[list], required): The xdata value
  • ydata (list[list], required): The ydata value
  • stat_distrib_model (str, required): The stat_distrib_model value

Returns (list[list]): 2D list [param_names, fitted_values, std_errors], or error string.

Examples

Example 1: Demo case 1

Inputs:

stat_distrib_model xdata ydata
gumbel_extreme_value_pdf 0.01 0.23590106220639281
2.0075 2.7357869830040222
4.005 0.8174049411915324
6.0024999999999995 0.08834598785743636
8 0.039820406618528856

Excel formula:

=STAT_DISTRIB("gumbel_extreme_value_pdf", {0.01;2.0075;4.005;6.0024999999999995;8}, {0.23590106220639281;2.7357869830040222;0.8174049411915324;0.08834598785743636;0.039820406618528856})

Expected output:

"non-error"

Example 2: Demo case 2

Inputs:

stat_distrib_model xdata ydata
laplace_double_exponential_pdf 0.01 0.037229680077038044
2.0075 0.23417471703877937
4.005 0.14697460991010408
6.0024999999999995 0.028238228287397343
8 0.01

Excel formula:

=STAT_DISTRIB("laplace_double_exponential_pdf", {0.01;2.0075;4.005;6.0024999999999995;8}, {0.037229680077038044;0.23417471703877937;0.14697460991010408;0.028238228287397343;0.01})

Expected output:

"non-error"

Example 3: Demo case 3

Inputs:

stat_distrib_model xdata ydata
lognormal_probability_density 0.01 0.5823323144343956
2.0075 0.3903845656866753
4.005 0.28738567276996047
6.0024999999999995 0.23775149983219734
8 0.19017316235313045

Excel formula:

=STAT_DISTRIB("lognormal_probability_density", {0.01;2.0075;4.005;6.0024999999999995;8}, {0.5823323144343956;0.3903845656866753;0.28738567276996047;0.23775149983219734;0.19017316235313045})

Expected output:

"non-error"

Example 4: Demo case 4

Inputs:

stat_distrib_model xdata ydata
gamma_cumulative_distribution 0.1 1
1.3250000000000002 1
2.5500000000000003 1
3.7750000000000004 1
5 1

Excel formula:

=STAT_DISTRIB("gamma_cumulative_distribution", {0.1;1.3250000000000002;2.5500000000000003;3.7750000000000004;5}, {1;1;1;1;1})

Expected output:

"non-error"

Example 5: Demo case 5

Inputs:

stat_distrib_model xdata ydata
gumbel_extreme_maxima_cdf 0.01 0.08048585275638268
2.0075 0.38657945462078186
4.005 0.9757390073660966
6.0024999999999995 1.029304878230222
8 0.9954946147896686

Excel formula:

=STAT_DISTRIB("gumbel_extreme_maxima_cdf", {0.01;2.0075;4.005;6.0024999999999995;8}, {0.08048585275638268;0.38657945462078186;0.9757390073660966;1.029304878230222;0.9954946147896686})

Expected output:

"non-error"

Example 6: Demo case 6

Inputs:

stat_distrib_model xdata ydata
lognormal_cumulative_distribution 0.1 1
1.3250000000000002 1
2.5500000000000003 1
3.7750000000000004 1
5 1

Excel formula:

=STAT_DISTRIB("lognormal_cumulative_distribution", {0.1;1.3250000000000002;2.5500000000000003;3.7750000000000004;5}, {1;1;1;1;1})

Expected output:

"non-error"

Example 7: Demo case 7

Inputs:

stat_distrib_model xdata ydata
poisson_discrete_count_pmf 0.01 1
2.0075 1
4.005 1
6.0024999999999995 1
8 1

Excel formula:

=STAT_DISTRIB("poisson_discrete_count_pmf", {0.01;2.0075;4.005;6.0024999999999995;8}, {1;1;1;1;1})

Expected output:

"non-error"

Example 8: Demo case 8

Inputs:

stat_distrib_model xdata ydata
rayleigh_magnitude_cdf 0.01 0.01
2.0075 0.8365273106273831
4.005 1.011899243812429
6.0024999999999995 1.0296105430932765
8 0.9954476090427379

Excel formula:

=STAT_DISTRIB("rayleigh_magnitude_cdf", {0.01;2.0075;4.005;6.0024999999999995;8}, {0.01;0.8365273106273831;1.011899243812429;1.0296105430932765;0.9954476090427379})

Expected output:

"non-error"

Example 9: Demo case 9

Inputs:

stat_distrib_model xdata ydata
weibull_failure_analysis_pdf 0.01 0.005770566882927686
2.0075 -0.0016062827938215303
4.005 0.007524508826168242
6.0024999999999995 0.04264036585516798
8 0.583887968137886

Excel formula:

=STAT_DISTRIB("weibull_failure_analysis_pdf", {0.01;2.0075;4.005;6.0024999999999995;8}, {0.005770566882927686;-0.0016062827938215303;0.007524508826168242;0.04264036585516798;0.583887968137886})

Expected output:

"non-error"

Example 10: Demo case 10

Inputs:

stat_distrib_model xdata ydata
weibull_reliability_cdf 0.01 0.031269365717061556
2.0075 1.4029550205119903
4.005 2.1574344532424874
6.0024999999999995 2.5385337350489827
8 2.6109817879427717

Excel formula:

=STAT_DISTRIB("weibull_reliability_cdf", {0.01;2.0075;4.005;6.0024999999999995;8}, {0.031269365717061556;1.4029550205119903;2.1574344532424874;2.5385337350489827;2.6109817879427717})

Expected output:

"non-error"

Python Code

import numpy as np
from scipy.optimize import curve_fit as scipy_curve_fit
from scipy import special as sc
import math

def stat_distrib(xdata, ydata, stat_distrib_model):
    """
    Fits stat_distrib models to data using scipy.optimize.curve_fit. See https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html for details.

    See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        xdata (list[list]): The xdata value
        ydata (list[list]): The ydata value
        stat_distrib_model (str): The stat_distrib_model value Valid options: Gumbel Extreme Value Pdf, Laplace Double Exponential Pdf, Lognormal Probability Density, Gamma Cumulative Distribution, Gumbel Extreme Maxima Cdf, Lognormal Cumulative Distribution, Poisson Discrete Count Pmf, Rayleigh Magnitude Cdf, Weibull Failure Analysis Pdf, Weibull Reliability Cdf.

    Returns:
        list[list]: 2D list [param_names, fitted_values, std_errors], or error string.
    """
    def _validate_data(xdata, ydata):
        """Validate and convert both xdata and ydata to numpy arrays."""
        for name, arg in [("xdata", xdata), ("ydata", ydata)]:
            if not isinstance(arg, list) or len(arg) < 2:
                raise ValueError(f"{name}: must be a 2D list with at least two rows")
            vals = []
            for i, row in enumerate(arg):
                if not isinstance(row, list) or len(row) == 0:
                    raise ValueError(f"{name} row {i}: must be a non-empty list")
                try:
                    vals.append(float(row[0]))
                except Exception:
                    raise ValueError(f"{name} row {i}: non-numeric value")
            if name == "xdata":
                x_arr = np.asarray(vals, dtype=np.float64)
            else:
                y_arr = np.asarray(vals, dtype=np.float64)

        if x_arr.shape[0] != y_arr.shape[0]:
            raise ValueError("xdata and ydata must have the same number of rows")
        return x_arr, y_arr

    # Model definitions dictionary
    models = {
        'gumbel_extreme_value_pdf': {
            'params': ['y0', 'xc', 'w', 'A'],
            'model': lambda x, y0, xc, w, A: y0 + A * np.exp(-np.exp(-(x - xc) / w) - (x - xc) / w + 1.0),
            'guess': lambda xa, ya: (float(np.min(ya)), float(np.median(xa)), float(max((np.max(xa) - np.min(xa)) / 6.0, 1e-3)), float(np.ptp(ya) if np.ptp(ya) else 1.0)),
            'bounds': ([-np.inf, -np.inf, 0.0, -np.inf], np.inf),
        },
        'laplace_double_exponential_pdf': {
            'params': ['y0', 'a', 'b'],
            'model': lambda x, y0, a, b: y0 + 0.5 / b * np.exp(-np.abs(x - a) / b),
            'guess': lambda xa, ya: (float(np.min(ya)), float(np.median(xa)), float(max(np.std(xa), 1e-3))),
            'bounds': ([-np.inf, -np.inf, 0.0], np.inf),
        },
        'lognormal_probability_density': {
            'params': ['A', 'mu', 'sigma'],
            'model': lambda x, A, mu, sigma: A * np.exp(-np.power(np.log(x) - mu, 2) / (2.0 * sigma * sigma)) / x,
            'guess': lambda xa, ya: (float(max(ya)), float(np.mean(np.log(xa[xa > 0])) if np.any(xa > 0) else 1.0), 1.0),
            'bounds': ([-np.inf, -np.inf, 0.0], np.inf),
        },
        'gamma_cumulative_distribution': {
            'params': ['y0', 'A1', 'a', 'b'],
            'model': lambda x, y0, A1, a, b: y0 + A1 * sc.gammainc(a, np.clip(x, 0.0, None) / b),
            'guess': lambda xa, ya: (float(np.min(ya)), float(np.ptp(ya) if np.ptp(ya) else 1.0), 2.0, 1.0),
            'bounds': ([-np.inf, 0.0, 0.0, 0.0], np.inf),
        },
        'gumbel_extreme_maxima_cdf': {
            'params': ['a', 'b'],
            'model': lambda x, a, b: 1.0 - np.exp(-np.exp((x - a) / b)),
            'guess': lambda xa, ya: (float(np.median(xa)), 1.0),
            'bounds': ([-np.inf, 0.0], np.inf),
        },
        'lognormal_cumulative_distribution': {
            'params': ['y0', 'A1', 'xc', 'w'],
            'model': lambda x, y0, A1, xc, w: np.where(x > 0, y0 + A1 * 0.5 * (1.0 + sc.erf((np.log(x) - xc) / (w * np.sqrt(2.0)))), y0),
            'guess': lambda xa, ya: (float(np.min(ya)), float(np.ptp(ya) if np.ptp(ya) else 1.0), float(np.log(np.median(xa[xa > 0])) if np.any(xa > 0) else 0.0), 1.0),
            'bounds': ([-np.inf, 0.0, -np.inf, 0.0], np.inf),
        },
        'poisson_discrete_count_pmf': {
            'params': ['A', 'lam'],
            'model': lambda x, A, lam: A * (np.power(lam, x) * np.exp(-lam) / sc.gamma(x + 1.0)),
            'guess': lambda xa, ya: (float(max(ya)), float(np.mean(xa))),
            'bounds': ([-np.inf, 0.0], np.inf),
        },
        'rayleigh_magnitude_cdf': {
            'params': ['b'],
            'model': lambda x, b: 1.0 - np.exp(-np.square(x) / (2.0 * np.square(b))),
            'guess': lambda xa, ya: (float(np.std(xa)),),
            'bounds': (0.0, np.inf),
        },
        'weibull_failure_analysis_pdf': {
            'params': ['y0', 'a', 'r', 'u'],
            'model': lambda x, y0, a, r, u: y0 + (r / a) * np.power(np.clip(x - u, 0.0, None) / a, r - 1.0) * np.exp(-np.power(np.clip(x - u, 0.0, None) / a, r)),
            'guess': lambda xa, ya: (float(np.min(ya)), float(np.std(xa) if np.std(xa) else 1.0), 1.5, float(np.min(xa))),
            'bounds': ([-np.inf, 0.0, 0.0, -np.inf], np.inf),
        },
        'weibull_reliability_cdf': {
            'params': ['y0', 'A1', 'a', 'b'],
            'model': lambda x, y0, A1, a, b: np.where(x > 0, y0 + A1 * (1.0 - np.exp(-np.power(x / a, b))), y0),
            'guess': lambda xa, ya: (float(np.min(ya)), float(np.ptp(ya) if np.ptp(ya) else 1.0), 1.0, 1.5),
            'bounds': ([-np.inf, 0.0, 0.0, 0.0], np.inf),
        }
    }

    # Validate model parameter
    if stat_distrib_model not in models:
        return f"Invalid model: {str(stat_distrib_model)}. Valid models are: {', '.join(models.keys())}"

    model_info = models[stat_distrib_model]

    # Validate and convert input data
    try:
        x_arr, y_arr = _validate_data(xdata, ydata)
    except ValueError as e:
        return f"Invalid input: {e}"

    # Perform curve fitting
    try:
        p0 = model_info['guess'](x_arr, y_arr)
        bounds = model_info.get('bounds', (-np.inf, np.inf))
        if bounds == (-np.inf, np.inf):
            popt, pcov = scipy_curve_fit(model_info['model'], x_arr, y_arr, p0=p0, maxfev=10000)
        else:
            popt, pcov = scipy_curve_fit(model_info['model'], x_arr, y_arr, p0=p0, bounds=bounds, maxfev=10000)

        fitted_vals = [float(v) for v in popt]
        for v in fitted_vals:
            if math.isnan(v) or math.isinf(v):
                return "Fitting produced invalid numeric values (NaN or inf)."
    except ValueError as e:
        return f"Initial guess error: {e}"
    except Exception as e:
        return f"curve_fit error: {e}"

    # Calculate standard errors
    std_errors = None
    try:
        if pcov is not None and np.isfinite(pcov).all():
            std_errors = [float(v) for v in np.sqrt(np.diag(pcov))]
    except Exception:
        pass

    return [model_info['params'], fitted_vals, std_errors] if std_errors else [model_info['params'], fitted_vals]

Online Calculator