STAT_PARETO
Overview
The STAT_PARETO function fits a collection of statistical models inspired by the Pareto distribution and related logarithmic transformations to observed data. These models capture power-law and logarithmic decay behaviors commonly found in wealth distributions, natural phenomena, and systems exhibiting heavy-tailed characteristics. The function uses non-linear least squares optimization via SciPy’s curve_fit to find optimal parameter values.
The Pareto distribution, named after Italian economist Vilfredo Pareto, is characterized by its power-law probability tail. For the classic Pareto Type I distribution, the cumulative distribution function is:
F(x) = 1 - \left(\frac{x_m}{x}\right)^\alpha \quad \text{for } x \geq x_m
where x_m is the minimum value (scale parameter) and \alpha is the shape parameter (tail index). This distribution underlies the famous “80-20 rule” (Pareto principle), where approximately 20% of causes produce 80% of effects.
The function supports eight model variants:
- Reciprocal Log Power Law: y = \frac{1}{a + b \ln(x)} — models inverse logarithmic relationships
- Log Offset Linear and Log One Plus X Linear: y = a + b \ln(x) — classic logarithmic growth with offset
- Shifted Log Linear Decay: y = a - b \ln(x + c) — logarithmic decay with a horizontal shift
- Simple Natural Log: y = A \ln(x) — single-parameter natural logarithm scaling
- Pareto Power Law (Single Param): y = 1 - x^{-A} — simplified Pareto CDF form
- Pareto Power Law With Scale: y = 1 - \left(\frac{b}{x}\right)^a — full Pareto CDF with scale parameter b
- Pearson Type IV Asymmetric Peak: A six-parameter model for asymmetric peaked distributions, combining power-law decay with exponential arctangent modulation
The fitting algorithm employs the Levenberg-Marquardt method (for unbounded problems) or trust-region reflective methods when bounds are specified. Each model includes intelligent initial parameter guesses derived from data statistics, and the function returns both fitted parameter values and their standard errors estimated from the covariance matrix.
These models are applicable to economics (income/wealth distributions), reliability engineering, network traffic analysis, hydrology (extreme rainfall events), and any domain where power-law or heavy-tailed distributions arise. For implementation details, see the SciPy curve_fit documentation and the SciPy GitHub repository.
This example function is provided as-is without any representation of accuracy.
Excel Usage
=STAT_PARETO(xdata, ydata, stat_pareto_model)
xdata(list[list], required): The xdata valueydata(list[list], required): The ydata valuestat_pareto_model(str, required): The stat_pareto_model value
Returns (list[list]): 2D list [param_names, fitted_values, std_errors], or error string.
Examples
Example 1: Demo case 1
Inputs:
| stat_pareto_model | xdata | ydata |
|---|---|---|
| reciprocal_log_power_law | 0.1 | 3.0367302369217555 |
| 1.3250000000000002 | 0.32076392890124844 | |
| 2.5500000000000003 | 0.3034487893300985 | |
| 3.7750000000000004 | 0.32488474825975155 | |
| 5 | 0.21237394505129054 |
Excel formula:
=STAT_PARETO("reciprocal_log_power_law", {0.1;1.3250000000000002;2.5500000000000003;3.7750000000000004;5}, {3.0367302369217555;0.32076392890124844;0.3034487893300985;0.32488474825975155;0.21237394505129054})
Expected output:
"non-error"
Example 2: Demo case 2
Inputs:
| stat_pareto_model | xdata | ydata |
|---|---|---|
| log_offset_linear | 0.1 | 0.36902764003516775 |
| 1.3250000000000002 | 3.0352556604224823 | |
| 2.5500000000000003 | 3.780807602771912 | |
| 3.7750000000000004 | 4.257479048422858 | |
| 5 | 4.4225894632319145 |
Excel formula:
=STAT_PARETO("log_offset_linear", {0.1;1.3250000000000002;2.5500000000000003;3.7750000000000004;5}, {0.36902764003516775;3.0352556604224823;3.780807602771912;4.257479048422858;4.4225894632319145})
Expected output:
"non-error"
Example 3: Demo case 3
Inputs:
| stat_pareto_model | xdata | ydata |
|---|---|---|
| log_one_plus_x_linear | 0.1 | 2.86570248495542 |
| 1.3250000000000002 | 3.6315561990598044 | |
| 2.5500000000000003 | 4.100671485316171 | |
| 3.7750000000000004 | 4.439478709816687 | |
| 5 | 4.623980897897044 |
Excel formula:
=STAT_PARETO("log_one_plus_x_linear", {0.1;1.3250000000000002;2.5500000000000003;3.7750000000000004;5}, {2.86570248495542;3.6315561990598044;4.100671485316171;4.439478709816687;4.623980897897044})
Expected output:
"non-error"
Example 4: Demo case 4
Inputs:
| stat_pareto_model | xdata | ydata |
|---|---|---|
| shifted_log_linear_decay | 0.1 | 2.118089042277242 |
| 1.3250000000000002 | 1.6000721166439364 | |
| 2.5500000000000003 | 1.2381755368660448 | |
| 3.7750000000000004 | 0.9360752867947714 | |
| 5 | 0.7548674938171431 |
Excel formula:
=STAT_PARETO("shifted_log_linear_decay", {0.1;1.3250000000000002;2.5500000000000003;3.7750000000000004;5}, {2.118089042277242;1.6000721166439364;1.2381755368660448;0.9360752867947714;0.7548674938171431})
Expected output:
"non-error"
Example 5: Demo case 5
Inputs:
| stat_pareto_model | xdata | ydata |
|---|---|---|
| simple_natural_log | 0.1 | -6.235879990384085 |
| 1.3250000000000002 | 0.7470981582493581 | |
| 2.5500000000000003 | 2.6997341977359604 | |
| 3.7750000000000004 | 3.948159412536055 | |
| 5 | 4.380591451321681 |
Excel formula:
=STAT_PARETO("simple_natural_log", {0.1;1.3250000000000002;2.5500000000000003;3.7750000000000004;5}, {-6.235879990384085;0.7470981582493581;2.6997341977359604;3.948159412536055;4.380591451321681})
Expected output:
"non-error"
Example 6: Demo case 6
Inputs:
| stat_pareto_model | xdata | ydata |
|---|---|---|
| pareto_power_law_single_param | 0.1 | -555.7562945839984 |
| 1.3250000000000002 | -1.0158554001575881 | |
| 2.5500000000000003 | 8.206369055095575 | |
| 3.7750000000000004 | 18.09896551667567 | |
| 5 | -1.6447723377170855 |
Excel formula:
=STAT_PARETO("pareto_power_law_single_param", {0.1;1.3250000000000002;2.5500000000000003;3.7750000000000004;5}, {-555.7562945839984;-1.0158554001575881;8.206369055095575;18.09896551667567;-1.6447723377170855})
Expected output:
"non-error"
Example 7: Demo case 7
Inputs:
| stat_pareto_model | xdata | ydata |
|---|---|---|
| pareto_power_law_with_scale | 0.1 | 0.009535913828598551 |
| 1.3250000000000002 | 0.4699009341746975 | |
| 2.5500000000000003 | 0.925280769824784 | |
| 3.7750000000000004 | 0.9996079336594544 | |
| 5 | 0.9818241912860092 |
Excel formula:
=STAT_PARETO("pareto_power_law_with_scale", {0.1;1.3250000000000002;2.5500000000000003;3.7750000000000004;5}, {0.009535913828598551;0.4699009341746975;0.925280769824784;0.9996079336594544;0.9818241912860092})
Expected output:
"non-error"
Example 8: Demo case 8
Inputs:
| stat_pareto_model | xdata | ydata |
|---|---|---|
| pearson_type_iv_asymmetric_peak | 0.01 | 3.92428409687526 |
| 2.0075 | 7.178926599327028 | |
| 4.005 | 0.2107768608191403 | |
| 6.0024999999999995 | 0.22034461095419103 | |
| 8 | 0.01 |
Excel formula:
=STAT_PARETO("pearson_type_iv_asymmetric_peak", {0.01;2.0075;4.005;6.0024999999999995;8}, {3.92428409687526;7.178926599327028;0.2107768608191403;0.22034461095419103;0.01})
Expected output:
"non-error"
Python Code
import numpy as np
from scipy.optimize import curve_fit as scipy_curve_fit
import math
def stat_pareto(xdata, ydata, stat_pareto_model):
"""
Fits stat_pareto models to data using scipy.optimize.curve_fit. See https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html for details.
See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html
This example function is provided as-is without any representation of accuracy.
Args:
xdata (list[list]): The xdata value
ydata (list[list]): The ydata value
stat_pareto_model (str): The stat_pareto_model value Valid options: Reciprocal Log Power Law, Log Offset Linear, Log One Plus X Linear, Shifted Log Linear Decay, Simple Natural Log, Pareto Power Law Single Param, Pareto Power Law With Scale, Pearson Type Iv Asymmetric Peak.
Returns:
list[list]: 2D list [param_names, fitted_values, std_errors], or error string.
"""
def _validate_data(xdata, ydata):
"""Validate and convert both xdata and ydata to numpy arrays."""
for name, arg in [("xdata", xdata), ("ydata", ydata)]:
if not isinstance(arg, list) or len(arg) < 2:
raise ValueError(f"{name}: must be a 2D list with at least two rows")
vals = []
for i, row in enumerate(arg):
if not isinstance(row, list) or len(row) == 0:
raise ValueError(f"{name} row {i}: must be a non-empty list")
try:
vals.append(float(row[0]))
except Exception:
raise ValueError(f"{name} row {i}: non-numeric value")
if name == "xdata":
x_arr = np.asarray(vals, dtype=np.float64)
else:
y_arr = np.asarray(vals, dtype=np.float64)
if x_arr.shape[0] != y_arr.shape[0]:
raise ValueError("xdata and ydata must have the same number of rows")
return x_arr, y_arr
# Model definitions dictionary
models = {
'reciprocal_log_power_law': {
'params': ['a', 'b'],
'model': lambda x, a, b: 1.0 / (a + b * np.log(x)),
'guess': lambda xa, ya: (1.0, 1.0),
},
'log_offset_linear': {
'params': ['a', 'b'],
'model': lambda x, a, b: a + b * np.log(np.clip(x, 1e-9, None)),
'guess': lambda xa, ya: (float(np.mean(ya)), 1.0),
},
'log_one_plus_x_linear': {
'params': ['a', 'b'],
'model': lambda x, a, b: a + b * np.log1p(np.clip(x, 0.0, None)),
'guess': lambda xa, ya: (float(np.mean(ya)), 1.0),
},
'shifted_log_linear_decay': {
'params': ['a', 'b', 'c'],
'model': lambda x, a, b, c: a - b * np.log(np.clip(x + c, 1e-9, None)),
'guess': lambda xa, ya: (float(np.mean(ya)), 1.0, float(np.min(xa))),
},
'simple_natural_log': {
'params': ['A'],
'model': lambda x, A: A * np.log(np.clip(x, 1e-9, None)),
'guess': lambda xa, ya: (1.0,),
},
'pareto_power_law_single_param': {
'params': ['A'],
'model': lambda x, A: 1.0 - np.power(np.clip(x, 1e-9, None), -A),
'guess': lambda xa, ya: (1.5,),
},
'pareto_power_law_with_scale': {
'params': ['a', 'b'],
'model': lambda x, a, b: 1.0 - np.power(b / np.clip(x, b, None), a),
'guess': lambda xa, ya: (1.5, float(np.min(xa))),
'bounds': (0.0, np.inf),
},
'pearson_type_iv_asymmetric_peak': {
'params': ['y0', 'A', 'm', 'v', 'alpha', 'lam'],
'model': lambda x, y0, A, m, v, alpha, lam: y0 + A * np.power(1.0 + np.square((x - lam) / alpha), -m) * np.exp(-v * np.arctan((x - lam) / alpha)),
'guess': lambda xa, ya: (float(np.min(ya)), float(np.ptp(ya) if np.ptp(ya) else 1.0), 1.0, 0.0, float(max((np.max(xa) - np.min(xa)) / 6.0, 1e-3)), float(np.median(xa))),
'bounds': ([-np.inf, -np.inf, 0.5, -np.inf, 0.0, -np.inf], np.inf),
}
}
# Validate model parameter
if stat_pareto_model not in models:
return f"Invalid model: {str(stat_pareto_model)}. Valid models are: {', '.join(models.keys())}"
model_info = models[stat_pareto_model]
# Validate and convert input data
try:
x_arr, y_arr = _validate_data(xdata, ydata)
except ValueError as e:
return f"Invalid input: {e}"
# Perform curve fitting
try:
p0 = model_info['guess'](x_arr, y_arr)
bounds = model_info.get('bounds', (-np.inf, np.inf))
if bounds == (-np.inf, np.inf):
popt, pcov = scipy_curve_fit(model_info['model'], x_arr, y_arr, p0=p0, maxfev=10000)
else:
popt, pcov = scipy_curve_fit(model_info['model'], x_arr, y_arr, p0=p0, bounds=bounds, maxfev=10000)
fitted_vals = [float(v) for v in popt]
for v in fitted_vals:
if math.isnan(v) or math.isinf(v):
return "Fitting produced invalid numeric values (NaN or inf)."
except ValueError as e:
return f"Initial guess error: {e}"
except Exception as e:
return f"curve_fit error: {e}"
# Calculate standard errors
std_errors = None
try:
if pcov is not None and np.isfinite(pcov).all():
std_errors = [float(v) for v in np.sqrt(np.diag(pcov))]
except Exception:
pass
return [model_info['params'], fitted_vals, std_errors] if std_errors else [model_info['params'], fitted_vals]