SIEGELSLOPES
Overview
The SIEGELSLOPES function computes the Siegel repeated medians estimator for robust linear regression, fitting a line to a set of points (x, y) while providing strong resistance to outliers. This method achieves an asymptotic breakdown point of 50%, meaning the estimator remains reliable even when nearly half of the data points are outliers.
The algorithm, introduced by Andrew Siegel in his 1982 paper “Robust Regression Using Repeated Medians” published in Biometrika, works by computing nested medians. For each data point (x_j, y_j), the algorithm calculates the median of all slopes from that point to every other point in the dataset. The final slope estimate is then taken as the median of all these individual median slopes:
m_j = \text{median}\left\{\frac{y_i - y_j}{x_i - x_j} : i \neq j\right\}
\text{slope} = \text{median}\{m_j : j = 1, \ldots, n\}
Two methods are available for estimating the intercept. The hierarchical method (default) uses the estimated slope and computes the intercept as the median of y - \text{slope} \cdot x. The separate method estimates the intercept independently by computing, for each point, the median of all intercepts from lines through the remaining points, then taking the median of these values.
This implementation uses scipy.stats.siegelslopes from the SciPy library (GitHub repository). The Siegel estimator is particularly useful when working with datasets that may contain outliers or when the error distribution is heavy-tailed, making it preferable to ordinary least squares regression in such scenarios. A related but less robust alternative is the Theil-Sen estimator, which uses a single median of all pairwise slopes rather than the repeated medians approach.
This example function is provided as-is without any representation of accuracy.
Excel Usage
=SIEGELSLOPES(y, x, siegelslopes_method)
y(list[list], required): Dependent variable values. Each row is an observation.x(list[list], optional, default: null): Independent variable values. Must have the same number of rows as y. If omitted, defaults to sequence 0, 1, 2, …, n-1.siegelslopes_method(str, optional, default: “hierarchical”): Method for intercept estimation.
Returns (list[list]): 2D list [[slope, intercept]], or error message string.
Examples
Example 1: Demo case 1
Inputs:
| y |
|---|
| 1 |
| 2 |
| 3 |
| 4 |
Excel formula:
=SIEGELSLOPES({1;2;3;4})
Expected output:
| Result | |
|---|---|
| 1 | 1 |
Example 2: Demo case 2
Inputs:
| y | x | siegelslopes_method |
|---|---|---|
| 1 | 0 | hierarchical |
| 2 | 1 | |
| 3 | 2 | |
| 4 | 3 |
Excel formula:
=SIEGELSLOPES({1;2;3;4}, {0;1;2;3}, "hierarchical")
Expected output:
| Result | |
|---|---|
| 1 | 1 |
Example 3: Demo case 3
Inputs:
| y | x | siegelslopes_method |
|---|---|---|
| 1 | 0 | hierarchical |
| 2 | 1 | |
| 100 | 2 | |
| 4 | 3 |
Excel formula:
=SIEGELSLOPES({1;2;100;4}, {0;1;2;3}, "hierarchical")
Expected output:
| Result | |
|---|---|
| 1 | 1 |
Example 4: Demo case 4
Inputs:
| y | x | siegelslopes_method |
|---|---|---|
| 1 | 0 | separate |
| 2 | 1 | |
| 3 | 2 | |
| 4 | 3 |
Excel formula:
=SIEGELSLOPES({1;2;3;4}, {0;1;2;3}, "separate")
Expected output:
| Result | |
|---|---|
| 1 | 1 |
Python Code
import math
from scipy.stats import siegelslopes as scipy_siegelslopes
def siegelslopes(y, x=None, siegelslopes_method='hierarchical'):
"""
Compute the Siegel repeated medians estimator for robust linear regression using scipy.stats.siegelslopes.
See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.siegelslopes.html
This example function is provided as-is without any representation of accuracy.
Args:
y (list[list]): Dependent variable values. Each row is an observation.
x (list[list], optional): Independent variable values. Must have the same number of rows as y. If omitted, defaults to sequence 0, 1, 2, ..., n-1. Default is None.
siegelslopes_method (str, optional): Method for intercept estimation. Valid options: Hierarchical, Separate. Default is 'hierarchical'.
Returns:
list[list]: 2D list [[slope, intercept]], or error message string.
"""
# Validate siegelslopes_method early to fail fast
if siegelslopes_method not in ("hierarchical", "separate"):
return "Invalid input: siegelslopes_method must be 'hierarchical' or 'separate'."
# Validate y
if not isinstance(y, list):
return "Invalid input: y must be a 2D list."
if len(y) < 2:
return "Invalid input: y must have at least two observations."
# Convert y to flat list
try:
y_flat = []
for row in y:
if isinstance(row, list):
if len(row) == 0:
return "Invalid input: y contains empty rows."
y_flat.append(float(row[0]))
else:
y_flat.append(float(row))
except (TypeError, ValueError, IndexError):
return "Invalid input: y must contain numeric values."
n = len(y_flat)
# Validate and convert x
if x is not None:
if not isinstance(x, list):
return "Invalid input: x must be a 2D list."
if len(x) != n:
return "Invalid input: x must have the same number of rows as y."
try:
x_flat = []
for row in x:
if isinstance(row, list):
if len(row) == 0:
return "Invalid input: x contains empty rows."
x_flat.append(float(row[0]))
else:
x_flat.append(float(row))
except (TypeError, ValueError, IndexError):
return "Invalid input: x must contain numeric values."
else:
x_flat = list(range(n))
# Compute Siegel slopes
try:
res = scipy_siegelslopes(y_flat, x_flat, method=siegelslopes_method)
slope = float(res.slope)
intercept = float(res.intercept)
# Check for NaN or infinity
if math.isnan(slope) or math.isnan(intercept):
return "Computation resulted in NaN."
if math.isinf(slope) or math.isinf(intercept):
return "Computation resulted in infinite value."
return [[slope, intercept]]
except ValueError as e:
return f"Invalid input: {e}"
except Exception as e:
return f"Computation error: {e}"