QUANTILE_REGRESSION
Overview
The QUANTILE_REGRESSION function fits a quantile regression model to estimate conditional quantiles of a response variable given predictor variables. Unlike ordinary least squares (OLS) regression, which estimates the conditional mean, quantile regression estimates specific percentiles (such as the median or any other quantile) of the response distribution. This makes it particularly valuable for understanding relationships across the entire distribution of outcomes, not just the average.
Quantile regression was introduced by Roger Koenker and Gilbert Bassett in 1978 and has become a standard tool in econometrics, ecology, and medical research. The method is especially useful when the relationship between variables differs at different points of the outcome distribution, or when dealing with heteroscedastic data where variance changes across observations. For a comprehensive treatment, see Koenker's foundational text Quantile Regression (Cambridge University Press, 2005).
This implementation uses the statsmodels library’s QuantReg class, which solves the quantile regression problem using iterative reweighted least squares. For details, see the statsmodels QuantReg documentation.
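For orientation, the underlying statsmodels call reduces to something like the following minimal sketch. The data values are taken from Example 4 below, and the choice q=0.75 is illustrative only; exact estimates depend on the installed statsmodels version.
import numpy as np
from statsmodels.regression.quantile_regression import QuantReg

# Toy data from Example 4: one predictor plus an explicit intercept column.
x = np.column_stack([np.ones(6), np.arange(1.0, 7.0)])
y = np.array([3.1, 5.8, 8.2, 10.5, 13.1, 15.8])

res = QuantReg(y, x).fit(q=0.75)   # estimate the 0.75 conditional quantile
print(res.params)                  # [intercept, slope]
print(res.prsquared)               # pseudo R-squared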
The quantile regression estimator minimizes an asymmetric loss function called the check function (or pinball loss):
\hat{\beta}_\tau = \arg\min_{\beta} \sum_{i=1}^{n} \rho_\tau(y_i - x_i'\beta)
where the loss function \rho_\tau(u) is defined as:
\rho_\tau(u) = u \cdot (\tau - \mathbf{1}_{u < 0}) = \begin{cases} \tau \cdot u & \text{if } u \geq 0 \\ (\tau - 1) \cdot u & \text{if } u < 0 \end{cases}
For \tau = 0.5, the check function reduces to the absolute value (up to a constant), making median regression equivalent to Least Absolute Deviations (LAD) regression. Key advantages of quantile regression include robustness to outliers in the response variable and the ability to characterize heterogeneous effects across the outcome distribution.
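To make the loss concrete, here is a small sketch of the check function in plain Python; for \tau = 0.5 it returns |u|/2, which is why median regression and LAD coincide up to a constant factor.
def check_loss(u, tau):
    # rho_tau(u) = u * (tau - 1{u < 0}): tau * u when u >= 0, (tau - 1) * u when u < 0
    return u * (tau - (1.0 if u < 0 else 0.0))

for u in (-2.0, -0.5, 0.0, 1.5):
    print(u, check_loss(u, 0.25), check_loss(u, 0.5), abs(u) / 2)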
The function returns coefficients, standard errors, t-statistics, p-values, confidence intervals, and a pseudo R-squared measure of fit. The pseudo R-squared compares the minimized loss under the full model to an intercept-only model. For more theoretical background, see Quantile regression on Wikipedia.
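As a rough sketch of that comparison, reusing the check_loss helper above: the intercept-only model's "prediction" is the unconditional \tau-th sample quantile of y, and the pseudo R-squared is one minus the ratio of the two minimized losses. Values may differ slightly from statsmodels' prsquared depending on how the unconditional quantile is interpolated.
import numpy as np

def pseudo_r_squared(y, fitted, tau):
    # 1 - (check loss of the fitted model) / (check loss of an intercept-only model)
    y = np.asarray(y, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    q0 = np.quantile(y, tau)                        # intercept-only "prediction"
    loss_full = sum(check_loss(r, tau) for r in y - fitted)
    loss_null = sum(check_loss(r, tau) for r in y - q0)
    return 1.0 - loss_full / loss_null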
This example function is provided as-is without any representation of accuracy.
Excel Usage
=QUANTILE_REGRESSION(y, x, quantile, fit_intercept)
y (list[list], required): Dependent variable (response) as a column vector of numeric values.
x (list[list], required): Independent variables (predictors) as a matrix. Each column is a predictor; rows must match y.
quantile (float, optional, default: 0.5): Quantile level to estimate, between 0 and 1 (exclusive).
fit_intercept (bool, optional, default: true): If true, adds an intercept term to the model.
Returns (list[list]): 2D list with quantile regression results, or error string.
Examples
Example 1: Demo case 1
Inputs:
| y | x |
|---|---|
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
| 5 | 5 |
Excel formula:
=QUANTILE_REGRESSION({1;2;3;4;5}, {1;2;3;4;5})
Expected output:
"non-error"
Example 2: Demo case 2
Inputs:
| y | x | quantile |
|---|---|---|
| 2.3 | 1 | 0.25 |
| 4.1 | 2 | |
| 5.8 | 3 | |
| 7.2 | 4 | |
| 9.5 | 5 | |
| 11.1 | 6 | |
Excel formula:
=QUANTILE_REGRESSION({2.3;4.1;5.8;7.2;9.5;11.1}, {1;2;3;4;5;6}, 0.25)
Expected output:
"non-error"
Example 3: Demo case 3
Inputs:
| y | x | fit_intercept |
|---|---|---|
| 1.5 | 1 | false |
| 3.2 | 2 | |
| 4.8 | 3 | |
| 6.1 | 4 | |
| 7.9 | 5 | |
Excel formula:
=QUANTILE_REGRESSION({1.5;3.2;4.8;6.1;7.9}, {1;2;3;4;5}, FALSE)
Expected output:
"non-error"
Example 4: Demo case 4
Inputs:
| y | x | quantile | fit_intercept |
|---|---|---|---|
| 3.1 | 1 | 0.75 | true |
| 5.8 | 2 | ||
| 8.2 | 3 | ||
| 10.5 | 4 | ||
| 13.1 | 5 | ||
| 15.8 | 6 |
Excel formula:
=QUANTILE_REGRESSION({3.1;5.8;8.2;10.5;13.1;15.8}, {1;2;3;4;5;6}, 0.75, TRUE)
Expected output:
| parameter | coefficient | std_error | t_statistic | p_value | ci_lower | ci_upper |
|---|---|---|---|---|---|---|
| intercept | 0.8 | | | | | |
| x1 | 2.5 | | | | | |
| pseudo_r_squared | 0.9766 | | | | | |
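For comparison, the same case can be run directly against the Python implementation below; exact numbers depend on the installed statsmodels version.
result = quantile_regression(
    y=[[3.1], [5.8], [8.2], [10.5], [13.1], [15.8]],
    x=[[1], [2], [3], [4], [5], [6]],
    quantile=0.75,
    fit_intercept=True,
)
# result[0] is the header row; the remaining rows are intercept, x1, and pseudo_r_squared.
for row in result:
    print(row)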
Python Code
import math
from statsmodels.regression.quantile_regression import QuantReg as sm_quantreg
def quantile_regression(y, x, quantile=0.5, fit_intercept=True):
    """
    Fits a quantile regression model to estimate conditional quantiles of the response distribution.

    See: https://www.statsmodels.org/stable/generated/statsmodels.regression.quantile_regression.QuantReg.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        y (list[list]): Dependent variable (response) as a column vector of numeric values.
        x (list[list]): Independent variables (predictors) as a matrix. Each column is a predictor; rows must match y.
        quantile (float, optional): Quantile level to estimate, between 0 and 1 (exclusive). Default is 0.5.
        fit_intercept (bool, optional): If true, adds an intercept term to the model. Default is True.

    Returns:
        list[list]: 2D list with quantile regression results, or error string.
    """
    def to2d(val):
        return [[val]] if not isinstance(val, list) else val

    def validate_2d_numeric(arr, name):
        if not isinstance(arr, list):
            return f"Invalid input: {name} must be a 2D list."
        if not arr:
            return f"Invalid input: {name} cannot be empty."
        for i, row in enumerate(arr):
            if not isinstance(row, list):
                return f"Invalid input: {name} must be a 2D list."
            if not row:
                return f"Invalid input: {name} cannot contain empty rows."
            for j, val in enumerate(row):
                if not isinstance(val, (int, float)):
                    return f"Invalid input: {name}[{i}][{j}] must be numeric."
                if math.isnan(val) or math.isinf(val):
                    return f"Invalid input: {name}[{i}][{j}] must be finite."
        return None

    # Normalize inputs to 2D lists
    y = to2d(y)
    x = to2d(x)

    # Validate inputs
    err = validate_2d_numeric(y, "y")
    if err:
        return err
    err = validate_2d_numeric(x, "x")
    if err:
        return err

    # Validate quantile
    if not isinstance(quantile, (int, float)):
        return "Invalid input: quantile must be numeric."
    if math.isnan(quantile) or math.isinf(quantile):
        return "Invalid input: quantile must be finite."
    if quantile <= 0 or quantile >= 1:
        return "Invalid input: quantile must be between 0 and 1 (exclusive)."

    # Validate fit_intercept
    if not isinstance(fit_intercept, bool):
        return "Invalid input: fit_intercept must be a boolean."

    # Extract y as 1D array
    y_flat = [row[0] for row in y]
    n_obs = len(y_flat)

    # Check x dimensions
    n_rows_x = len(x)
    if n_rows_x != n_obs:
        return f"Invalid input: x must have same number of rows as y ({n_obs})."

    # Extract x as 2D array with a consistent column count
    n_cols_x = len(x[0])
    for row in x:
        if len(row) != n_cols_x:
            return "Invalid input: x must have consistent column count."

    # Add intercept column if requested
    if fit_intercept:
        x_data = [[1.0] + list(row) for row in x]
    else:
        x_data = [list(row) for row in x]

    # Fit quantile regression model
    try:
        model = sm_quantreg(y_flat, x_data)
        result = model.fit(q=quantile)
    except Exception as exc:
        return f"statsmodels error: {exc}"

    # Extract results
    try:
        params = result.params
        bse = result.bse
        tvalues = result.tvalues
        pvalues = result.pvalues
        conf_int = result.conf_int()
        prsquared = result.prsquared
    except Exception as exc:
        return f"Error extracting results: {exc}"

    # Build output table
    headers = ['parameter', 'coefficient', 'std_error', 't_statistic', 'p_value', 'ci_lower', 'ci_upper']
    output = [headers]

    # Add parameter rows
    for i in range(len(params)):
        if fit_intercept and i == 0:
            param_name = 'intercept'
        else:
            predictor_idx = i if not fit_intercept else i - 1
            param_name = f'x{predictor_idx + 1}'

        # Convert values to float or None if NaN
        def safe_float(val):
            fval = float(val)
            if math.isnan(fval):
                return None
            if math.isinf(fval):
                return f"Error: infinite value in result for {param_name}."
            return fval

        coef = safe_float(params[i])
        if isinstance(coef, str):
            return coef
        se = safe_float(bse[i])
        if isinstance(se, str):
            return se
        tval = safe_float(tvalues[i])
        if isinstance(tval, str):
            return tval
        pval = safe_float(pvalues[i])
        if isinstance(pval, str):
            return pval
        ci_low = safe_float(conf_int[i, 0])
        if isinstance(ci_low, str):
            return ci_low
        ci_high = safe_float(conf_int[i, 1])
        if isinstance(ci_high, str):
            return ci_high

        row = [param_name, coef, se, tval, pval, ci_low, ci_high]
        output.append(row)

    # Add pseudo R-squared row
    prsq_val = float(prsquared)
    if math.isinf(prsq_val):
        return "Error: infinite pseudo R-squared."
    if math.isnan(prsq_val):
        prsq_val = None
    output.append(['pseudo_r_squared', prsq_val, None, None, None, None, None])

    return output
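As a quick smoke test (assuming the function above is available in the current module), the call below mirrors Example 1 and also shows that invalid arguments come back as error strings rather than exceptions.
if __name__ == "__main__":
    # Example 1: median regression on a simple increasing series.
    ok = quantile_regression([[1], [2], [3], [4], [5]], [[1], [2], [3], [4], [5]])
    print(ok if isinstance(ok, str) else ok[:2])   # header row plus the first parameter row

    # Out-of-range quantiles are rejected with an error string.
    bad = quantile_regression([[1], [2], [3]], [[1], [2], [3]], quantile=1.5)
    print(bad)   # "Invalid input: quantile must be between 0 and 1 (exclusive)."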