GLS_REGRESSION
Overview
The GLS_REGRESSION function fits a Generalized Least Squares (GLS) regression model, an extension of ordinary least squares (OLS) designed to handle situations where the error terms exhibit non-constant variance (heteroscedasticity) or are correlated with each other (autocorrelation). GLS was first described by Alexander Aitken in 1935 and remains a fundamental technique in econometrics and time series analysis.
In standard OLS regression, the Gauss-Markov theorem guarantees that the OLS estimator is the best linear unbiased estimator (BLUE) when the errors are uncorrelated and have constant variance. When these assumptions are violated, OLS estimates remain unbiased but are no longer efficient, and the usual standard errors become unreliable. GLS addresses this by incorporating a known error covariance matrix \Omega into the estimation process.
The GLS estimator minimizes the squared Mahalanobis distance of the residual vector:
\hat{\beta}_{GLS} = (X^T \Omega^{-1} X)^{-1} X^T \Omega^{-1} y
This is equivalent to applying OLS to a transformed version of the data. Using Cholesky decomposition \Omega = CC^T, the model is whitened by multiplying both sides by C^{-1}, which standardizes the scale and decorrelates the errors. The resulting estimator is unbiased, consistent, efficient, and asymptotically normal with covariance:
\text{Cov}[\hat{\beta} | X] = (X^T \Omega^{-1} X)^{-1}
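The closed-form estimator and the whitening equivalence above can be checked numerically. The sketch below (a minimal NumPy check, reusing the data from Example 3 further down) computes \hat{\beta}_{GLS} directly from the formula and again via Cholesky whitening, and forms the estimator covariance:

```python
import numpy as np

# Data from Example 3: design matrix with an intercept column, response,
# and a banded (AR(1)-style) error covariance matrix
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([1.5, 2.5, 3.5, 4.5])
omega = np.array([[1.0, 0.5, 0.0, 0.0],
                  [0.5, 1.0, 0.5, 0.0],
                  [0.0, 0.5, 1.0, 0.5],
                  [0.0, 0.0, 0.5, 1.0]])

# Direct formula: beta = (X' Omega^-1 X)^-1 X' Omega^-1 y
omega_inv = np.linalg.inv(omega)
xtoi = X.T @ omega_inv
beta_direct = np.linalg.solve(xtoi @ X, xtoi @ y)

# Equivalent whitened OLS: with Omega = C C', multiply both sides by C^-1
C = np.linalg.cholesky(omega)
X_white = np.linalg.solve(C, X)
y_white = np.linalg.solve(C, y)
beta_whitened, *_ = np.linalg.lstsq(X_white, y_white, rcond=None)

# Covariance of the estimator: (X' Omega^-1 X)^-1
cov_beta = np.linalg.inv(xtoi @ X)

# Both routes agree; this data lies exactly on y = 0.5 + x,
# so beta is [0.5, 1.0] regardless of Omega
```

Because the toy data fits the line y = 0.5 + x exactly, both routes recover the same coefficients whatever \Omega is used; with noisy data the choice of \Omega changes the estimate.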
When \Omega is diagonal, GLS reduces to weighted least squares (WLS), which handles heteroscedasticity without correlation between observations. This implementation uses the statsmodels library’s GLS class. The function returns coefficient estimates, standard errors, t-statistics, p-values, confidence intervals, and model fit statistics including log-likelihood, AIC, and BIC.
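The reduction to WLS in the diagonal case can also be verified directly: scaling each row of the data by the reciprocal square-root of its error variance and running OLS gives the same coefficients as the GLS formula with a diagonal \Omega. A small NumPy sketch with illustrative (made-up) heteroscedastic data:

```python
import numpy as np

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([1.2, 2.1, 2.9, 4.3])
variances = np.array([0.5, 1.0, 1.5, 2.0])  # per-observation error variances

# GLS with a diagonal Omega
omega_inv = np.diag(1.0 / variances)
beta_gls = np.linalg.solve(X.T @ omega_inv @ X, X.T @ omega_inv @ y)

# WLS: scale each row by 1/sqrt(variance), then ordinary least squares
w = 1.0 / np.sqrt(variances)
beta_wls, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)

# The two estimates coincide
```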
For more details on the theoretical background, see the Wikipedia article on Generalized Least Squares and the statsmodels GLS documentation.
This example function is provided as-is without any representation of accuracy.
Excel Usage
=GLS_REGRESSION(y, x, sigma, fit_intercept, alpha)
- y (list[list], required): Column vector of dependent variable values (n x 1 array)
- x (list[list], required): Matrix of independent variables (n x p array, where p is the number of predictors)
- sigma (list[list], optional, default: null): Error covariance matrix (n x n array). Defaults to the identity matrix if not provided.
- fit_intercept (bool, optional, default: true): If true, adds an intercept term to the model
- alpha (float, optional, default: 0.05): Significance level for confidence intervals (between 0 and 1)
Returns (list[list]): 2D list with GLS results, or error message string.
Examples
Example 1: Demo case 1
Inputs:
| y | x |
|---|---|
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
Excel formula:
=GLS_REGRESSION({1;2;3;4}, {1;2;3;4})
Expected output:
"non-error"
Example 2: Demo case 2
Inputs:
| y | x | fit_intercept |
|---|---|---|
| 2 | 1 | false |
| 4 | 2 | |
| 6 | 3 | |
| 8 | 4 | |
Excel formula:
=GLS_REGRESSION({2;4;6;8}, {1;2;3;4}, FALSE)
Expected output:
"non-error"
Example 3: Demo case 3
Inputs:
| y | x | sigma | | | |
|---|---|---|---|---|---|
| 1.5 | 1 | 1 | 0.5 | 0 | 0 |
| 2.5 | 2 | 0.5 | 1 | 0.5 | 0 |
| 3.5 | 3 | 0 | 0.5 | 1 | 0.5 |
| 4.5 | 4 | 0 | 0 | 0.5 | 1 |
Excel formula:
=GLS_REGRESSION({1.5;2.5;3.5;4.5}, {1;2;3;4}, {1,0.5,0,0;0.5,1,0.5,0;0,0.5,1,0.5;0,0,0.5,1})
Expected output:
"non-error"
Example 4: Demo case 4
Inputs:
| y | x | | fit_intercept | alpha |
|---|---|---|---|---|
| 3 | 1 | 0.5 | true | 0.1 |
| 5 | 2 | 1 | | |
| 7 | 3 | 1.5 | | |
| 9 | 4 | 2 | | |
Excel formula:
=GLS_REGRESSION({3;5;7;9}, {1,0.5;2,1;3,1.5;4,2}, TRUE, 0.1)
Expected output:
"non-error"
Python Code
import math

import numpy as np
from statsmodels.regression.linear_model import GLS as statsmodels_gls


def gls_regression(y, x, sigma=None, fit_intercept=True, alpha=0.05):
    """
    Fits a Generalized Least Squares (GLS) regression model.

    See: https://www.statsmodels.org/stable/generated/statsmodels.regression.linear_model.GLS.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        y (list[list]): Column vector of dependent variable values (n x 1 array).
        x (list[list]): Matrix of independent variables (n x p array, where p is the number of predictors).
        sigma (list[list], optional): Error covariance matrix (n x n array). Defaults to the identity matrix if not provided. Default is None.
        fit_intercept (bool, optional): If true, adds an intercept term to the model. Default is True.
        alpha (float, optional): Significance level for confidence intervals (between 0 and 1). Default is 0.05.

    Returns:
        list[list]: 2D list with GLS results, or error message string.
    """
    def to2d(val):
        return [[val]] if not isinstance(val, list) else val

    def validate_2d_array(arr, name):
        if not isinstance(arr, list):
            return f"Invalid input: {name} must be a 2D list."
        if len(arr) == 0:
            return f"Invalid input: {name} cannot be empty."
        for row in arr:
            if not isinstance(row, list):
                return f"Invalid input: {name} must be a 2D list."
        return None

    def to_numpy(arr_2d, name):
        try:
            arr = np.array(arr_2d, dtype=float)
        except Exception:
            return f"Invalid input: {name} must contain numeric values."
        if np.any(np.isnan(arr)) or np.any(np.isinf(arr)):
            return f"Invalid input: {name} must contain finite values."
        return arr

    # Normalize inputs
    y = to2d(y)
    x = to2d(x)
    if sigma is not None:
        sigma = to2d(sigma)

    # Validate inputs
    err = validate_2d_array(y, "y")
    if err:
        return err
    err = validate_2d_array(x, "x")
    if err:
        return err

    # Convert to numpy arrays
    y_arr = to_numpy(y, "y")
    if isinstance(y_arr, str):
        return y_arr
    x_arr = to_numpy(x, "x")
    if isinstance(x_arr, str):
        return x_arr

    # Validate dimensions
    if y_arr.ndim != 2 or y_arr.shape[1] != 1:
        return "Invalid input: y must be a column vector (n x 1)."
    n = y_arr.shape[0]
    if x_arr.ndim != 2:
        return "Invalid input: x must be a 2D matrix."
    if x_arr.shape[0] != n:
        return "Invalid input: y and x must have the same number of rows."

    # Handle sigma
    if sigma is not None:
        err = validate_2d_array(sigma, "sigma")
        if err:
            return err
        sigma_arr = to_numpy(sigma, "sigma")
        if isinstance(sigma_arr, str):
            return sigma_arr
        if sigma_arr.ndim != 2:
            return "Invalid input: sigma must be a 2D matrix."
        if sigma_arr.shape[0] != n or sigma_arr.shape[1] != n:
            return f"Invalid input: sigma must be {n} x {n} to match the number of observations."
    else:
        sigma_arr = None

    # Validate alpha
    if not isinstance(alpha, (int, float)):
        return "Invalid input: alpha must be a number."
    if math.isnan(alpha) or math.isinf(alpha):
        return "Invalid input: alpha must be finite."
    if alpha <= 0 or alpha >= 1:
        return "Invalid input: alpha must be between 0 and 1."

    # Validate fit_intercept
    if not isinstance(fit_intercept, bool):
        return "Invalid input: fit_intercept must be a boolean."

    # Add intercept if requested
    if fit_intercept:
        x_arr = np.hstack([np.ones((n, 1)), x_arr])

    # Flatten y for statsmodels
    y_vec = y_arr.flatten()

    # Fit GLS model
    try:
        model = statsmodels_gls(y_vec, x_arr, sigma=sigma_arr)
        results = model.fit()
    except Exception as exc:
        return f"statsmodels.regression.linear_model.GLS error: {exc}"

    # Extract results
    try:
        params = results.params
        std_err = results.bse
        t_stats = results.tvalues
        p_values = results.pvalues
        conf_int = results.conf_int(alpha=alpha)

        # Build output table
        output = [['parameter', 'coefficient', 'std_error', 't_statistic', 'p_value', 'ci_lower', 'ci_upper']]

        # Parameter names
        if fit_intercept:
            param_names = ['intercept'] + [f'x{i+1}' for i in range(x_arr.shape[1] - 1)]
        else:
            param_names = [f'x{i+1}' for i in range(x_arr.shape[1])]

        # Add parameter rows
        for i, name in enumerate(param_names):
            output.append([
                name,
                float(params[i]),
                float(std_err[i]),
                float(t_stats[i]),
                float(p_values[i]),
                float(conf_int[i, 0]),
                float(conf_int[i, 1])
            ])

        # Add model statistics
        output.append(['log_likelihood', float(results.llf), '', '', '', '', ''])
        output.append(['aic', float(results.aic), '', '', '', '', ''])
        output.append(['bic', float(results.bic), '', '', '', '', ''])
    except Exception as exc:
        return f"Error extracting results: {exc}"

    return output