GLS_REGRESSION
Overview
The GLS_REGRESSION function fits a Generalized Least Squares (GLS) regression model, an extension of ordinary least squares (OLS) designed to handle situations where the error terms exhibit non-constant variance (heteroscedasticity) or are correlated with each other (autocorrelation). GLS was first described by Alexander Aitken in 1935 and remains a fundamental technique in econometrics and time series analysis.
In standard OLS regression, the Gauss-Markov theorem guarantees that the OLS estimator is the best linear unbiased estimator (BLUE) when the errors are uncorrelated and have constant variance. When these assumptions are violated, OLS estimates remain unbiased but are no longer efficient, and the usual standard errors become unreliable. GLS addresses this by incorporating a known error covariance matrix \Omega into the estimation process.
The GLS estimator minimizes the squared Mahalanobis distance of the residual vector:
\hat{\beta}_{GLS} = (X^T \Omega^{-1} X)^{-1} X^T \Omega^{-1} y
This is equivalent to applying OLS to a transformed version of the data. Using Cholesky decomposition \Omega = CC^T, the model is whitened by multiplying both sides by C^{-1}, which standardizes the scale and decorrelates the errors. The resulting estimator is unbiased, consistent, efficient, and asymptotically normal with covariance:
\text{Cov}[\hat{\beta} | X] = (X^T \Omega^{-1} X)^{-1}
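The closed-form estimator and the whitening equivalence above can be checked numerically. The sketch below (a minimal NumPy check, reusing the data from Example 3 further down) computes \hat{\beta}_{GLS} directly from the formula and again via Cholesky whitening, and forms the estimator covariance:

```python
import numpy as np

# Data from Example 3: design matrix with an intercept column, response,
# and a banded (AR(1)-style) error covariance matrix
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([1.5, 2.5, 3.5, 4.5])
omega = np.array([[1.0, 0.5, 0.0, 0.0],
                  [0.5, 1.0, 0.5, 0.0],
                  [0.0, 0.5, 1.0, 0.5],
                  [0.0, 0.0, 0.5, 1.0]])

# Direct formula: beta = (X' Omega^-1 X)^-1 X' Omega^-1 y
omega_inv = np.linalg.inv(omega)
xtoi = X.T @ omega_inv
beta_direct = np.linalg.solve(xtoi @ X, xtoi @ y)

# Equivalent whitened OLS: with Omega = C C', multiply both sides by C^-1
C = np.linalg.cholesky(omega)
X_white = np.linalg.solve(C, X)
y_white = np.linalg.solve(C, y)
beta_whitened, *_ = np.linalg.lstsq(X_white, y_white, rcond=None)

# Covariance of the estimator: (X' Omega^-1 X)^-1
cov_beta = np.linalg.inv(xtoi @ X)

# Both routes agree; this data lies exactly on y = 0.5 + x,
# so beta is [0.5, 1.0] regardless of Omega
```

Because the toy data fits the line y = 0.5 + x exactly, both routes recover the same coefficients whatever \Omega is used; with noisy data the choice of \Omega changes the estimate.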
When \Omega is diagonal, GLS reduces to weighted least squares (WLS), which handles heteroscedasticity without correlation between observations. This implementation uses the statsmodels library’s GLS class. The function returns coefficient estimates, standard errors, t-statistics, p-values, confidence intervals, and model fit statistics including log-likelihood, AIC, and BIC.
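The reduction to WLS in the diagonal case can also be verified directly: scaling each row of the data by the reciprocal square-root of its error variance and running OLS gives the same coefficients as the GLS formula with a diagonal \Omega. A small NumPy sketch with illustrative (made-up) heteroscedastic data:

```python
import numpy as np

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([1.2, 2.1, 2.9, 4.3])
variances = np.array([0.5, 1.0, 1.5, 2.0])  # per-observation error variances

# GLS with a diagonal Omega
omega_inv = np.diag(1.0 / variances)
beta_gls = np.linalg.solve(X.T @ omega_inv @ X, X.T @ omega_inv @ y)

# WLS: scale each row by 1/sqrt(variance), then ordinary least squares
w = 1.0 / np.sqrt(variances)
beta_wls, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)

# The two estimates coincide
```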
For more details on the theoretical background, see the Wikipedia article on Generalized Least Squares and the statsmodels GLS documentation.
This example function is provided as-is without any representation of accuracy.
Excel Usage
=GLS_REGRESSION(y, x, sigma, fit_intercept, alpha)
- y (list[list], required): Column vector of dependent variable values (n x 1 array)
- x (list[list], required): Matrix of independent variables (n x p array, where p is the number of predictors)
- sigma (list[list], optional, default: null): Error covariance matrix (n x n array). Defaults to the identity matrix if not provided.
- fit_intercept (bool, optional, default: true): If true, adds an intercept term to the model
- alpha (float, optional, default: 0.05): Significance level for confidence intervals (between 0 and 1)
Returns (list[list]): 2D list with GLS results, or error message string.
Examples
Example 1: Demo case 1
Inputs:
| y | x |
|---|---|
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
Excel formula:
=GLS_REGRESSION({1;2;3;4}, {1;2;3;4})
Expected output:
"non-error"
Example 2: Demo case 2
Inputs:
| y | x | fit_intercept |
|---|---|---|
| 2 | 1 | false |
| 4 | 2 | |
| 6 | 3 | |
| 8 | 4 | |
Excel formula:
=GLS_REGRESSION({2;4;6;8}, {1;2;3;4}, FALSE)
Expected output:
"non-error"
Example 3: Demo case 3
Inputs:
| y | x | sigma | | | |
|---|---|---|---|---|---|
| 1.5 | 1 | 1 | 0.5 | 0 | 0 |
| 2.5 | 2 | 0.5 | 1 | 0.5 | 0 |
| 3.5 | 3 | 0 | 0.5 | 1 | 0.5 |
| 4.5 | 4 | 0 | 0 | 0.5 | 1 |
Excel formula:
=GLS_REGRESSION({1.5;2.5;3.5;4.5}, {1;2;3;4}, {1,0.5,0,0;0.5,1,0.5,0;0,0.5,1,0.5;0,0,0.5,1})
Expected output:
"non-error"
Example 4: Demo case 4
Inputs:
| y | x | | fit_intercept | alpha |
|---|---|---|---|---|
| 3 | 1 | 0.5 | true | 0.1 |
| 5 | 2 | 1 | | |
| 7 | 3 | 1.5 | | |
| 9 | 4 | 2 | | |
Excel formula:
=GLS_REGRESSION({3;5;7;9}, {1,0.5;2,1;3,1.5;4,2}, TRUE, 0.1)
Expected output:
"non-error"
Python Code
import math

import numpy as np
from statsmodels.regression.linear_model import GLS as statsmodels_gls


def gls_regression(y, x, sigma=None, fit_intercept=True, alpha=0.05):
    """
    Fits a Generalized Least Squares (GLS) regression model.

    See: https://www.statsmodels.org/stable/generated/statsmodels.regression.linear_model.GLS.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        y (list[list]): Column vector of dependent variable values (n x 1 array).
        x (list[list]): Matrix of independent variables (n x p array, where p is the number of predictors).
        sigma (list[list], optional): Error covariance matrix (n x n array). Defaults to the identity matrix if not provided. Default is None.
        fit_intercept (bool, optional): If true, adds an intercept term to the model. Default is True.
        alpha (float, optional): Significance level for confidence intervals (between 0 and 1). Default is 0.05.

    Returns:
        list[list]: 2D list with GLS results, or error message string.
    """
    def to2d(val):
        return [[val]] if not isinstance(val, list) else val

    def validate_2d_array(arr, name):
        if not isinstance(arr, list):
            return f"Invalid input: {name} must be a 2D list."
        if len(arr) == 0:
            return f"Invalid input: {name} cannot be empty."
        for row in arr:
            if not isinstance(row, list):
                return f"Invalid input: {name} must be a 2D list."
        return None

    def to_numpy(arr_2d, name):
        try:
            arr = np.array(arr_2d, dtype=float)
        except Exception:
            return f"Invalid input: {name} must contain numeric values."
        if np.any(np.isnan(arr)) or np.any(np.isinf(arr)):
            return f"Invalid input: {name} must contain finite values."
        return arr

    # Normalize inputs
    y = to2d(y)
    x = to2d(x)
    if sigma is not None:
        sigma = to2d(sigma)

    # Validate inputs
    err = validate_2d_array(y, "y")
    if err:
        return err
    err = validate_2d_array(x, "x")
    if err:
        return err

    # Convert to numpy arrays
    y_arr = to_numpy(y, "y")
    if isinstance(y_arr, str):
        return y_arr
    x_arr = to_numpy(x, "x")
    if isinstance(x_arr, str):
        return x_arr

    # Validate dimensions
    if y_arr.ndim != 2 or y_arr.shape[1] != 1:
        return "Invalid input: y must be a column vector (n x 1)."
    n = y_arr.shape[0]
    if x_arr.ndim != 2:
        return "Invalid input: x must be a 2D matrix."
    if x_arr.shape[0] != n:
        return "Invalid input: y and x must have the same number of rows."

    # Handle sigma
    if sigma is not None:
        err = validate_2d_array(sigma, "sigma")
        if err:
            return err
        sigma_arr = to_numpy(sigma, "sigma")
        if isinstance(sigma_arr, str):
            return sigma_arr
        if sigma_arr.ndim != 2:
            return "Invalid input: sigma must be a 2D matrix."
        if sigma_arr.shape[0] != n or sigma_arr.shape[1] != n:
            return f"Invalid input: sigma must be {n} x {n} to match the number of observations."
    else:
        sigma_arr = None

    # Validate alpha
    if not isinstance(alpha, (int, float)):
        return "Invalid input: alpha must be a number."
    if math.isnan(alpha) or math.isinf(alpha):
        return "Invalid input: alpha must be finite."
    if alpha <= 0 or alpha >= 1:
        return "Invalid input: alpha must be between 0 and 1."

    # Validate fit_intercept
    if not isinstance(fit_intercept, bool):
        return "Invalid input: fit_intercept must be a boolean."

    # Add intercept if requested
    if fit_intercept:
        x_arr = np.hstack([np.ones((n, 1)), x_arr])

    # Flatten y for statsmodels
    y_vec = y_arr.flatten()

    # Fit GLS model
    try:
        model = statsmodels_gls(y_vec, x_arr, sigma=sigma_arr)
        results = model.fit()
    except Exception as exc:
        return f"statsmodels.regression.linear_model.GLS error: {exc}"

    # Extract results
    try:
        params = results.params
        std_err = results.bse
        t_stats = results.tvalues
        p_values = results.pvalues
        conf_int = results.conf_int(alpha=alpha)

        # Build output table
        output = [['parameter', 'coefficient', 'std_error', 't_statistic', 'p_value', 'ci_lower', 'ci_upper']]

        # Parameter names
        if fit_intercept:
            param_names = ['intercept'] + [f'x{i+1}' for i in range(x_arr.shape[1] - 1)]
        else:
            param_names = [f'x{i+1}' for i in range(x_arr.shape[1])]

        # Add parameter rows
        for i, name in enumerate(param_names):
            output.append([
                name,
                float(params[i]),
                float(std_err[i]),
                float(t_stats[i]),
                float(p_values[i]),
                float(conf_int[i, 0]),
                float(conf_int[i, 1])
            ])

        # Add model statistics
        output.append(['log_likelihood', float(results.llf), '', '', '', '', ''])
        output.append(['aic', float(results.aic), '', '', '', '', ''])
        output.append(['bic', float(results.bic), '', '', '', '', ''])
    except Exception as exc:
        return f"Error extracting results: {exc}"

    return output