WLS_REGRESSION

Overview

The WLS_REGRESSION function fits a Weighted Least Squares (WLS) regression model, which is a generalization of ordinary least squares (OLS) designed to handle heteroscedasticity—situations where the variance of errors differs across observations. WLS is commonly used when data points have unequal reliability or precision, such as in survey data with varying sample sizes or measurements with different levels of uncertainty.

This implementation uses the statsmodels library’s WLS class. For source code and additional details, see the statsmodels GitHub repository.

In standard OLS, the objective is to minimize the sum of squared residuals. WLS extends this by assigning weights to each observation, minimizing a weighted sum of squared residuals instead:

S(\beta) = \sum_{i=1}^{n} w_i (y_i - X_i \beta)^2

where w_i is the weight for observation i. The weights are assumed to be proportional to the inverse of the error variance of each observation, i.e., w_i = 1/\sigma_i^2. This means observations with lower variance (higher precision) receive higher weights and contribute more to the parameter estimates.
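For instance, if each observation's measurement standard deviation sigma_i is known, the weights can be formed directly as the inverse variances. A minimal sketch, using hypothetical standard deviations:

import numpy as np

# Hypothetical per-observation measurement standard deviations
sigmas = np.array([0.5, 0.5, 1.0, 2.0, 2.0])

# Weights as the inverse of each observation's error variance: w_i = 1 / sigma_i^2
weights = 1.0 / sigmas ** 2
print(weights)  # [4.   4.   1.   0.25 0.25]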

The WLS estimator is given by the solution to the weighted normal equations:

\hat{\beta} = (X^T W X)^{-1} X^T W y

where W is a diagonal matrix containing the weights. When all weights are equal, WLS reduces to OLS. According to the Gauss-Markov theorem, when weights are correctly specified as the inverse of the error variances, WLS produces the Best Linear Unbiased Estimator (BLUE). For more theoretical background, see Weighted least squares on Wikipedia.
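As a quick sanity check, the closed-form estimator can be computed directly with numpy and compared against statsmodels. The snippet below is a minimal sketch, assuming numpy is available and reusing the hypothetical non-uniform weights from the sketch above together with the data from Example 1:

import numpy as np
from statsmodels.regression.linear_model import WLS

# Intercept column plus a single predictor; y from Example 1, hypothetical weights.
X = np.column_stack([np.ones(5), np.arange(1.0, 6.0)])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])
w = np.array([4.0, 4.0, 1.0, 0.25, 0.25])

# Closed-form solution of the weighted normal equations: (X^T W X)^{-1} X^T W y
W = np.diag(w)
beta_closed_form = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# The same fit via statsmodels; the two results should agree up to floating-point error.
beta_statsmodels = WLS(y, X, weights=w).fit().params

print(beta_closed_form)
print(beta_statsmodels)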

The function returns comprehensive regression results including coefficient estimates, standard errors, t-statistics, p-values, confidence intervals, and model fit statistics such as R², adjusted R², F-statistic, AIC, and BIC.

This example function is provided as-is without any representation of accuracy.

Excel Usage

=WLS_REGRESSION(y, x, weights, fit_intercept, alpha)
  • y (list[list], required): Column vector of dependent variable (response) values.
  • x (list[list], required): Matrix of independent variables (predictors). Each column is a predictor.
  • weights (list[list], required): Column vector of positive weights for each observation.
  • fit_intercept (bool, optional, default: true): Whether to add an intercept term to the model.
  • alpha (float, optional, default: 0.05): Significance level for confidence intervals (e.g., 0.05 for 95% CI).

Returns (list[list]): 2D list with WLS results, or error message string.

Examples

Example 1: Basic WLS regression with uniform weights

Inputs:

y x weights
1.1 1 1
1.9 2 1
3.2 3 1
3.8 4 1
5.1 5 1

Excel formula:

=WLS_REGRESSION({1.1;1.9;3.2;3.8;5.1}, {1;2;3;4;5}, {1;1;1;1;1})

Expected output:

parameter coefficient std_error t_statistic p_value ci_lower ci_upper
intercept 0.05 0.1981 0.2524 0.817 -0.5804 0.6804
x1 0.99 0.05972 16.58 0.0004779 0.7999 1.18
r_squared 0.9892
adj_r_squared 0.9856
f_statistic 274.8
f_pvalue 0.0004779
aic -1.032
bic -1.814

Example 2: WLS regression without intercept

Inputs:

y x weights fit_intercept
2.1 1 1 false
4.2 2 1
5.8 3 1
8.1 4 1

Excel formula:

=WLS_REGRESSION({2.1;4.2;5.8;8.1}, {1;2;3;4}, {1;1;1;1}, FALSE)

Expected output:

parameter coefficient std_error t_statistic p_value ci_lower ci_upper
x1 2.01 0.03283 61.23 0.0000096 1.906 2.114
r_squared 0.9992
adj_r_squared 0.9989
f_statistic 3749
f_pvalue 0.0000096
aic -1.526
bic -2.14

Example 3: WLS with custom weights and alpha

Inputs:

y x weights alpha
1.2 1 2 0.1
2.4 2 1.5
3.1 3 1
4.5 4 1.2
5.2 5 1.8

Excel formula:

=WLS_REGRESSION({1.2;2.4;3.1;4.5;5.2}, {1;2;3;4;5}, {2;1.5;1;1.2;1.8}, 0.1)

Expected output:

parameter coefficient std_error t_statistic p_value ci_lower ci_upper
intercept 0.2564 0.1656 1.548 0.2193 -0.1333 0.646
x1 1.006 0.05032 20 0.0002733 0.8879 1.125
r_squared 0.9926
adj_r_squared 0.9901
f_statistic 399.9
f_pvalue 0.0002733
aic -1.721
bic -2.502

Example 4: WLS with multiple predictors and varying weights

Inputs:

y x1 x2 weights fit_intercept alpha
5.5 1 3 1 true 0.05
8.2 2 2 1.5
11.1 3 4 1
12.5 4 1 2
16.3 5 5 1.2

Excel formula:

=WLS_REGRESSION({5.5;8.2;11.1;12.5;16.3}, {1,3;2,2;3,4;4,1;5,5}, {1;1.5;1;2;1.2}, TRUE, 0.05)

Expected output:

parameter coefficient std_error t_statistic p_value ci_lower ci_upper
intercept 2.37 0.3476 6.817 0.02085 0.8741 3.865
x1 2.475 0.09095 27.22 0.001347 2.084 2.867
x2 0.3109 0.08295 3.749 0.06437 -0.04596 0.6679
r_squared 0.9976
adj_r_squared 0.9952
f_statistic 412.6
f_pvalue 0.002418
aic 2.661
bic 1.489

Python Code

import math
from statsmodels.regression.linear_model import WLS as statsmodels_WLS

def wls_regression(y, x, weights, fit_intercept=True, alpha=0.05):
    """
    Fits a Weighted Least Squares (WLS) regression model.

    See: https://www.statsmodels.org/stable/generated/statsmodels.regression.linear_model.WLS.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        y (list[list]): Column vector of dependent variable (response) values.
        x (list[list]): Matrix of independent variables (predictors). Each column is a predictor.
        weights (list[list]): Column vector of positive weights for each observation.
        fit_intercept (bool, optional): Whether to add an intercept term to the model. Default is True.
        alpha (float, optional): Significance level for confidence intervals (e.g., 0.05 for 95% CI). Default is 0.05.

    Returns:
        list[list]: 2D list with WLS results, or error message string.
    """
    def to2d(val):
        return [[val]] if not isinstance(val, list) else val

    # Normalize inputs to 2D lists
    y = to2d(y)
    x = to2d(x)
    weights = to2d(weights)

    # Validate y is a column vector
    if not isinstance(y, list) or not all(isinstance(row, list) for row in y):
        return "Invalid input: y must be a 2D list."
    if len(y) == 0:
        return "Invalid input: y must not be empty."
    if not all(len(row) == 1 for row in y):
        return "Invalid input: y must be a column vector (each row must have exactly 1 element)."

    # Extract y values
    try:
        y_vals = [float(row[0]) for row in y]
    except (ValueError, TypeError):
        return "Invalid input: y must contain numeric values."

    if any(math.isnan(val) or math.isinf(val) for val in y_vals):
        return "Invalid input: y must contain finite values."

    n_obs = len(y_vals)

    # Validate x is a matrix
    if not isinstance(x, list) or not all(isinstance(row, list) for row in x):
        return "Invalid input: x must be a 2D list."
    if len(x) == 0:
        return "Invalid input: x must not be empty."
    if len(x) != n_obs:
        return "Invalid input: x must have the same number of rows as y."

    n_predictors = len(x[0])
    if n_predictors == 0:
        return "Invalid input: x must have at least one column."
    if not all(len(row) == n_predictors for row in x):
        return "Invalid input: x must have consistent column count across all rows."

    # Extract x values
    try:
        x_vals = [[float(val) for val in row] for row in x]
    except (ValueError, TypeError):
        return "Invalid input: x must contain numeric values."

    if any(math.isnan(val) or math.isinf(val) for row in x_vals for val in row):
        return "Invalid input: x must contain finite values."

    # Validate weights is a column vector
    if not isinstance(weights, list) or not all(isinstance(row, list) for row in weights):
        return "Invalid input: weights must be a 2D list."
    if len(weights) != n_obs:
        return "Invalid input: weights must have the same number of rows as y."
    if not all(len(row) == 1 for row in weights):
        return "Invalid input: weights must be a column vector (each row must have exactly 1 element)."

    # Extract weight values
    try:
        weight_vals = [float(row[0]) for row in weights]
    except (ValueError, TypeError):
        return "Invalid input: weights must contain numeric values."

    if any(math.isnan(val) or math.isinf(val) for val in weight_vals):
        return "Invalid input: weights must contain finite values."
    if any(val <= 0 for val in weight_vals):
        return "Invalid input: weights must be positive."

    # Validate fit_intercept
    if not isinstance(fit_intercept, bool):
        return "Invalid input: fit_intercept must be a boolean."

    # Validate alpha
    try:
        alpha_val = float(alpha)
    except (ValueError, TypeError):
        return "Invalid input: alpha must be numeric."
    if math.isnan(alpha_val) or math.isinf(alpha_val):
        return "Invalid input: alpha must be finite."
    if alpha_val <= 0 or alpha_val >= 1:
        return "Invalid input: alpha must be between 0 and 1."

    # Add intercept column if needed
    if fit_intercept:
        x_vals = [[1.0] + row for row in x_vals]

    # Fit WLS model
    try:
        model = statsmodels_WLS(y_vals, x_vals, weights=weight_vals)
        results = model.fit()
    except Exception as exc:
        return f"statsmodels.regression.linear_model.WLS error: {exc}"

    # Extract confidence intervals
    try:
        conf_int = results.conf_int(alpha=alpha_val)
    except Exception as exc:
        return f"Error computing confidence intervals: {exc}"

    # Build output table
    output = [['parameter', 'coefficient', 'std_error', 't_statistic', 'p_value', 'ci_lower', 'ci_upper']]

    # Add parameter results
    param_names = []
    if fit_intercept:
        param_names.append('intercept')
    for i in range(n_predictors):
        param_names.append(f'x{i+1}')

    for i, param_name in enumerate(param_names):
        try:
            coef = float(results.params[i])
            std_err = float(results.bse[i])
            t_stat = float(results.tvalues[i])
            p_val = float(results.pvalues[i])
            ci_lower = float(conf_int[i, 0])
            ci_upper = float(conf_int[i, 1])
        except Exception as exc:
            return f"Error extracting parameter {param_name}: {exc}"

        if any(math.isnan(val) or math.isinf(val) for val in [coef, std_err, t_stat, p_val, ci_lower, ci_upper]):
            return f"Error: non-finite value in results for parameter {param_name}."

        output.append([param_name, coef, std_err, t_stat, p_val, ci_lower, ci_upper])

    # Add model statistics
    try:
        r_squared = float(results.rsquared)
        adj_r_squared = float(results.rsquared_adj)
        f_stat = float(results.fvalue)
        f_pval = float(results.f_pvalue)
        aic = float(results.aic)
        bic = float(results.bic)
    except Exception as exc:
        return f"Error extracting model statistics: {exc}"

    if any(math.isnan(val) or math.isinf(val) for val in [r_squared, adj_r_squared, f_stat, f_pval, aic, bic]):
        return "Error: non-finite value in model statistics."

    output.append(['r_squared', r_squared, '', '', '', '', ''])
    output.append(['adj_r_squared', adj_r_squared, '', '', '', '', ''])
    output.append(['f_statistic', f_stat, '', '', '', '', ''])
    output.append(['f_pvalue', f_pval, '', '', '', '', ''])
    output.append(['aic', aic, '', '', '', '', ''])
    output.append(['bic', bic, '', '', '', '', ''])

    return output
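A quick way to exercise the function from Python, using the data from Example 1 (the printed values should match that example, up to rounding):

# Example 1 data: uniform weights, intercept fitted by default.
y = [[1.1], [1.9], [3.2], [3.8], [5.1]]
x = [[1], [2], [3], [4], [5]]
weights = [[1], [1], [1], [1], [1]]

result = wls_regression(y, x, weights)
for row in result:
    print(row)
# The header row comes first, then the 'intercept' and 'x1' rows with the estimates,
# followed by the r_squared, adj_r_squared, f_statistic, f_pvalue, aic, and bic rows.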
