WLS_REGRESSION

Overview

The WLS_REGRESSION function fits a Weighted Least Squares (WLS) regression model, which is a generalization of ordinary least squares (OLS) designed to handle heteroscedasticity—situations where the variance of errors differs across observations. WLS is commonly used when data points have unequal reliability or precision, such as in survey data with varying sample sizes or measurements with different levels of uncertainty.

This implementation uses the statsmodels library’s WLS class. For source code and additional details, see the statsmodels GitHub repository.

In standard OLS, the objective is to minimize the sum of squared residuals. WLS extends this by assigning weights to each observation, minimizing a weighted sum of squared residuals instead:

S(\beta) = \sum_{i=1}^{n} w_i (y_i - X_i \beta)^2

where w_i is the weight for observation i. The weights are assumed to be proportional to the inverse of the error variance of each observation, i.e., w_i = 1/\sigma_i^2. This means observations with lower variance (higher precision) receive higher weights and contribute more to the parameter estimates.
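For instance, if each observation's measurement standard deviation sigma_i is known, the weights can be formed directly as the inverse variances. A minimal sketch, using hypothetical standard deviations:

import numpy as np

# Hypothetical per-observation measurement standard deviations
sigmas = np.array([0.5, 0.5, 1.0, 2.0, 2.0])

# Weights as the inverse of each observation's error variance: w_i = 1 / sigma_i^2
weights = 1.0 / sigmas ** 2
print(weights)  # [4.   4.   1.   0.25 0.25]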

The WLS estimator is given by the solution to the weighted normal equations:

\hat{\beta} = (X^T W X)^{-1} X^T W y

where W is a diagonal matrix containing the weights. When all weights are equal, WLS reduces to OLS. According to the Gauss-Markov theorem, when weights are correctly specified as the inverse of the error variances, WLS produces the Best Linear Unbiased Estimator (BLUE). For more theoretical background, see Weighted least squares on Wikipedia.
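As a quick sanity check, the closed-form estimator can be computed directly with numpy and compared against statsmodels. The snippet below is a minimal sketch, assuming numpy is available and reusing the hypothetical non-uniform weights from the sketch above together with the data from Example 1:

import numpy as np
from statsmodels.regression.linear_model import WLS

# Intercept column plus a single predictor; y from Example 1, hypothetical weights.
X = np.column_stack([np.ones(5), np.arange(1.0, 6.0)])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])
w = np.array([4.0, 4.0, 1.0, 0.25, 0.25])

# Closed-form solution of the weighted normal equations: (X^T W X)^{-1} X^T W y
W = np.diag(w)
beta_closed_form = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# The same fit via statsmodels; the two results should agree up to floating-point error.
beta_statsmodels = WLS(y, X, weights=w).fit().params

print(beta_closed_form)
print(beta_statsmodels)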

The function returns comprehensive regression results including coefficient estimates, standard errors, t-statistics, p-values, confidence intervals, and model fit statistics such as R², adjusted R², F-statistic, AIC, and BIC.

This example function is provided as-is without any representation of accuracy.

Excel Usage

=WLS_REGRESSION(y, x, weights, fit_intercept, alpha)
  • y (list[list], required): Column vector of dependent variable (response) values.
  • x (list[list], required): Matrix of independent variables (predictors). Each column is a predictor.
  • weights (list[list], required): Column vector of positive weights for each observation.
  • fit_intercept (bool, optional, default: true): Whether to add an intercept term to the model.
  • alpha (float, optional, default: 0.05): Significance level for confidence intervals (e.g., 0.05 for 95% CI).

Returns (list[list]): 2D list with WLS results, or error message string.

Examples

Example 1: Basic WLS regression with uniform weights

Inputs:

y x weights
1.1 1 1
1.9 2 1
3.2 3 1
3.8 4 1
5.1 5 1

Excel formula:

=WLS_REGRESSION({1.1;1.9;3.2;3.8;5.1}, {1;2;3;4;5}, {1;1;1;1;1})

Expected output:

parameter coefficient std_error t_statistic p_value ci_lower ci_upper
intercept 0.05 0.1981 0.2524 0.817 -0.5804 0.6804
x1 0.99 0.05972 16.58 0.0004779 0.7999 1.18
r_squared 0.9892
adj_r_squared 0.9856
f_statistic 274.8
f_pvalue 0.0004779
aic -1.032
bic -1.814

Example 2: WLS regression without intercept

Inputs:

y x weights fit_intercept
2.1 1 1 false
4.2 2 1
5.8 3 1
8.1 4 1

Excel formula:

=WLS_REGRESSION({2.1;4.2;5.8;8.1}, {1;2;3;4}, {1;1;1;1}, FALSE)

Expected output:

parameter coefficient std_error t_statistic p_value ci_lower ci_upper
x1 2.01 0.03283 61.23 0.0000096 1.906 2.114
r_squared 0.9992
adj_r_squared 0.9989
f_statistic 3749
f_pvalue 0.0000096
aic -1.526
bic -2.14

Example 3: WLS with custom weights and alpha

Inputs:

y x weights alpha
1.2 1 2 0.1
2.4 2 1.5
3.1 3 1
4.5 4 1.2
5.2 5 1.8

Excel formula:

=WLS_REGRESSION({1.2;2.4;3.1;4.5;5.2}, {1;2;3;4;5}, {2;1.5;1;1.2;1.8}, 0.1)

Expected output:

parameter coefficient std_error t_statistic p_value ci_lower ci_upper
intercept 0.2564 0.1656 1.548 0.2193 -0.1333 0.646
x1 1.006 0.05032 20 0.0002733 0.8879 1.125
r_squared 0.9926
adj_r_squared 0.9901
f_statistic 399.9
f_pvalue 0.0002733
aic -1.721
bic -2.502

Example 4: WLS with multiple predictors and varying weights

Inputs:

y x1 x2 weights fit_intercept alpha
5.5 1 3 1 true 0.05
8.2 2 2 1.5
11.1 3 4 1
12.5 4 1 2
16.3 5 5 1.2

Excel formula:

=WLS_REGRESSION({5.5;8.2;11.1;12.5;16.3}, {1,3;2,2;3,4;4,1;5,5}, {1;1.5;1;2;1.2}, TRUE, 0.05)

Expected output:

parameter coefficient std_error t_statistic p_value ci_lower ci_upper
intercept 2.37 0.3476 6.817 0.02085 0.8741 3.865
x1 2.475 0.09095 27.22 0.001347 2.084 2.867
x2 0.3109 0.08295 3.749 0.06437 -0.04596 0.6679
r_squared 0.9976
adj_r_squared 0.9952
f_statistic 412.6
f_pvalue 0.002418
aic 2.661
bic 1.489

Python Code

import math
from statsmodels.regression.linear_model import WLS as statsmodels_WLS

def wls_regression(y, x, weights, fit_intercept=True, alpha=0.05):
    """
    Fits a Weighted Least Squares (WLS) regression model.

    See: https://www.statsmodels.org/stable/generated/statsmodels.regression.linear_model.WLS.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        y (list[list]): Column vector of dependent variable (response) values.
        x (list[list]): Matrix of independent variables (predictors). Each column is a predictor.
        weights (list[list]): Column vector of positive weights for each observation.
        fit_intercept (bool, optional): Whether to add an intercept term to the model. Default is True.
        alpha (float, optional): Significance level for confidence intervals (e.g., 0.05 for 95% CI). Default is 0.05.

    Returns:
        list[list]: 2D list with WLS results, or error message string.
    """
    def to2d(val):
        return [[val]] if not isinstance(val, list) else val

    # Normalize inputs to 2D lists
    y = to2d(y)
    x = to2d(x)
    weights = to2d(weights)

    # Validate y is a column vector
    if not isinstance(y, list) or not all(isinstance(row, list) for row in y):
        return "Invalid input: y must be a 2D list."
    if len(y) == 0:
        return "Invalid input: y must not be empty."
    if not all(len(row) == 1 for row in y):
        return "Invalid input: y must be a column vector (each row must have exactly 1 element)."

    # Extract y values
    try:
        y_vals = [float(row[0]) for row in y]
    except (ValueError, TypeError):
        return "Invalid input: y must contain numeric values."

    if any(math.isnan(val) or math.isinf(val) for val in y_vals):
        return "Invalid input: y must contain finite values."

    n_obs = len(y_vals)

    # Validate x is a matrix
    if not isinstance(x, list) or not all(isinstance(row, list) for row in x):
        return "Invalid input: x must be a 2D list."
    if len(x) == 0:
        return "Invalid input: x must not be empty."
    if len(x) != n_obs:
        return "Invalid input: x must have the same number of rows as y."

    n_predictors = len(x[0])
    if n_predictors == 0:
        return "Invalid input: x must have at least one column."
    if not all(len(row) == n_predictors for row in x):
        return "Invalid input: x must have consistent column count across all rows."

    # Extract x values
    try:
        x_vals = [[float(val) for val in row] for row in x]
    except (ValueError, TypeError):
        return "Invalid input: x must contain numeric values."

    if any(math.isnan(val) or math.isinf(val) for row in x_vals for val in row):
        return "Invalid input: x must contain finite values."

    # Validate weights is a column vector
    if not isinstance(weights, list) or not all(isinstance(row, list) for row in weights):
        return "Invalid input: weights must be a 2D list."
    if len(weights) != n_obs:
        return "Invalid input: weights must have the same number of rows as y."
    if not all(len(row) == 1 for row in weights):
        return "Invalid input: weights must be a column vector (each row must have exactly 1 element)."

    # Extract weight values
    try:
        weight_vals = [float(row[0]) for row in weights]
    except (ValueError, TypeError):
        return "Invalid input: weights must contain numeric values."

    if any(math.isnan(val) or math.isinf(val) for val in weight_vals):
        return "Invalid input: weights must contain finite values."
    if any(val <= 0 for val in weight_vals):
        return "Invalid input: weights must be positive."

    # Validate fit_intercept
    if not isinstance(fit_intercept, bool):
        return "Invalid input: fit_intercept must be a boolean."

    # Validate alpha
    try:
        alpha_val = float(alpha)
    except (ValueError, TypeError):
        return "Invalid input: alpha must be numeric."
    if math.isnan(alpha_val) or math.isinf(alpha_val):
        return "Invalid input: alpha must be finite."
    if alpha_val <= 0 or alpha_val >= 1:
        return "Invalid input: alpha must be between 0 and 1."

    # Add intercept column if needed
    if fit_intercept:
        x_vals = [[1.0] + row for row in x_vals]

    # Fit WLS model
    try:
        model = statsmodels_WLS(y_vals, x_vals, weights=weight_vals)
        results = model.fit()
    except Exception as exc:
        return f"statsmodels.regression.linear_model.WLS error: {exc}"

    # Extract confidence intervals
    try:
        conf_int = results.conf_int(alpha=alpha_val)
    except Exception as exc:
        return f"Error computing confidence intervals: {exc}"

    # Build output table
    output = [['parameter', 'coefficient', 'std_error', 't_statistic', 'p_value', 'ci_lower', 'ci_upper']]

    # Add parameter results
    param_names = []
    if fit_intercept:
        param_names.append('intercept')
    for i in range(n_predictors):
        param_names.append(f'x{i+1}')

    for i, param_name in enumerate(param_names):
        try:
            coef = float(results.params[i])
            std_err = float(results.bse[i])
            t_stat = float(results.tvalues[i])
            p_val = float(results.pvalues[i])
            ci_lower = float(conf_int[i, 0])
            ci_upper = float(conf_int[i, 1])
        except Exception as exc:
            return f"Error extracting parameter {param_name}: {exc}"

        if any(math.isnan(val) or math.isinf(val) for val in [coef, std_err, t_stat, p_val, ci_lower, ci_upper]):
            return f"Error: non-finite value in results for parameter {param_name}."

        output.append([param_name, coef, std_err, t_stat, p_val, ci_lower, ci_upper])

    # Add model statistics
    try:
        r_squared = float(results.rsquared)
        adj_r_squared = float(results.rsquared_adj)
        f_stat = float(results.fvalue)
        f_pval = float(results.f_pvalue)
        aic = float(results.aic)
        bic = float(results.bic)
    except Exception as exc:
        return f"Error extracting model statistics: {exc}"

    if any(math.isnan(val) or math.isinf(val) for val in [r_squared, adj_r_squared, f_stat, f_pval, aic, bic]):
        return "Error: non-finite value in model statistics."

    output.append(['r_squared', r_squared, '', '', '', '', ''])
    output.append(['adj_r_squared', adj_r_squared, '', '', '', '', ''])
    output.append(['f_statistic', f_stat, '', '', '', '', ''])
    output.append(['f_pvalue', f_pval, '', '', '', '', ''])
    output.append(['aic', aic, '', '', '', '', ''])
    output.append(['bic', bic, '', '', '', '', ''])

    return output
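A quick way to exercise the function from Python, using the data from Example 1 (the printed values should match that example, up to rounding):

# Example 1 data: uniform weights, intercept fitted by default.
y = [[1.1], [1.9], [3.2], [3.8], [5.1]]
x = [[1], [2], [3], [4], [5]]
weights = [[1], [1], [1], [1], [1]]

result = wls_regression(y, x, weights)
for row in result:
    print(row)
# The header row comes first, then the 'intercept' and 'x1' rows with the estimates,
# followed by the r_squared, adj_r_squared, f_statistic, f_pvalue, aic, and bic rows.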
