Regression Models

Overview

Regression analysis estimates the relationship between a dependent variable (target) and one or more independent variables (predictors). It is the workhorse of statistical modeling, used for prediction, forecasting, and inferring causal relationships.

Linear Regression Variants

The standard linear regression model assumes errors are normally distributed and have constant variance (homoscedasticity). When these assumptions are violated, alternative estimation methods are required.

Robust and Quantile Regression

Standard OLS is sensitive to outliers—a single extreme point can skew the regression line significantly.

  • ROBUST_LINEAR_MODEL: Uses techniques like Huber regression or M-estimation to downweight outliers. Essential for noisy real-world data.
  • QUANTILE_REGRESSION: Instead of predicting the mean (average), this predicts a specific percentile (e.g., the median or 90th percentile). Useful for understanding risk or extremes (e.g., “What is the 95th percentile of expected loss?”).
Figure 1: OLS vs Robust vs Quantile Regression: In the presence of outliers (red dots), OLS (blue) is pulled away from the main trend. Robust regression (green) remains stable. Quantile regression (dashed) captures the spread of the data.

Diagnostics

Good modeling requires checking assumptions.

  • OLS_DIAGNOSTICS: Tests for normality of residuals (Omnibus, Jarque-Bera), autocorrelation (Durbin-Watson), and multicollinearity (Condition Number).
  • INFLUENCE_DIAG: Identifies influential points (high leverage or high Cook’s distance) that disproportionately affect the model.

Native Excel Capabilities

  • Basic Regression: LINEST performs multiple linear regression but is hard to use (array formula, cryptic output layout). The Data Analysis ToolPak offers a GUI for regression but lacks dynamic updates and robust options.
  • No Robust/Quantile: Excel has no native support for robust regression or quantile regression. Outliers effectively break Excel’s regression models unless manually removed.
  • Limited Diagnostics: Extracting p-values, AIC/BIC, or heteroscedasticity tests requires complex manual formulas or external tools.

Tools

Tool Description
GLS_REGRESSION Fits a Generalized Least Squares (GLS) regression model.
INFLUENCE_DIAG Computes regression influence diagnostics for identifying influential observations.
OLS_DIAGNOSTICS Performs diagnostic tests on OLS regression residuals.
OLS_REGRESSION Fits an Ordinary Least Squares (OLS) regression model.
QUANTILE_REGRESSION Fits a quantile regression model to estimate conditional quantiles of the response distribution.
REGRESS_DIAG Performs comprehensive regression diagnostic tests.
ROBUST_LINEAR_MODEL Fits a robust linear regression model using M-estimators.
SPECIFICATION_TESTS Performs regression specification tests to detect model misspecification.
WLS_REGRESSION Fits a Weighted Least Squares (WLS) regression model.