Regression Models
Overview
Regression analysis estimates the relationship between a dependent variable (target) and one or more independent variables (predictors). It is the workhorse of statistical modeling, used for prediction, forecasting, and inferring causal relationships.
Linear Regression Variants
The standard linear regression model assumes errors are normally distributed and have constant variance (homoscedasticity). When these assumptions are violated, alternative estimation methods are required.
- OLS_REGRESSION: Ordinary Least Squares. The standard linear model (y = X\beta + \epsilon). Returns coefficients, t-stats, R-squared, and F-stats.
- WLS_REGRESSION: Weighted Least Squares. Used when data points have different qualities or variances (heteroscedasticity). Points with higher variance are given less weight.
- GLS_REGRESSION: Generalized Least Squares. Handles correlation between errors (e.g., autocorrelation in time series) and non-constant variance.
Robust and Quantile Regression
Standard OLS is sensitive to outliers—a single extreme point can skew the regression line significantly.
- ROBUST_LINEAR_MODEL: Uses techniques like Huber regression or M-estimation to downweight outliers. Essential for noisy real-world data.
- QUANTILE_REGRESSION: Instead of predicting the mean (average), this predicts a specific percentile (e.g., the median or 90th percentile). Useful for understanding risk or extremes (e.g., “What is the 95th percentile of expected loss?”).
Diagnostics
Good modeling requires checking assumptions.
- OLS_DIAGNOSTICS: Tests for normality of residuals (Omnibus, Jarque-Bera), autocorrelation (Durbin-Watson), and multicollinearity (Condition Number).
- INFLUENCE_DIAG: Identifies influential points (high leverage or high Cook’s distance) that disproportionately affect the model.
Native Excel Capabilities
- Basic Regression:
LINESTperforms multiple linear regression but is hard to use (array formula, cryptic output layout). The Data Analysis ToolPak offers a GUI for regression but lacks dynamic updates and robust options. - No Robust/Quantile: Excel has no native support for robust regression or quantile regression. Outliers effectively break Excel’s regression models unless manually removed.
- Limited Diagnostics: Extracting p-values, AIC/BIC, or heteroscedasticity tests requires complex manual formulas or external tools.
Tools
| Tool | Description |
|---|---|
| GLS_REGRESSION | Fits a Generalized Least Squares (GLS) regression model. |
| INFLUENCE_DIAG | Computes regression influence diagnostics for identifying influential observations. |
| OLS_DIAGNOSTICS | Performs diagnostic tests on OLS regression residuals. |
| OLS_REGRESSION | Fits an Ordinary Least Squares (OLS) regression model. |
| QUANTILE_REGRESSION | Fits a quantile regression model to estimate conditional quantiles of the response distribution. |
| REGRESS_DIAG | Performs comprehensive regression diagnostic tests. |
| ROBUST_LINEAR_MODEL | Fits a robust linear regression model using M-estimators. |
| SPECIFICATION_TESTS | Performs regression specification tests to detect model misspecification. |
| WLS_REGRESSION | Fits a Weighted Least Squares (WLS) regression model. |