Generalized Linear Models
Overview
Generalized Linear Models (GLM) extend standard linear regression to handle response variables that do not follow a normal distribution. While OLS assumes Y \sim N(\mu, \sigma^2), GLMs allow the response variable to follow any distribution from the Exponential Family (e.g., Binomial, Poisson, Gamma).
A GLM consists of three components: 1. Random Component: The probability distribution of the response Y. 2. Systematic Component: The linear combination of predictors X\beta. 3. Link Function: A function g(\cdot) that connects the mean \mu = E[Y] to the linear predictor: g(\mu) = X\beta.
Common GLM Families
- Binomial / Logit (GLM_BINOMIAL): Used for binary outcomes (Success/Fail, Yes/No). The default link is the logit function (ln(\frac{p}{1-p})), creating Logistic Regression.
- Poisson (GLM_POISSON): Used for count data (e.g., number of customer arrivals, defects per unit). The default link is the log function (ln(\mu)).
- Gamma (GLM_GAMMA): Used for positive, skewed continuous data (e.g., insurance claim amounts, time to failure).
- Negative Binomial (GLM_NEG_BINOM): Used for over-dispersed count data where the variance exceeds the mean (violating Poisson assumptions).
Tweedie and Inverse Gaussian
- GLM_TWEEDIE: A flexible family that includes Normal, Poisson, and Gamma as special cases. Often used in insurance pricing (“pure premium”) to model zero-inflated continuous data.
- GLM_INV_GAUSS: Inverse Gaussian distribution, used for highly skewed non-negative data (e.g., reaction times).
Native Excel Capabilities
Excel is severely limited for GLMs:
- No Native Functions: Excel has no
LOGISTICorPOISSON.REGRESSIONfunction. All GLM fitting requires manual setup of the Likelihood function and using the Solver add-in to maximize it, which is slow, unstable, and complex for users. - Linear Bias: Users often force non-normal data into linear models (like predicting counts with
LINEST), leading to invalid predictions (e.g., negative counts) and incorrect p-values.
The functions here wrap the robust statsmodels GLM framework, bringing iterative re-weighted least squares (IRLS) optimization directly to the spreadsheet.
Tools
| Tool | Description |
|---|---|
| GLM_BINOMIAL | Fits a Generalized Linear Model with binomial family for binary or proportion data. |
| GLM_GAMMA | Fit a Generalized Linear Model with Gamma family for positive continuous data. |
| GLM_INV_GAUSS | Fits a Generalized Linear Model with Inverse Gaussian family for right-skewed positive data. |
| GLM_NEG_BINOM | Fits a Generalized Linear Model with Negative Binomial family for overdispersed count data. |
| GLM_POISSON | Fits a Generalized Linear Model with Poisson family for count data. |
| GLM_TWEEDIE | Fits a Generalized Linear Model with Tweedie family for flexible distribution modeling. |