Generalized Linear Models

Overview

Generalized Linear Models (GLM) extend standard linear regression to handle response variables that do not follow a normal distribution. While OLS assumes Y \sim N(\mu, \sigma^2), GLMs allow the response variable to follow any distribution from the Exponential Family (e.g., Binomial, Poisson, Gamma).

A GLM consists of three components: 1. Random Component: The probability distribution of the response Y. 2. Systematic Component: The linear combination of predictors X\beta. 3. Link Function: A function g(\cdot) that connects the mean \mu = E[Y] to the linear predictor: g(\mu) = X\beta.

Common GLM Families

  • Binomial / Logit (GLM_BINOMIAL): Used for binary outcomes (Success/Fail, Yes/No). The default link is the logit function (ln(\frac{p}{1-p})), creating Logistic Regression.
  • Poisson (GLM_POISSON): Used for count data (e.g., number of customer arrivals, defects per unit). The default link is the log function (ln(\mu)).
  • Gamma (GLM_GAMMA): Used for positive, skewed continuous data (e.g., insurance claim amounts, time to failure).
  • Negative Binomial (GLM_NEG_BINOM): Used for over-dispersed count data where the variance exceeds the mean (violating Poisson assumptions).
Figure 1: Linear vs Logistic Regression: OLS (blue line) can predict probabilities < 0 or > 1. Logistic Regression (red curve) maps predictor values to the [0, 1] probability range using the sigmoid function.

Tweedie and Inverse Gaussian

  • GLM_TWEEDIE: A flexible family that includes Normal, Poisson, and Gamma as special cases. Often used in insurance pricing (“pure premium”) to model zero-inflated continuous data.
  • GLM_INV_GAUSS: Inverse Gaussian distribution, used for highly skewed non-negative data (e.g., reaction times).

Native Excel Capabilities

Excel is severely limited for GLMs:

  • No Native Functions: Excel has no LOGISTIC or POISSON.REGRESSION function. All GLM fitting requires manual setup of the Likelihood function and using the Solver add-in to maximize it, which is slow, unstable, and complex for users.
  • Linear Bias: Users often force non-normal data into linear models (like predicting counts with LINEST), leading to invalid predictions (e.g., negative counts) and incorrect p-values.

The functions here wrap the robust statsmodels GLM framework, bringing iterative re-weighted least squares (IRLS) optimization directly to the spreadsheet.

Tools

Tool Description
GLM_BINOMIAL Fits a Generalized Linear Model with binomial family for binary or proportion data.
GLM_GAMMA Fit a Generalized Linear Model with Gamma family for positive continuous data.
GLM_INV_GAUSS Fits a Generalized Linear Model with Inverse Gaussian family for right-skewed positive data.
GLM_NEG_BINOM Fits a Generalized Linear Model with Negative Binomial family for overdispersed count data.
GLM_POISSON Fits a Generalized Linear Model with Poisson family for count data.
GLM_TWEEDIE Fits a Generalized Linear Model with Tweedie family for flexible distribution modeling.