Overview
Statistical models are mathematical representations of the process that generates data. They allow us to estimate relationships between variables, predict future outcomes, and understand causal mechanisms. Unlike simple summary statistics, models can account for confounding variables and quantify uncertainty in predictions.
Key Sub-Disciplines
- Regression: The cornerstone of modeling. Includes OLS (Ordinary Least Squares) for continuous outcomes and WLS for weighted data.
- Generalized Linear Models (GLM): Extends linear regression to non-normal response variables.
- Logistic Regression (LOGIT): For binary outcomes (Yes/No).
- Poisson Regression: For count data.
- Discrete Choice: Models for choosing between categorical options (e.g., MNLOGIT for multinomial choices).
- Survival Analysis: Models the time until an event occurs (e.g., KAPLAN_MEIER).
- Mixed Effects: Handles hierarchical or clustered data (e.g., students within schools) by including random effects.
Native Excel Capabilities
Excel’s modeling capabilities are primarily limited to: - Linear Regression: LINEST and Analysis ToolPak handle simple and multiple linear regression well. - Visualization: Trendlines in charts are useful for quick checks.
However, significant gaps exist: - No Native GLM: Excel has no built-in function for Logistic Regression or Poisson Regression, forcing users to use Solver manually or buy add-ins. - No Survival Analysis: Kaplan-Meier curves and Cox Proportional Hazards models are widely used in medicine/engineering but absent in Excel. - No Mixed Effects: Analyzing repeated measures or nested data is not possible with standard formulas. - Python Power: The functions here wrap statsmodels, estimating complex models with proper standard errors, p-values, and diagnostic statistics directly in the spreadsheet.
Count
| HURDLE_COUNT_MODEL |
Fits a Hurdle model for count data with two-stage process (zero vs. positive counts). |
| ZINB_MODEL |
Fits a Zero-Inflated Negative Binomial (ZINB) model for overdispersed count data with excess zeros. |
| ZIP_MODEL |
Fits a Zero-Inflated Poisson (ZIP) model for count data with excess zeros. |
Discrete Choice
| LOGIT_MODEL |
Fits a binary logistic regression model to predict binary outcomes using maximum likelihood estimation. |
| MULTINOMIAL_LOGIT |
Fits a multinomial logistic regression model for multi-category outcomes. |
| ORDERED_LOGIT |
Fits an ordered logistic regression model for ordinal outcomes. |
| PROBIT_MODEL |
Fits a binary probit regression model using maximum likelihood estimation. |
Generalized Linear
| GLM_BINOMIAL |
Fits a Generalized Linear Model with binomial family for binary or proportion data. |
| GLM_GAMMA |
Fit a Generalized Linear Model with Gamma family for positive continuous data. |
| GLM_INV_GAUSS |
Fits a Generalized Linear Model with Inverse Gaussian family for right-skewed positive data. |
| GLM_NEG_BINOM |
Fits a Generalized Linear Model with Negative Binomial family for overdispersed count data. |
| GLM_POISSON |
Fits a Generalized Linear Model with Poisson family for count data. |
| GLM_TWEEDIE |
Fits a Generalized Linear Model with Tweedie family for flexible distribution modeling. |
Mixed Effects
| GEE_MODEL |
Fits a Generalized Estimating Equations (GEE) model for correlated data. |
| GLMM_BINOMIAL |
Fits a Generalized Linear Mixed Model (GLMM) with binomial family for binary clustered data. |
| GLMM_POISSON |
Fits a Generalized Linear Mixed Model (GLMM) with Poisson family for count clustered data. |
| MIXED_LINEAR_MODEL |
Fits a Linear Mixed Effects Model (LMM) with random intercepts and slopes. |
Regression
| GLS_REGRESSION |
Fits a Generalized Least Squares (GLS) regression model. |
| INFLUENCE_DIAG |
Computes regression influence diagnostics for identifying influential observations. |
| OLS_DIAGNOSTICS |
Performs diagnostic tests on OLS regression residuals. |
| OLS_REGRESSION |
Fits an Ordinary Least Squares (OLS) regression model. |
| QUANTILE_REGRESSION |
Fits a quantile regression model to estimate conditional quantiles of the response distribution. |
| REGRESS_DIAG |
Performs comprehensive regression diagnostic tests. |
| ROBUST_LINEAR_MODEL |
Fits a robust linear regression model using M-estimators. |
| SPECIFICATION_TESTS |
Performs regression specification tests to detect model misspecification. |
| WLS_REGRESSION |
Fits a Weighted Least Squares (WLS) regression model. |
Survival
| COX_HAZARDS |
Fits a Cox Proportional Hazards regression model for survival data. |
| EXP_SURVIVAL_REG |
Fits a parametric exponential survival regression model. |
| KAPLAN_MEIER |
Computes the Kaplan-Meier survival function estimate for time-to-event data. |