Survival Analysis
Overview
Survival Analysis (or Time-to-Event analysis) involves analyzing the duration of time until one or more events happen (e.g., failure of a machine, death of a patient, customer churn).
The defining characteristic of survival data is censoring: some subjects have not experienced the event by the end of the study (or drop out before it ends). For such a subject we know only that they “survived” at least up to their last observed time T, not when the event actually occurs. Standard linear regression has no way to use this partial information.
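To make the problem concrete, the sketch below uses plain Python and the standard library only. It simulates exponential event times with a true mean of 10, ends a hypothetical study at time 8, and records each subject as a (duration, event flag) pair, which is the usual encoding for censored data. All names and numbers are illustrative assumptions. It then applies two common shortcuts: dropping censored subjects, and treating censoring times as event times.

```python
import random
import statistics

random.seed(0)
TRUE_MEAN = 10.0   # assumed true mean survival time (exponential, rate 0.1)
STUDY_END = 8.0    # assumed administrative censoring time (end of study)

# Latent event times; in a real study, times beyond STUDY_END are never seen.
true_times = [random.expovariate(1 / TRUE_MEAN) for _ in range(5000)]

# Standard encoding: (observed duration, event flag), 1 = event seen, 0 = censored.
records = [(min(t, STUDY_END), 1 if t <= STUDY_END else 0) for t in true_times]

# Shortcut 1: discard censored subjects entirely.
mean_dropping_censored = statistics.mean([d for d, e in records if e == 1])

# Shortcut 2: treat each censoring time as if it were an event time.
mean_censored_as_events = statistics.mean([d for d, _ in records])

print(f"dropping censored subjects: {mean_dropping_censored:.2f}")
print(f"censored times as events:   {mean_censored_as_events:.2f}")
print(f"true mean survival time:    {TRUE_MEAN:.2f}")
```

Neither shortcut can produce a value above the study end time. Both therefore understate the true mean, however large the sample.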
Key Methods
- KAPLAN_MEIER: Kaplan-Meier estimator. A non-parametric statistic used to estimate the survival function S(t) from lifetime data. It produces the iconic “step function” survival curve.
- COX_HAZARDS: Cox Proportional Hazards Model. A semi-parametric regression model that relates predictors (e.g., age, treatment) to the hazard rate. It assumes that each predictor multiplies the hazard by a constant factor (the hazard ratio) that does not change over time, while the baseline hazard itself is left unspecified.
- EXP_SURVIVAL_REG: Parametric survival regression assuming the data follows an Exponential distribution (constant hazard rate).
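Under the constant-hazard assumption, the censored-data log-likelihood for a rate λ is d·log λ − λ·Σt. Here d is the number of observed events and Σt is the total follow-up time of all subjects, censored ones included. The maximum-likelihood rate is therefore simply d / Σt.

The sketch below applies this with a single 0/1 covariate. In that case the model log λ = b0 + b1·x splits into two independent groups and has a closed-form fit. This is an illustrative assumption, not this library's implementation. The real EXP_SURVIVAL_REG may use another parameterization, for example the accelerated-failure-time form, whose coefficients have the opposite sign. The function names are hypothetical.

```python
import math

def exp_rate_mle(durations, events):
    """MLE of a constant hazard rate under right censoring.

    Log-likelihood: sum(events) * log(rate) - rate * sum(durations),
    which is maximized at rate = (number of events) / (total time at risk).
    """
    return sum(events) / sum(durations)

def exp_regression_binary(durations, events, group):
    """Exponential proportional-hazards regression on one 0/1 covariate.

    Model: hazard_i = exp(b0 + b1 * group_i). With a binary covariate the
    likelihood factorizes by group, so each group's rate is fitted separately.
    Assumes each group has at least one observed event (otherwise log(0) fails).
    """
    rates = {}
    for g in (0, 1):
        t = [dur for dur, x in zip(durations, group) if x == g]
        e = [ev for ev, x in zip(events, group) if x == g]
        rates[g] = exp_rate_mle(t, e)
    b0 = math.log(rates[0])             # log baseline rate (group 0)
    b1 = math.log(rates[1] / rates[0])  # log hazard ratio, group 1 vs group 0
    return b0, b1

# Toy data: durations, event flags (1 = event, 0 = censored), 0/1 treatment group
durations = [5, 8, 12, 3, 9, 20, 15, 7]
events    = [1, 0, 1, 1, 1, 0, 1, 0]
group     = [0, 0, 0, 0, 1, 1, 1, 1]

b0, b1 = exp_regression_binary(durations, events, group)
print(f"baseline rate = {math.exp(b0):.4f}, hazard ratio = {math.exp(b1):.4f}")
```

Censored subjects still contribute their follow-up time to Σt. That is exactly the information the naive workarounds throw away.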
Native Excel Capabilities
Excel has no built-in functions for survival analysis.
- No Kaplan-Meier: building a Kaplan-Meier curve by hand means recomputing the risk set (the subjects still under observation) at every event time.
- No Cox model: the Cox partial likelihood cannot be maximized with standard worksheet formulas, and fitting it with Solver is cumbersome.
- Workarounds: users often simply drop censored records, which biases survival times downward, or treat the observed “time to event” as an ordinary regression outcome. The latter is statistically invalid because a censoring time is only a lower bound on the true event time.
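The risk-set bookkeeping that makes Kaplan-Meier tedious in a worksheet is short in code. The sketch below is a minimal pure-Python version of the product-limit formula S(t) = ∏ over t_i ≤ t of (1 − d_i / n_i). In this formula, n_i is the number of subjects still at risk just before event time t_i, and d_i is the number of events at t_i. The function name `kaplan_meier` is hypothetical. The sketch follows the usual convention that a subject censored at the same time as an event still counts as at risk at that time.

```python
from collections import Counter

def kaplan_meier(durations, events):
    """Return [(t, n_at_risk, n_events, S(t))] at each distinct event time."""
    deaths = Counter(t for t, e in zip(durations, events) if e == 1)
    leaving = Counter(durations)   # subjects (events + censored) exiting at each time
    n_at_risk = len(durations)     # everyone is at risk at the start
    survival = 1.0
    table = []
    for t in sorted(leaving):
        d = deaths[t]              # Counter returns 0 for times with no events
        if d > 0:
            survival *= 1 - d / n_at_risk
            table.append((t, n_at_risk, d, survival))
        # Everyone whose observed time equals t leaves the risk set afterwards.
        n_at_risk -= leaving[t]
    return table

# Toy data: durations and event flags (1 = event, 0 = censored)
durations = [5, 8, 12, 3, 9, 20, 15, 7]
events    = [1, 0, 1, 1, 1, 0, 1, 0]

for t, n, d, s in kaplan_meier(durations, events):
    print(f"t={t:>3}  at risk={n}  events={d}  S(t)={s:.4f}")
```

The censored subjects at times 7 and 8 never produce a step in the curve. They only shrink the risk set used at later event times.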
This library integrates the Lifelines (or statsmodels) logic, enabling proper handling of censored data directly in Excel.
Tools
| Tool | Description |
|---|---|
| COX_HAZARDS | Fits a Cox Proportional Hazards regression model for survival data. |
| EXP_SURVIVAL_REG | Fits a parametric exponential survival regression model. |
| KAPLAN_MEIER | Computes the Kaplan-Meier survival function estimate for time-to-event data. |
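Fitting a Cox model means maximizing the partial likelihood. Each observed event compares the subject who failed against everyone in the risk set at that time, so the baseline hazard never has to be estimated.

The sketch below is a minimal pure-Python version for a single covariate, fitted by Newton-Raphson. It is not the code behind COX_HAZARDS, and the function names are hypothetical. It handles tied event times with Breslow's approximation for simplicity. lifelines' `CoxPHFitter`, for instance, uses Efron's method, and the two agree when there are no ties, as in the toy data here.

```python
import math

def cox_score_info(beta, durations, events, x):
    """Score U(beta) and observed information I(beta) of the Cox partial
    log-likelihood for one covariate (Breslow handling of ties):
        l(beta) = sum over events i of
                  [beta*x_i - log(sum over j with t_j >= t_i of exp(beta*x_j))]
    """
    score, info = 0.0, 0.0
    for t_i, e_i, x_i in zip(durations, events, x):
        if e_i != 1:
            continue  # censored subjects enter only through the risk sets
        # Risk set at t_i: every subject whose observed time is >= t_i.
        risk = [(x_j, math.exp(beta * x_j))
                for t_j, x_j in zip(durations, x) if t_j >= t_i]
        s0 = sum(w for _, w in risk)
        s1 = sum(xj * w for xj, w in risk)
        s2 = sum(xj * xj * w for xj, w in risk)
        x_bar = s1 / s0               # risk-weighted mean of x
        score += x_i - x_bar
        info += s2 / s0 - x_bar ** 2  # risk-weighted variance of x
    return score, info

def fit_cox_1d(durations, events, x, tol=1e-10, max_iter=50):
    """Maximize the partial likelihood by Newton-Raphson, starting at beta = 0."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = cox_score_info(beta, durations, events, x)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

# Toy data: durations, event flags (1 = event, 0 = censored), 0/1 treatment
durations = [5, 8, 12, 3, 9, 20, 15, 7]
events    = [1, 0, 1, 1, 1, 0, 1, 0]
treated   = [0, 0, 0, 0, 1, 1, 1, 1]

beta = fit_cox_1d(durations, events, treated)
print(f"log hazard ratio = {beta:.4f}, hazard ratio = {math.exp(beta):.4f}")
```

Each Newton step needs a fresh pass over every risk set. That nested, iterative computation is what is hard to express with worksheet formulas or Solver.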