KAPLAN_MEIER
Overview
The KAPLAN_MEIER function estimates the survival function from time-to-event data using the Kaplan-Meier estimator, also known as the product-limit estimator. The survival function S(t) = P(T > t) represents the probability that an event (such as death, failure, or recovery) has not yet occurred by time t.
The Kaplan-Meier estimator is a non-parametric statistic introduced by Edward L. Kaplan and Paul Meier in their seminal 1958 paper. It is widely used in medical research to measure patient survival after treatment, in reliability engineering for time-to-failure analysis, and in economics for unemployment duration studies.
This implementation uses the statsmodels library’s SurvfuncRight class, which supports right-censored data—observations where the event has not yet occurred by the end of the study period. The estimator handles censoring by adjusting the risk set at each time point.
The survival probability at time t is computed as:
\hat{S}(t) = \prod_{i: t_i \leq t} \left(1 - \frac{d_i}{n_i}\right)
where t_i represents times when events occurred, d_i is the number of events at time t_i, and n_i is the number of individuals at risk just before time t_i.
The function returns survival estimates at each event time, including the number at risk, events, and censored observations. Confidence intervals are computed using the log-log transformation (Kalbfleisch-Prentice method), which provides better coverage for extreme survival probabilities. The variance is estimated using Greenwood’s formula. Additionally, the function provides the median survival time—the time at which the survival probability crosses 0.5—along with its confidence interval.
This example function is provided as-is without any representation of accuracy.
Excel Usage
=KAPLAN_MEIER(time, event, alpha)
time(list[list], required): Column vector of observed times to event or censoring (non-negative).event(list[list], required): Column vector of event indicators (1 = event occurred, 0 = censored).alpha(float, optional, default: 0.05): Significance level for confidence intervals (must be between 0 and 1 exclusive).
Returns (list[list]): 2D list with survival estimates, or error message string.
Examples
Example 1: Demo case 1
Inputs:
| time | event |
|---|---|
| 5 | 1 |
| 10 | 1 |
| 15 | 1 |
| 20 | 1 |
Excel formula:
=KAPLAN_MEIER({5;10;15;20}, {1;1;1;1})
Expected output:
| time | n_risk | n_event | n_censored | survival_prob | ci_lower | ci_upper |
|---|---|---|---|---|---|---|
| 5 | 4 | 1 | 0 | 0.75 | 0.1279 | 0.9605 |
| 10 | 3 | 1 | 0 | 0.5 | 0.05785 | 0.8449 |
| 15 | 2 | 1 | 0 | 0.25 | 0.008948 | 0.6653 |
| 20 | 1 | 1 | 0 | 0 | 0 | 0 |
| Median | 15 | 5 | 20 |
Example 2: Demo case 2
Inputs:
| time | event |
|---|---|
| 6 | 1 |
| 12 | 0 |
| 18 | 1 |
| 24 | 1 |
| 30 | 0 |
Excel formula:
=KAPLAN_MEIER({6;12;18;24;30}, {1;0;1;1;0})
Expected output:
| time | n_risk | n_event | n_censored | survival_prob | ci_lower | ci_upper |
|---|---|---|---|---|---|---|
| 6 | 5 | 1 | 0 | 0.8 | 0.2038 | 0.9692 |
| 18 | 3 | 1 | 0 | 0.5333 | 0.06833 | 0.8631 |
| 24 | 2 | 1 | 0 | 0.2667 | 0.009677 | 0.6861 |
| Median | 24 | 6 | N/A |
Example 3: Demo case 3
Inputs:
| time | event | alpha |
|---|---|---|
| 3 | 1 | 0.1 |
| 7 | 1 | |
| 11 | 0 | |
| 14 | 1 |
Excel formula:
=KAPLAN_MEIER({3;7;11;14}, {1;1;0;1}, 0.1)
Expected output:
| time | n_risk | n_event | n_censored | survival_prob | ci_lower | ci_upper |
|---|---|---|---|---|---|---|
| 3 | 4 | 1 | 0 | 0.75 | 0.2234 | 0.9463 |
| 7 | 3 | 1 | 0 | 0.5 | 0.1033 | 0.8093 |
| 14 | 1 | 1 | 0 | 0 | 0 | 0 |
| Median | 14 | 3 | 14 |
Example 4: Demo case 4
Inputs:
| time | event | alpha |
|---|---|---|
| 2 | 1 | 0.05 |
| 4 | 1 | |
| 6 | 0 | |
| 8 | 1 | |
| 10 | 1 | |
| 12 | 0 |
Excel formula:
=KAPLAN_MEIER({2;4;6;8;10;12}, {1;1;0;1;1;0}, 0.05)
Expected output:
| time | n_risk | n_event | n_censored | survival_prob | ci_lower | ci_upper |
|---|---|---|---|---|---|---|
| 2 | 6 | 1 | 0 | 0.8333 | 0.2731 | 0.9747 |
| 4 | 5 | 1 | 0 | 0.6667 | 0.1946 | 0.9044 |
| 8 | 3 | 1 | 0 | 0.4444 | 0.06619 | 0.7849 |
| 10 | 2 | 1 | 0 | 0.2222 | 0.009569 | 0.6147 |
| Median | 8 | 2 | N/A |
Python Code
import math
import warnings
from statsmodels.duration.survfunc import SurvfuncRight as statsmodels_SurvfuncRight
from scipy import stats
def kaplan_meier(time, event, alpha=0.05):
"""
Computes the Kaplan-Meier survival function estimate for time-to-event data.
See: https://www.statsmodels.org/stable/generated/statsmodels.duration.survfunc.SurvfuncRight.html
This example function is provided as-is without any representation of accuracy.
Args:
time (list[list]): Column vector of observed times to event or censoring (non-negative).
event (list[list]): Column vector of event indicators (1 = event occurred, 0 = censored).
alpha (float, optional): Significance level for confidence intervals (must be between 0 and 1 exclusive). Default is 0.05.
Returns:
list[list]: 2D list with survival estimates, or error message string.
"""
def to2d(x):
return [[x]] if not isinstance(x, list) else x
def _validate_float(value, name, allow_negative=False):
if not isinstance(value, (int, float)):
return f"Invalid input: {name} must be a number."
val = float(value)
if math.isnan(val) or math.isinf(val):
return f"Invalid input: {name} must be finite."
if not allow_negative and val < 0:
return f"Invalid input: {name} must be non-negative."
return val
def _flatten_column(data, name):
# Flatten 2D column vector to 1D list
if not isinstance(data, list):
return f"Invalid input: {name} must be a 2D list."
flat = []
for row in data:
if not isinstance(row, list):
return f"Invalid input: {name} must be a 2D list."
if len(row) != 1:
return f"Invalid input: {name} must be a column vector (single column)."
flat.append(row[0])
return flat
# Normalize inputs to 2D
time = to2d(time)
event = to2d(event)
# Flatten column vectors
time_flat = _flatten_column(time, "time")
if isinstance(time_flat, str):
return time_flat
event_flat = _flatten_column(event, "event")
if isinstance(event_flat, str):
return event_flat
# Check equal lengths
if len(time_flat) != len(event_flat):
return "Invalid input: time and event must have the same length."
if len(time_flat) == 0:
return "Invalid input: time and event must not be empty."
# Validate time values
validated_time = []
for i, t in enumerate(time_flat):
val_t = _validate_float(t, f"time[{i}]", allow_negative=False)
if isinstance(val_t, str):
return val_t
validated_time.append(val_t)
# Validate event values (must be 0 or 1)
validated_event = []
for i, e in enumerate(event_flat):
val_e = _validate_float(e, f"event[{i}]", allow_negative=False)
if isinstance(val_e, str):
return val_e
if val_e not in (0, 1):
return f"Invalid input: event[{i}] must be 0 or 1."
validated_event.append(int(val_e))
# Validate alpha
val_alpha = _validate_float(alpha, "alpha", allow_negative=False)
if isinstance(val_alpha, str):
return val_alpha
if val_alpha <= 0 or val_alpha >= 1:
return "Invalid input: alpha must be between 0 and 1 (exclusive)."
# Create SurvfuncRight object
try:
with warnings.catch_warnings():
warnings.simplefilter("ignore", RuntimeWarning)
sf = statsmodels_SurvfuncRight(validated_time, validated_event, title="KM")
except Exception as exc:
return f"statsmodels.duration.survfunc.SurvfuncRight error: {exc}"
# Build output table
try:
# Get survival function data
surv_times = sf.surv_times
surv_prob = sf.surv_prob
surv_se = sf.surv_prob_se
# Calculate z-score for confidence interval
z = stats.norm.ppf(1 - val_alpha / 2)
# Calculate n_risk, n_event, n_censored at each time point
result = [['time', 'n_risk', 'n_event', 'n_censored', 'survival_prob', 'ci_lower', 'ci_upper']]
for i, t in enumerate(surv_times):
# Count at risk, events, and censored at this time
n_risk = sum(1 for j, tt in enumerate(validated_time) if tt >= t)
n_event = sum(1 for j, (tt, ee) in enumerate(zip(validated_time, validated_event)) if tt == t and ee == 1)
n_censored = sum(1 for j, (tt, ee) in enumerate(zip(validated_time, validated_event)) if tt == t and ee == 0)
# Calculate confidence intervals using normal approximation
s = surv_prob[i]
se = surv_se[i]
# Use log-log transformation for better CI (Kalbfleisch-Prentice method)
if s > 0 and s < 1 and se > 0:
# Calculate CI on log-log scale
log_s = math.log(s)
if log_s < 0:
w = z * se / (s * abs(log_s))
ci_l = s ** math.exp(w)
ci_u = s ** math.exp(-w)
# Clamp to [0, 1]
ci_l = max(0.0, min(1.0, ci_l))
ci_u = max(0.0, min(1.0, ci_u))
else:
# Fallback to plain CI
ci_l = max(0.0, s - z * se)
ci_u = min(1.0, s + z * se)
elif s == 1.0:
ci_l = 1.0
ci_u = 1.0
elif s == 0.0:
ci_l = 0.0
ci_u = 0.0
else:
# Fallback to plain CI
ci_l = max(0.0, s - z * se)
ci_u = min(1.0, s + z * se)
result.append([
float(t),
float(n_risk),
float(n_event),
float(n_censored),
float(s),
float(ci_l),
float(ci_u)
])
# Add median survival time row
with warnings.catch_warnings():
warnings.simplefilter("ignore", RuntimeWarning)
median_surv = sf.quantile(0.5)
median_ci_lower = sf.quantile_ci(0.5, alpha=val_alpha)[0]
median_ci_upper = sf.quantile_ci(0.5, alpha=val_alpha)[1]
# Handle potential NaN/inf in median values
median_val = float(median_surv) if not (math.isnan(median_surv) or math.isinf(median_surv)) else 'N/A'
median_ci_l = float(median_ci_lower) if not (math.isnan(median_ci_lower) or math.isinf(median_ci_lower)) else 'N/A'
median_ci_u = float(median_ci_upper) if not (math.isnan(median_ci_upper) or math.isinf(median_ci_upper)) else 'N/A'
result.append([
'Median',
"",
"",
"",
median_val,
median_ci_l,
median_ci_u
])
except Exception as exc:
return f"Error computing Kaplan-Meier estimates: {exc}"
return result