KAPLAN_MEIER

Overview

The KAPLAN_MEIER function estimates the survival function from time-to-event data using the Kaplan-Meier estimator, also known as the product-limit estimator. The survival function S(t) = P(T > t) represents the probability that an event (such as death, failure, or recovery) has not yet occurred by time t.

The Kaplan-Meier estimator is a non-parametric statistic introduced by Edward L. Kaplan and Paul Meier in their seminal 1958 paper. It is widely used in medical research to measure patient survival after treatment, in reliability engineering for time-to-failure analysis, and in economics for unemployment duration studies.

This implementation uses the statsmodels library’s SurvfuncRight class, which supports right-censored data—observations where the event has not yet occurred by the end of the study period. The estimator handles censoring by adjusting the risk set at each time point.

The survival probability at time t is computed as:

\hat{S}(t) = \prod_{i: t_i \leq t} \left(1 - \frac{d_i}{n_i}\right)

where t_i represents times when events occurred, d_i is the number of events at time t_i, and n_i is the number of individuals at risk just before time t_i.

The function returns survival estimates at each event time, including the number at risk, events, and censored observations. Confidence intervals are computed using the log-log transformation (Kalbfleisch-Prentice method), which provides better coverage for extreme survival probabilities. The variance is estimated using Greenwood’s formula. Additionally, the function provides the median survival time—the time at which the survival probability crosses 0.5—along with its confidence interval.

This example function is provided as-is without any representation of accuracy.

Excel Usage

=KAPLAN_MEIER(time, event, alpha)
  • time (list[list], required): Column vector of observed times to event or censoring (non-negative).
  • event (list[list], required): Column vector of event indicators (1 = event occurred, 0 = censored).
  • alpha (float, optional, default: 0.05): Significance level for confidence intervals (must be between 0 and 1 exclusive).

Returns (list[list]): 2D list with survival estimates, or error message string.

Examples

Example 1: Demo case 1

Inputs:

time event
5 1
10 1
15 1
20 1

Excel formula:

=KAPLAN_MEIER({5;10;15;20}, {1;1;1;1})

Expected output:

time n_risk n_event n_censored survival_prob ci_lower ci_upper
5 4 1 0 0.75 0.1279 0.9605
10 3 1 0 0.5 0.05785 0.8449
15 2 1 0 0.25 0.008948 0.6653
20 1 1 0 0 0 0
Median 15 5 20

Example 2: Demo case 2

Inputs:

time event
6 1
12 0
18 1
24 1
30 0

Excel formula:

=KAPLAN_MEIER({6;12;18;24;30}, {1;0;1;1;0})

Expected output:

time n_risk n_event n_censored survival_prob ci_lower ci_upper
6 5 1 0 0.8 0.2038 0.9692
18 3 1 0 0.5333 0.06833 0.8631
24 2 1 0 0.2667 0.009677 0.6861
Median 24 6 N/A

Example 3: Demo case 3

Inputs:

time event alpha
3 1 0.1
7 1
11 0
14 1

Excel formula:

=KAPLAN_MEIER({3;7;11;14}, {1;1;0;1}, 0.1)

Expected output:

time n_risk n_event n_censored survival_prob ci_lower ci_upper
3 4 1 0 0.75 0.2234 0.9463
7 3 1 0 0.5 0.1033 0.8093
14 1 1 0 0 0 0
Median 14 3 14

Example 4: Demo case 4

Inputs:

time event alpha
2 1 0.05
4 1
6 0
8 1
10 1
12 0

Excel formula:

=KAPLAN_MEIER({2;4;6;8;10;12}, {1;1;0;1;1;0}, 0.05)

Expected output:

time n_risk n_event n_censored survival_prob ci_lower ci_upper
2 6 1 0 0.8333 0.2731 0.9747
4 5 1 0 0.6667 0.1946 0.9044
8 3 1 0 0.4444 0.06619 0.7849
10 2 1 0 0.2222 0.009569 0.6147
Median 8 2 N/A

Python Code

import math
import warnings
from statsmodels.duration.survfunc import SurvfuncRight as statsmodels_SurvfuncRight
from scipy import stats

def kaplan_meier(time, event, alpha=0.05):
    """
    Computes the Kaplan-Meier survival function estimate for time-to-event data.

    See: https://www.statsmodels.org/stable/generated/statsmodels.duration.survfunc.SurvfuncRight.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        time (list[list]): Column vector of observed times to event or censoring (non-negative).
        event (list[list]): Column vector of event indicators (1 = event occurred, 0 = censored).
        alpha (float, optional): Significance level for confidence intervals (must be between 0 and 1 exclusive). Default is 0.05.

    Returns:
        list[list]: 2D list with survival estimates, or error message string.
    """
    def to2d(x):
        return [[x]] if not isinstance(x, list) else x

    def _validate_float(value, name, allow_negative=False):
        if not isinstance(value, (int, float)):
            return f"Invalid input: {name} must be a number."
        val = float(value)
        if math.isnan(val) or math.isinf(val):
            return f"Invalid input: {name} must be finite."
        if not allow_negative and val < 0:
            return f"Invalid input: {name} must be non-negative."
        return val

    def _flatten_column(data, name):
        # Flatten 2D column vector to 1D list
        if not isinstance(data, list):
            return f"Invalid input: {name} must be a 2D list."
        flat = []
        for row in data:
            if not isinstance(row, list):
                return f"Invalid input: {name} must be a 2D list."
            if len(row) != 1:
                return f"Invalid input: {name} must be a column vector (single column)."
            flat.append(row[0])
        return flat

    # Normalize inputs to 2D
    time = to2d(time)
    event = to2d(event)

    # Flatten column vectors
    time_flat = _flatten_column(time, "time")
    if isinstance(time_flat, str):
        return time_flat

    event_flat = _flatten_column(event, "event")
    if isinstance(event_flat, str):
        return event_flat

    # Check equal lengths
    if len(time_flat) != len(event_flat):
        return "Invalid input: time and event must have the same length."

    if len(time_flat) == 0:
        return "Invalid input: time and event must not be empty."

    # Validate time values
    validated_time = []
    for i, t in enumerate(time_flat):
        val_t = _validate_float(t, f"time[{i}]", allow_negative=False)
        if isinstance(val_t, str):
            return val_t
        validated_time.append(val_t)

    # Validate event values (must be 0 or 1)
    validated_event = []
    for i, e in enumerate(event_flat):
        val_e = _validate_float(e, f"event[{i}]", allow_negative=False)
        if isinstance(val_e, str):
            return val_e
        if val_e not in (0, 1):
            return f"Invalid input: event[{i}] must be 0 or 1."
        validated_event.append(int(val_e))

    # Validate alpha
    val_alpha = _validate_float(alpha, "alpha", allow_negative=False)
    if isinstance(val_alpha, str):
        return val_alpha
    if val_alpha <= 0 or val_alpha >= 1:
        return "Invalid input: alpha must be between 0 and 1 (exclusive)."

    # Create SurvfuncRight object
    try:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore", RuntimeWarning)
            sf = statsmodels_SurvfuncRight(validated_time, validated_event, title="KM")
    except Exception as exc:
        return f"statsmodels.duration.survfunc.SurvfuncRight error: {exc}"

    # Build output table
    try:
        # Get survival function data
        surv_times = sf.surv_times
        surv_prob = sf.surv_prob
        surv_se = sf.surv_prob_se

        # Calculate z-score for confidence interval
        z = stats.norm.ppf(1 - val_alpha / 2)

        # Calculate n_risk, n_event, n_censored at each time point
        result = [['time', 'n_risk', 'n_event', 'n_censored', 'survival_prob', 'ci_lower', 'ci_upper']]

        for i, t in enumerate(surv_times):
            # Count at risk, events, and censored at this time
            n_risk = sum(1 for j, tt in enumerate(validated_time) if tt >= t)
            n_event = sum(1 for j, (tt, ee) in enumerate(zip(validated_time, validated_event)) if tt == t and ee == 1)
            n_censored = sum(1 for j, (tt, ee) in enumerate(zip(validated_time, validated_event)) if tt == t and ee == 0)

            # Calculate confidence intervals using normal approximation
            s = surv_prob[i]
            se = surv_se[i]

            # Use log-log transformation for better CI (Kalbfleisch-Prentice method)
            if s > 0 and s < 1 and se > 0:
                # Calculate CI on log-log scale
                log_s = math.log(s)
                if log_s < 0:
                    w = z * se / (s * abs(log_s))
                    ci_l = s ** math.exp(w)
                    ci_u = s ** math.exp(-w)
                    # Clamp to [0, 1]
                    ci_l = max(0.0, min(1.0, ci_l))
                    ci_u = max(0.0, min(1.0, ci_u))
                else:
                    # Fallback to plain CI
                    ci_l = max(0.0, s - z * se)
                    ci_u = min(1.0, s + z * se)
            elif s == 1.0:
                ci_l = 1.0
                ci_u = 1.0
            elif s == 0.0:
                ci_l = 0.0
                ci_u = 0.0
            else:
                # Fallback to plain CI
                ci_l = max(0.0, s - z * se)
                ci_u = min(1.0, s + z * se)

            result.append([
                float(t),
                float(n_risk),
                float(n_event),
                float(n_censored),
                float(s),
                float(ci_l),
                float(ci_u)
            ])

        # Add median survival time row
        with warnings.catch_warnings():
            warnings.simplefilter("ignore", RuntimeWarning)
            median_surv = sf.quantile(0.5)
            median_ci_lower = sf.quantile_ci(0.5, alpha=val_alpha)[0]
            median_ci_upper = sf.quantile_ci(0.5, alpha=val_alpha)[1]

        # Handle potential NaN/inf in median values
        median_val = float(median_surv) if not (math.isnan(median_surv) or math.isinf(median_surv)) else 'N/A'
        median_ci_l = float(median_ci_lower) if not (math.isnan(median_ci_lower) or math.isinf(median_ci_lower)) else 'N/A'
        median_ci_u = float(median_ci_upper) if not (math.isnan(median_ci_upper) or math.isinf(median_ci_upper)) else 'N/A'

        result.append([
            'Median',
            "",
            "",
            "",
            median_val,
            median_ci_l,
            median_ci_u
        ])

    except Exception as exc:
        return f"Error computing Kaplan-Meier estimates: {exc}"

    return result

Online Calculator