EXP_SURVIVAL_REG

Overview

The EXP_SURVIVAL_REG function fits a parametric exponential survival regression model to analyze time-to-event data with covariates. This approach is commonly used in survival analysis to study the relationship between predictor variables and the time until an event occurs, such as equipment failure, patient recovery, or customer churn.

The exponential survival model assumes a constant hazard rate over time, making it the simplest parametric survival distribution. This memoryless property means that the probability of an event occurring in the next time interval is independent of how much time has already elapsed. The exponential distribution is characterized by a single rate parameter \lambda, with the survival function:

S(t) = e^{-\lambda t}

and hazard function h(t) = \lambda (constant).

This implementation leverages statsmodels Generalized Linear Models (GLM) framework with a Poisson family and log link. By using the logarithm of survival time as an offset term, the GLM-Poisson approach provides an efficient method for fitting exponential survival regression. The model estimates log-hazard coefficients:

\log(\lambda_i) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}

where \lambda_i is the hazard rate for observation i and x_{ij} are the covariate values.

The function handles right-censored data, where some observations have not yet experienced the event by the end of the study (indicated by event = 0). Output includes coefficient estimates, standard errors, z-statistics, p-values, confidence intervals at the specified significance level, and hazard ratios (exponentiated coefficients) which represent the multiplicative change in hazard for a one-unit increase in the covariate. Model fit statistics including log-likelihood, AIC, and BIC are also provided.

This example function is provided as-is without any representation of accuracy.

Excel Usage

=EXP_SURVIVAL_REG(time, event, x, fit_intercept, alpha)
  • time (list[list], required): Time to event or censoring (positive values).
  • event (list[list], required): Event indicator (1 = event occurred, 0 = censored).
  • x (list[list], required): Covariates (predictor variables) matrix.
  • fit_intercept (bool, optional, default: true): If True, includes intercept in the model.
  • alpha (float, optional, default: 0.05): Significance level for confidence intervals (0 to 1).

Returns (list[list]): 2D list with survival regression, or error message string.

Examples

Example 1: Demo case 1

Inputs:

time event x
5 1 1
8 1 2
12 0 3
15 1 2.5
20 1 4

Excel formula:

=EXP_SURVIVAL_REG({5;8;12;15;20}, {1;1;0;1;1}, {1;2;3;2.5;4})

Expected output:

parameter coefficient std_error z_statistic p_value ci_lower ci_upper hazard_ratio
intercept -1.138 1.344 -0.8471 0.3969 -3.771 1.495 0.3204
x1 -0.5937 0.5251 -1.131 0.2582 -1.623 0.4354 0.5523
log_likelihood -4.8
aic 13.6
bic 12.82
scale_parameter 1

Example 2: Demo case 2

Inputs:

time event x fit_intercept
10 1 2 false
15 0 3
20 1 4
25 1 5

Excel formula:

=EXP_SURVIVAL_REG({10;15;20;25}, {1;0;1;1}, {2;3;4;5}, FALSE)

Expected output:

parameter coefficient std_error z_statistic p_value ci_lower ci_upper hazard_ratio
x1 -0.8671 0.1685 -5.147 2.647e-7 -1.197 -0.5369 0.4202
log_likelihood -4.85
aic 11.7
bic 11.09
scale_parameter 1

Example 3: Demo case 3

Inputs:

time event x alpha
6 1 1.5 0.1
9 1 2.2
14 1 3.1
18 0 2.8
22 1 3.9

Excel formula:

=EXP_SURVIVAL_REG({6;9;14;18;22}, {1;1;1;0;1}, {1.5;2.2;3.1;2.8;3.9}, 0.1)

Expected output:

parameter coefficient std_error z_statistic p_value ci_lower ci_upper hazard_ratio
intercept -1.163 1.793 -0.6484 0.5167 -4.113 1.787 0.3126
x1 -0.5911 0.6439 -0.9181 0.3586 -1.65 0.4679 0.5537
log_likelihood -5.258
aic 14.52
bic 13.73
scale_parameter 1

Example 4: Demo case 4

Inputs:

time event x fit_intercept alpha
7 1 1 0.5 true 0.05
11 1 2 1
16 0 3 1.5
19 1 2.5 1.2
24 1 4 2

Excel formula:

=EXP_SURVIVAL_REG({7;11;16;19;24}, {1;1;0;1;1}, {1,0.5;2,1;3,1.5;2.5,1.2;4,2}, TRUE, 0.05)

Expected output:

parameter coefficient std_error z_statistic p_value ci_lower ci_upper hazard_ratio
intercept -1.517 1.35 -1.124 0.2612 -4.163 1.129 0.2194
x1 -1.259 11.55 -0.109 0.9132 -23.9 21.38 0.2838
x2 1.434 23.16 0.06191 0.9506 -43.96 46.83 4.195
log_likelihood -4.825
aic 15.65
bic 14.48
scale_parameter 1

Python Code

import math
import numpy as np
import statsmodels.api as sm_api

def exp_survival_reg(time, event, x, fit_intercept=True, alpha=0.05):
    """
    Fits a parametric exponential survival regression model.

    See: https://www.statsmodels.org/stable/glm.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        time (list[list]): Time to event or censoring (positive values).
        event (list[list]): Event indicator (1 = event occurred, 0 = censored).
        x (list[list]): Covariates (predictor variables) matrix.
        fit_intercept (bool, optional): If True, includes intercept in the model. Default is True.
        alpha (float, optional): Significance level for confidence intervals (0 to 1). Default is 0.05.

    Returns:
        list[list]: 2D list with survival regression, or error message string.
    """
    def to2d(arr):
        return [[arr]] if not isinstance(arr, list) else arr

    def validate_numeric(value, name):
        if not isinstance(value, (int, float, bool)):
            return f"Invalid input: {name} must be a number."
        num = float(value)
        if math.isnan(num) or math.isinf(num):
            return f"Invalid input: {name} must be finite."
        return num

    def validate_2d_array(arr, name, expected_cols=None):
        arr_2d = to2d(arr)
        if not isinstance(arr_2d, list):
            return f"Invalid input: {name} must be a 2D list."
        if len(arr_2d) == 0:
            return f"Invalid input: {name} must not be empty."

        for i, row in enumerate(arr_2d):
            if not isinstance(row, list):
                return f"Invalid input: {name} row {i} must be a list."
            if expected_cols is not None and len(row) != expected_cols:
                return f"Invalid input: {name} must have {expected_cols} columns."

        n_cols = len(arr_2d[0])
        for i, row in enumerate(arr_2d):
            if len(row) != n_cols:
                return f"Invalid input: {name} must have consistent column count."

        flat_values = []
        for row in arr_2d:
            for val in row:
                validated = validate_numeric(val, f"{name} element")
                if isinstance(validated, str):
                    return validated
                flat_values.append(validated)

        return np.array(flat_values).reshape(len(arr_2d), n_cols)

    # Validate inputs
    if not isinstance(fit_intercept, bool):
        return "Invalid input: fit_intercept must be a boolean."

    alpha_val = validate_numeric(alpha, "alpha")
    if isinstance(alpha_val, str):
        return alpha_val
    if alpha_val <= 0 or alpha_val >= 1:
        return "Invalid input: alpha must be between 0 and 1."

    time_arr = validate_2d_array(time, "time", expected_cols=1)
    if isinstance(time_arr, str):
        return time_arr

    event_arr = validate_2d_array(event, "event", expected_cols=1)
    if isinstance(event_arr, str):
        return event_arr

    x_arr = validate_2d_array(x, "x")
    if isinstance(x_arr, str):
        return x_arr

    # Check dimensions match
    n_obs = time_arr.shape[0]
    if event_arr.shape[0] != n_obs:
        return "Invalid input: time and event must have same number of rows."
    if x_arr.shape[0] != n_obs:
        return "Invalid input: x must have same number of rows as time."

    # Validate time values are non-negative and non-zero
    time_flat = time_arr.flatten()
    if np.any(time_flat <= 0):
        return "Invalid input: time values must be positive."

    # Validate event values are 0 or 1
    event_flat = event_arr.flatten()
    if not np.all((event_flat == 0) | (event_flat == 1)):
        return "Invalid input: event values must be 0 or 1."

    # Check for at least one event
    if np.sum(event_flat) == 0:
        return "Invalid input: at least one event must occur (event = 1)."

    # Prepare data for GLM with Poisson family
    # For exponential regression: log hazard = intercept + beta * x
    # Using GLM with Poisson family and log(time) as offset
    n_covariates = x_arr.shape[1]

    # Add intercept if requested
    if fit_intercept:
        exog = sm_api.add_constant(x_arr, prepend=True)
        param_names = ['intercept'] + [f"x{i+1}" for i in range(n_covariates)]
    else:
        exog = x_arr
        param_names = [f"x{i+1}" for i in range(n_covariates)]

    try:
        # Fit GLM with Poisson family and log(time) as offset
        # This is equivalent to exponential survival regression
        model = sm_api.GLM(
            endog=event_flat,
            exog=exog,
            family=sm_api.families.Poisson(),
            offset=np.log(time_flat)
        )
        result = model.fit()

    except Exception as exc:
        return f"statsmodels error: {str(exc)}"

    # Extract results
    coefficients = result.params
    std_errors = result.bse
    z_stats = result.tvalues
    p_values = result.pvalues

    # Get confidence intervals
    conf_int = result.conf_int(alpha=alpha_val)

    # Calculate hazard ratios (exp of coefficients)
    hazard_ratios = np.exp(coefficients)

    # Build result table
    headers = ['parameter', 'coefficient', 'std_error', 'z_statistic', 'p_value', 'ci_lower', 'ci_upper', 'hazard_ratio']
    result_table = [headers]

    # Add parameter rows
    for i, param_name in enumerate(param_names):
        row = [
            param_name,
            float(coefficients[i]),
            float(std_errors[i]),
            float(z_stats[i]),
            float(p_values[i]),
            float(conf_int[i, 0]),
            float(conf_int[i, 1]),
            float(hazard_ratios[i])
        ]
        result_table.append(row)

    # Add model statistics
    log_likelihood = float(result.llf)
    aic = float(result.aic)
    bic = float(result.bic_llf)

    # For exponential model, scale parameter is 1 (constant hazard)
    scale_param = 1.0

    result_table.append(['log_likelihood', log_likelihood, "", "", "", "", "", ""])
    result_table.append(['aic', aic, "", "", "", "", "", ""])
    result_table.append(['bic', bic, "", "", "", "", "", ""])
    result_table.append(['scale_parameter', scale_param, "", "", "", "", "", ""])

    return result_table

Online Calculator