EXP_SURVIVAL_REG

Overview

The EXP_SURVIVAL_REG function fits a parametric exponential survival regression model to analyze time-to-event data with covariates. This approach is commonly used in survival analysis to study the relationship between predictor variables and the time until an event occurs, such as equipment failure, patient recovery, or customer churn.

The exponential survival model assumes a constant hazard rate over time, making it the simplest parametric survival distribution. This memoryless property means that the probability of an event occurring in the next time interval is independent of how much time has already elapsed. The exponential distribution is characterized by a single rate parameter \lambda, with the survival function:

S(t) = e^{-\lambda t}

and hazard function h(t) = \lambda (constant).

This implementation leverages statsmodels Generalized Linear Models (GLM) framework with a Poisson family and log link. By using the logarithm of survival time as an offset term, the GLM-Poisson approach provides an efficient method for fitting exponential survival regression. The model estimates log-hazard coefficients:

\log(\lambda_i) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}

where \lambda_i is the hazard rate for observation i and x_{ij} are the covariate values.

The function handles right-censored data, where some observations have not yet experienced the event by the end of the study (indicated by event = 0). Output includes coefficient estimates, standard errors, z-statistics, p-values, confidence intervals at the specified significance level, and hazard ratios (exponentiated coefficients) which represent the multiplicative change in hazard for a one-unit increase in the covariate. Model fit statistics including log-likelihood, AIC, and BIC are also provided.

This example function is provided as-is without any representation of accuracy.

Excel Usage

=EXP_SURVIVAL_REG(time, event, x, fit_intercept, alpha)
  • time (list[list], required): Time to event or censoring (positive values).
  • event (list[list], required): Event indicator (1 = event occurred, 0 = censored).
  • x (list[list], required): Covariates (predictor variables) matrix.
  • fit_intercept (bool, optional, default: true): If True, includes intercept in the model.
  • alpha (float, optional, default: 0.05): Significance level for confidence intervals (0 to 1).

Returns (list[list]): 2D list with survival regression

Example 1: Exponential regression with intercept

Inputs:

time event x
5 1 1
8 1 2
12 0 3
15 1 2.5
20 1 4

Excel formula:

=EXP_SURVIVAL_REG({5;8;12;15;20}, {1;1;0;1;1}, {1;2;3;2.5;4})

Expected output:

parameter coefficient std_error z_statistic p_value ci_lower ci_upper hazard_ratio
intercept -1.13812 1.34355 -0.847104 0.396937 -3.77143 1.49518 0.320419
x1 -0.593701 0.525072 -1.13071 0.258179 -1.62282 0.43542 0.552279
log_likelihood -4.8
aic 13.6
bic 12.8189
scale_parameter 1
Example 2: Exponential regression without intercept

Inputs:

time event x fit_intercept
10 1 2 false
15 0 3
20 1 4
25 1 5

Excel formula:

=EXP_SURVIVAL_REG({10;15;20;25}, {1;0;1;1}, {2;3;4;5}, FALSE)

Expected output:

parameter coefficient std_error z_statistic p_value ci_lower ci_upper hazard_ratio
x1 -0.867073 0.168462 -5.147 2.6468e-7 -1.19725 -0.536894 0.420179
log_likelihood -4.8497
aic 11.6994
bic 11.0857
scale_parameter 1
Example 3: Exponential regression with alpha 0.1

Inputs:

time event x alpha
6 1 1.5 0.1
9 1 2.2
14 1 3.1
18 0 2.8
22 1 3.9

Excel formula:

=EXP_SURVIVAL_REG({6;9;14;18;22}, {1;1;1;0;1}, {1.5;2.2;3.1;2.8;3.9}, 0.1)

Expected output:

parameter coefficient std_error z_statistic p_value ci_lower ci_upper hazard_ratio
intercept -1.16284 1.79349 -0.648371 0.516745 -4.11287 1.78718 0.312596
x1 -0.59115 0.64388 -0.918106 0.358563 -1.65024 0.467938 0.55369
log_likelihood -5.2576
aic 14.5152
bic 13.7341
scale_parameter 1
Example 4: Exponential regression with two covariates

Inputs:

time event x fit_intercept alpha
7 1 1 0.5 true 0.05
11 1 2 1
16 0 3 1.5
19 1 2.5 1.2
24 1 4 2

Excel formula:

=EXP_SURVIVAL_REG({7;11;16;19;24}, {1;1;0;1;1}, {1,0.5;2,1;3,1.5;2.5,1.2;4,2}, TRUE, 0.05)

Expected output:

parameter coefficient std_error z_statistic p_value ci_lower ci_upper hazard_ratio
intercept -1.51681 1.35007 -1.12351 0.261222 -4.1629 1.12927 0.21941
x1 -1.25933 11.5523 -0.109011 0.913194 -23.9013 21.3827 0.283845
x2 1.43391 23.1597 0.0619138 0.950631 -43.9583 46.8261 4.19506
log_likelihood -4.82519
aic 15.6504
bic 14.4787
scale_parameter 1

Python Code

Show Code
import math
import numpy as np
import statsmodels.api as sm_api

def exp_survival_reg(time, event, x, fit_intercept=True, alpha=0.05):
    """
    Fits a parametric exponential survival regression model.

    See: https://www.statsmodels.org/stable/generated/statsmodels.genmod.generalized_linear_model.GLM.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        time (list[list]): Time to event or censoring (positive values).
        event (list[list]): Event indicator (1 = event occurred, 0 = censored).
        x (list[list]): Covariates (predictor variables) matrix.
        fit_intercept (bool, optional): If True, includes intercept in the model. Default is True.
        alpha (float, optional): Significance level for confidence intervals (0 to 1). Default is 0.05.

    Returns:
        list[list]: 2D list with survival regression
    """
    def to2d(arr):
        return [[arr]] if not isinstance(arr, list) else arr

    def validate_numeric(value, name):
        if not isinstance(value, (int, float, bool)):
            return f"Error: Invalid input: {name} must be a number."
        num = float(value)
        if math.isnan(num) or math.isinf(num):
            return f"Error: Invalid input: {name} must be finite."
        return num

    def validate_2d_array(arr, name, expected_cols=None):
        arr_2d = to2d(arr)
        if not isinstance(arr_2d, list):
            return f"Error: Invalid input: {name} must be a 2D list."
        if len(arr_2d) == 0:
            return f"Error: Invalid input: {name} must not be empty."

        for i, row in enumerate(arr_2d):
            if not isinstance(row, list):
                return f"Error: Invalid input: {name} row {i} must be a list."
            if expected_cols is not None and len(row) != expected_cols:
                return f"Error: Invalid input: {name} must have {expected_cols} columns."

        n_cols = len(arr_2d[0])
        for i, row in enumerate(arr_2d):
            if len(row) != n_cols:
                return f"Error: Invalid input: {name} must have consistent column count."

        flat_values = []
        for row in arr_2d:
            for val in row:
                validated = validate_numeric(val, f"{name} element")
                if isinstance(validated, str):
                    return validated
                flat_values.append(validated)

        return np.array(flat_values).reshape(len(arr_2d), n_cols)

    try:
        # Validate inputs
        if not isinstance(fit_intercept, bool):
            return "Error: Invalid input: fit_intercept must be a boolean."

        alpha_val = validate_numeric(alpha, "alpha")
        if isinstance(alpha_val, str):
            return alpha_val
        if alpha_val <= 0 or alpha_val >= 1:
            return "Error: Invalid input: alpha must be between 0 and 1."

        time_arr = validate_2d_array(time, "time", expected_cols=1)
        if isinstance(time_arr, str):
            return time_arr

        event_arr = validate_2d_array(event, "event", expected_cols=1)
        if isinstance(event_arr, str):
            return event_arr

        x_arr = validate_2d_array(x, "x")
        if isinstance(x_arr, str):
            return x_arr

        # Check dimensions match
        n_obs = time_arr.shape[0]
        if event_arr.shape[0] != n_obs:
            return "Error: Invalid input: time and event must have same number of rows."
        if x_arr.shape[0] != n_obs:
            return "Error: Invalid input: x must have same number of rows as time."

        # Validate time values are non-negative and non-zero
        time_flat = time_arr.flatten()
        if np.any(time_flat <= 0):
            return "Error: Invalid input: time values must be positive."

        # Validate event values are 0 or 1
        event_flat = event_arr.flatten()
        if not np.all((event_flat == 0) | (event_flat == 1)):
            return "Error: Invalid input: event values must be 0 or 1."

        # Check for at least one event
        if np.sum(event_flat) == 0:
            return "Error: Invalid input: at least one event must occur (event = 1)."

        # Prepare data for GLM with Poisson family
        # For exponential regression: log hazard = intercept + beta * x
        # Using GLM with Poisson family and log(time) as offset
        n_covariates = x_arr.shape[1]

        # Add intercept if requested
        if fit_intercept:
            exog = sm_api.add_constant(x_arr, prepend=True)
            param_names = ['intercept'] + [f"x{i+1}" for i in range(n_covariates)]
        else:
            exog = x_arr
            param_names = [f"x{i+1}" for i in range(n_covariates)]

        try:
            # Fit GLM with Poisson family and log(time) as offset
            # This is equivalent to exponential survival regression
            model = sm_api.GLM(
                endog=event_flat,
                exog=exog,
                family=sm_api.families.Poisson(),
                offset=np.log(time_flat)
            )
            result = model.fit()

        except Exception as exc:
            return f"Error: statsmodels error: {str(exc)}"

        # Extract results
        coefficients = result.params
        std_errors = result.bse
        z_stats = result.tvalues
        p_values = result.pvalues

        # Get confidence intervals
        conf_int = result.conf_int(alpha=alpha_val)

        # Calculate hazard ratios (exp of coefficients)
        hazard_ratios = np.exp(coefficients)

        # Build result table
        headers = ['parameter', 'coefficient', 'std_error', 'z_statistic', 'p_value', 'ci_lower', 'ci_upper', 'hazard_ratio']
        result_table = [headers]

        # Add parameter rows
        for i, param_name in enumerate(param_names):
            row = [
                param_name,
                float(coefficients[i]),
                float(std_errors[i]),
                float(z_stats[i]),
                float(p_values[i]),
                float(conf_int[i, 0]),
                float(conf_int[i, 1]),
                float(hazard_ratios[i])
            ]
            result_table.append(row)

        # Add model statistics
        log_likelihood = float(result.llf)
        aic = float(result.aic)
        bic = float(result.bic_llf)

        # For exponential model, scale parameter is 1 (constant hazard)
        scale_param = 1.0

        result_table.append(['log_likelihood', log_likelihood, "", "", "", "", "", ""])
        result_table.append(['aic', aic, "", "", "", "", "", ""])
        result_table.append(['bic', bic, "", "", "", "", "", ""])
        result_table.append(['scale_parameter', scale_param, "", "", "", "", "", ""])

        return result_table
    except Exception as e:
        return f"Error: {str(e)}"

Online Calculator

Time to event or censoring (positive values).
Event indicator (1 = event occurred, 0 = censored).
Covariates (predictor variables) matrix.
If True, includes intercept in the model.
Significance level for confidence intervals (0 to 1).