COX_HAZARDS

Overview

The COX_HAZARDS function fits a Cox Proportional Hazards regression model for survival analysis, enabling the analysis of time-to-event data while accounting for censored observations. Introduced by Sir David Cox in 1972, this semi-parametric model estimates the effect of covariates on the hazard rate without requiring specification of the baseline hazard function.

The Cox model assumes that covariates have a multiplicative effect on the hazard function, expressed as:

\lambda(t|X_i) = \lambda_0(t) \exp(X_i \cdot \beta)

where \lambda_0(t) is the baseline hazard at time t, X_i represents the covariate vector for subject i, and \beta are the regression coefficients. The model is called “proportional hazards” because the ratio of hazards between any two subjects is constant over time, depending only on their covariate values.

This implementation uses the PHReg class from statsmodels, which maximizes the partial likelihood using either the Breslow or Efron method to handle tied event times. The function returns regression coefficients, standard errors, z-statistics, p-values, confidence intervals, and hazard ratios (the exponentiated coefficients). Hazard ratios represent the multiplicative change in the hazard for a one-unit increase in a covariate.

Cox models are widely used in medical research, reliability engineering, and econometrics to model survival times, failure times, or duration until an event occurs. The model handles right-censored data, where some subjects have not experienced the event by the end of the observation period. For more details on the statistical theory, see the Wikipedia article on proportional hazards models and the statsmodels duration analysis documentation.

This example function is provided as-is without any representation of accuracy.

Excel Usage

=COX_HAZARDS(time, event, x, fit_intercept, alpha)
  • time (list[list], required): Column vector of non-negative time values representing time to event or censoring.
  • event (list[list], required): Column vector of event indicators where 1 indicates event occurred and 0 indicates censored.
  • x (list[list], required): Matrix of covariates (predictors) where each column is a different predictor.
  • fit_intercept (bool, optional, default: false): Whether to fit an intercept term. Cox model typically does not include an intercept.
  • alpha (float, optional, default: 0.05): Significance level for confidence intervals. Must be between 0 and 1.

Returns (list[list]): 2D list with Cox regression results, or error message string.

Examples

Example 1: Demo case 1

Inputs:

time event x
5 1 1
10 1 2
15 0 1.5
20 1 2.5
25 1 1.8
30 0 2.2
35 1 1.2
40 1 2.8

Excel formula:

=COX_HAZARDS({5;10;15;20;25;30;35;40}, {1;1;0;1;1;0;1;1}, {1;2;1.5;2.5;1.8;2.2;1.2;2.8})

Expected output:

parameter coefficient std_error z_statistic p_value ci_lower ci_upper hazard_ratio
x1 -0.8215 0.7938 -1.035 0.3008 -2.377 0.7344 0.4398
log_likelihood -7.127

Example 2: Demo case 2

Inputs:

time event x alpha
6 1 1 50 0.1
13 1 2 45
18 1 1.5 55
23 0 2.5 48
30 1 1.2 52
35 1 2.2 46
42 1 1.8 49
50 0 2 51

Excel formula:

=COX_HAZARDS({6;13;18;23;30;35;42;50}, {1;1;1;0;1;1;1;0}, {1,50;2,45;1.5,55;2.5,48;1.2,52;2.2,46;1.8,49;2,51}, 0.1)

Expected output:

parameter coefficient std_error z_statistic p_value ci_lower ci_upper hazard_ratio
x1 -4.023 2.082 -1.932 0.05335 -7.448 -0.598 0.0179
x2 -0.4303 0.3201 -1.344 0.1789 -0.9567 0.09622 0.6503
log_likelihood -6.212

Example 3: Demo case 3

Inputs:

time event x fit_intercept alpha
8 1 1.2 false 0.05
12 0 1.8
16 1 1.5
22 1 2.1
28 1 1.6
32 1 2.3
38 0 1.4
45 1 2.5
50 1 1.9
55 1 2.7

Excel formula:

=COX_HAZARDS({8;12;16;22;28;32;38;45;50;55}, {1;0;1;1;1;1;0;1;1;1}, {1.2;1.8;1.5;2.1;1.6;2.3;1.4;2.5;1.9;2.7}, FALSE, 0.05)

Expected output:

parameter coefficient std_error z_statistic p_value ci_lower ci_upper hazard_ratio
x1 -1.408 0.9598 -1.467 0.1423 -3.289 0.4729 0.2446
log_likelihood -10.32

Example 4: Demo case 4

Inputs:

time event x
3 1 0.8
7 1 1.2
11 1 1.5
15 1 0.9
19 0 1.1
24 1 1.4
29 1 1
33 0 1.3
38 1 1.6

Excel formula:

=COX_HAZARDS({3;7;11;15;19;24;29;33;38}, {1;1;1;1;0;1;1;0;1}, {0.8;1.2;1.5;0.9;1.1;1.4;1;1.3;1.6})

Expected output:

parameter coefficient std_error z_statistic p_value ci_lower ci_upper hazard_ratio
x1 -2.313 1.871 -1.236 0.2163 -5.98 1.354 0.09896
log_likelihood -9.66

Python Code

import math
import pandas as pd
import numpy as np
from statsmodels.duration.hazard_regression import PHReg as statsmodels_phreg

def cox_hazards(time, event, x, fit_intercept=False, alpha=0.05):
    """
    Fits a Cox Proportional Hazards regression model for survival data.

    See: https://www.statsmodels.org/stable/generated/statsmodels.duration.hazard_regression.PHReg.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        time (list[list]): Column vector of non-negative time values representing time to event or censoring.
        event (list[list]): Column vector of event indicators where 1 indicates event occurred and 0 indicates censored.
        x (list[list]): Matrix of covariates (predictors) where each column is a different predictor.
        fit_intercept (bool, optional): Whether to fit an intercept term. Cox model typically does not include an intercept. Default is False.
        alpha (float, optional): Significance level for confidence intervals. Must be between 0 and 1. Default is 0.05.

    Returns:
        list[list]: 2D list with Cox regression results, or error message string.
    """
    def to2d(x):
        return [[x]] if not isinstance(x, list) else x

    def validate_numeric_2d(arr, name, allow_negative=True):
        # Validates a 2D list and converts to numpy array
        arr_2d = to2d(arr)
        if not isinstance(arr_2d, list):
            return f"Invalid input: {name} must be a 2D list."
        if not all(isinstance(row, list) for row in arr_2d):
            return f"Invalid input: {name} must be a 2D list."
        try:
            flat = []
            for row in arr_2d:
                for val in row:
                    num = float(val)
                    if math.isnan(num) or math.isinf(num):
                        return f"Invalid input: {name} contains non-finite values."
                    if not allow_negative and num < 0:
                        return f"Invalid input: {name} must be non-negative."
                    flat.append(num)
            arr_np = np.array(arr_2d, dtype=float)
            return arr_np
        except Exception:
            return f"Invalid input: {name} must contain numeric values."

    def validate_scalar_numeric(val, name, allow_negative=True):
        # Validates a scalar numeric value
        try:
            num = float(val)
        except Exception:
            return f"Invalid input: {name} must be a number."
        if math.isnan(num) or math.isinf(num):
            return f"Invalid input: {name} must be finite."
        if not allow_negative and num < 0:
            return f"Invalid input: {name} must be non-negative."
        return num

    # Validate time
    time_arr = validate_numeric_2d(time, "time", allow_negative=False)
    if isinstance(time_arr, str):
        return time_arr
    if time_arr.ndim != 2 or time_arr.shape[1] != 1:
        return "Invalid input: time must be a column vector (2D list with one column)."
    time_vec = time_arr.flatten()

    # Validate event
    event_arr = validate_numeric_2d(event, "event", allow_negative=False)
    if isinstance(event_arr, str):
        return event_arr
    if event_arr.ndim != 2 or event_arr.shape[1] != 1:
        return "Invalid input: event must be a column vector (2D list with one column)."
    event_vec = event_arr.flatten()

    # Check time and event have same length
    if len(time_vec) != len(event_vec):
        return "Invalid input: time and event must have the same length."

    # Validate event values are 0 or 1
    if not all((e == 0 or e == 1) for e in event_vec):
        return "Invalid input: event values must be 0 (censored) or 1 (event occurred)."

    # Validate x
    x_arr = validate_numeric_2d(x, "x")
    if isinstance(x_arr, str):
        return x_arr
    if x_arr.ndim != 2:
        return "Invalid input: x must be a 2D matrix."
    if x_arr.shape[0] != len(time_vec):
        return "Invalid input: x must have the same number of rows as time and event."

    # Validate fit_intercept
    if not isinstance(fit_intercept, bool):
        return "Invalid input: fit_intercept must be a boolean."

    # Validate alpha
    alpha_val = validate_scalar_numeric(alpha, "alpha", allow_negative=False)
    if isinstance(alpha_val, str):
        return alpha_val
    if alpha_val <= 0 or alpha_val >= 1:
        return "Invalid input: alpha must be between 0 and 1."

    # Prepare data for statsmodels
    n_obs = len(time_vec)
    n_covariates = x_arr.shape[1]

    # Create column names for covariates
    covariate_names = [f"x{i+1}" for i in range(n_covariates)]

    # Create DataFrame with covariates
    data_dict = {name: x_arr[:, i] for i, name in enumerate(covariate_names)}
    data_dict["time"] = time_vec
    data_dict["event"] = event_vec
    df = pd.DataFrame(data_dict)

    # Add intercept if requested
    if fit_intercept:
        df["intercept"] = 1.0
        covariate_names = ["intercept"] + covariate_names

    # Fit Cox model
    try:
        # PHReg expects time as endog and status as a separate parameter
        endog = df["time"]
        exog = df[covariate_names]
        status = df["event"]
        model = statsmodels_phreg(endog, exog, status=status)
        result = model.fit()
    except Exception as exc:
        return f"statsmodels.duration.hazard_regression.PHReg error: {exc}"

    # Extract results
    try:
        # Result attributes may be Series or arrays depending on statsmodels version
        coefficients = result.params if isinstance(result.params, np.ndarray) else result.params.values
        std_errors = result.bse if isinstance(result.bse, np.ndarray) else result.bse.values
        z_statistics = result.tvalues if isinstance(result.tvalues, np.ndarray) else result.tvalues.values
        p_values = result.pvalues if isinstance(result.pvalues, np.ndarray) else result.pvalues.values
        conf_int = result.conf_int(alpha=alpha_val)
        if isinstance(conf_int, np.ndarray):
            ci_lower = conf_int[:, 0]
            ci_upper = conf_int[:, 1]
        else:
            ci_lower = conf_int.iloc[:, 0].values
            ci_upper = conf_int.iloc[:, 1].values
        hazard_ratios = np.exp(coefficients)

        # Model statistics
        log_likelihood = result.llf

    except Exception as exc:
        return f"Error extracting results: {exc}"

    # Build output table
    output = [["parameter", "coefficient", "std_error", "z_statistic", "p_value", "ci_lower", "ci_upper", "hazard_ratio"]]

    for i, param_name in enumerate(covariate_names):
        output.append([
            param_name,
            float(coefficients[i]),
            float(std_errors[i]),
            float(z_statistics[i]),
            float(p_values[i]),
            float(ci_lower[i]),
            float(ci_upper[i]),
            float(hazard_ratios[i])
        ])

    # Add model statistics
    output.append(["log_likelihood", float(log_likelihood), "", "", "", "", "", ""])

    return output

Online Calculator