COX_HAZARDS

Overview

The COX_HAZARDS function fits a Cox Proportional Hazards regression model for survival analysis, enabling the analysis of time-to-event data while accounting for censored observations. Introduced by Sir David Cox in 1972, this semi-parametric model estimates the effect of covariates on the hazard rate without requiring specification of the baseline hazard function.

The Cox model assumes that covariates have a multiplicative effect on the hazard function, expressed as:

\lambda(t|X_i) = \lambda_0(t) \exp(X_i \cdot \beta)

where \lambda_0(t) is the baseline hazard at time t, X_i represents the covariate vector for subject i, and \beta are the regression coefficients. The model is called “proportional hazards” because the ratio of hazards between any two subjects is constant over time, depending only on their covariate values.

This implementation uses the PHReg class from statsmodels, which maximizes the partial likelihood using either the Breslow or Efron method to handle tied event times. The function returns regression coefficients, standard errors, z-statistics, p-values, confidence intervals, and hazard ratios (the exponentiated coefficients). Hazard ratios represent the multiplicative change in the hazard for a one-unit increase in a covariate.

Cox models are widely used in medical research, reliability engineering, and econometrics to model survival times, failure times, or duration until an event occurs. The model handles right-censored data, where some subjects have not experienced the event by the end of the observation period. For more details on the statistical theory, see the Wikipedia article on proportional hazards models and the statsmodels duration analysis documentation.

This example function is provided as-is without any representation of accuracy.

Excel Usage

=COX_HAZARDS(time, event, x, fit_intercept, alpha)
  • time (list[list], required): Column vector of non-negative time values representing time to event or censoring.
  • event (list[list], required): Column vector of event indicators where 1 indicates event occurred and 0 indicates censored.
  • x (list[list], required): Matrix of covariates (predictors) where each column is a different predictor.
  • fit_intercept (bool, optional, default: false): Whether to fit an intercept term. Cox model typically does not include an intercept.
  • alpha (float, optional, default: 0.05): Significance level for confidence intervals. Must be between 0 and 1.

Returns (list[list]): 2D list with Cox regression results

Example 1: Cox model with one covariate

Inputs:

time event x
5 1 1
10 1 2
15 0 1.5
20 1 2.5
25 1 1.8
30 0 2.2
35 1 1.2
40 1 2.8

Excel formula:

=COX_HAZARDS({5;10;15;20;25;30;35;40}, {1;1;0;1;1;0;1;1}, {1;2;1.5;2.5;1.8;2.2;1.2;2.8})

Expected output:

parameter coefficient std_error z_statistic p_value ci_lower ci_upper hazard_ratio
x1 -0.821472 0.793844 -1.0348 0.300761 -2.37738 0.734434 0.439784
log_likelihood -7.12667
Example 2: Cox model with two covariates (alpha 0.1)

Inputs:

time event x alpha
6 1 1 50 0.1
13 1 2 45
18 1 1.5 55
23 0 2.5 48
30 1 1.2 52
35 1 2.2 46
42 1 1.8 49
50 0 2 51

Excel formula:

=COX_HAZARDS({6;13;18;23;30;35;42;50}, {1;1;1;0;1;1;1;0}, {1,50;2,45;1.5,55;2.5,48;1.2,52;2.2,46;1.8,49;2,51}, 0.1)

Expected output:

parameter coefficient std_error z_statistic p_value ci_lower ci_upper hazard_ratio
x1 -4.02289 2.08216 -1.93207 0.0533505 -7.44774 -0.598037 0.0179012
x2 -0.430254 0.320071 -1.34425 0.178869 -0.956724 0.0962155 0.650344
log_likelihood -6.21153
Example 3: Cox model with larger sample

Inputs:

time event x fit_intercept alpha
8 1 1.2 false 0.05
12 0 1.8
16 1 1.5
22 1 2.1
28 1 1.6
32 1 2.3
38 0 1.4
45 1 2.5
50 1 1.9
55 1 2.7

Excel formula:

=COX_HAZARDS({8;12;16;22;28;32;38;45;50;55}, {1;0;1;1;1;1;0;1;1;1}, {1.2;1.8;1.5;2.1;1.6;2.3;1.4;2.5;1.9;2.7}, FALSE, 0.05)

Expected output:

parameter coefficient std_error z_statistic p_value ci_lower ci_upper hazard_ratio
x1 -1.40816 0.959752 -1.46722 0.142317 -3.28924 0.472915 0.244592
log_likelihood -10.3194
Example 4: Cox model with varied times

Inputs:

time event x
3 1 0.8
7 1 1.2
11 1 1.5
15 1 0.9
19 0 1.1
24 1 1.4
29 1 1
33 0 1.3
38 1 1.6

Excel formula:

=COX_HAZARDS({3;7;11;15;19;24;29;33;38}, {1;1;1;1;0;1;1;0;1}, {0.8;1.2;1.5;0.9;1.1;1.4;1;1.3;1.6})

Expected output:

parameter coefficient std_error z_statistic p_value ci_lower ci_upper hazard_ratio
x1 -2.313 1.87076 -1.2364 0.216311 -5.97962 1.35362 0.0989639
log_likelihood -9.65981

Python Code

Show Code
import math
import pandas as pd
import numpy as np
from statsmodels.duration.hazard_regression import PHReg as statsmodels_phreg

def cox_hazards(time, event, x, fit_intercept=False, alpha=0.05):
    """
    Fits a Cox Proportional Hazards regression model for survival data.

    See: https://www.statsmodels.org/stable/generated/statsmodels.duration.hazard_regression.PHReg.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        time (list[list]): Column vector of non-negative time values representing time to event or censoring.
        event (list[list]): Column vector of event indicators where 1 indicates event occurred and 0 indicates censored.
        x (list[list]): Matrix of covariates (predictors) where each column is a different predictor.
        fit_intercept (bool, optional): Whether to fit an intercept term. Cox model typically does not include an intercept. Default is False.
        alpha (float, optional): Significance level for confidence intervals. Must be between 0 and 1. Default is 0.05.

    Returns:
        list[list]: 2D list with Cox regression results
    """
    def to2d(x):
        return [[x]] if not isinstance(x, list) else x

    def validate_numeric_2d(arr, name, allow_negative=True):
        # Validates a 2D list and converts to numpy array
        arr_2d = to2d(arr)
        if not isinstance(arr_2d, list):
            return f"Error: Invalid input: {name} must be a 2D list."
        if not all(isinstance(row, list) for row in arr_2d):
            return f"Error: Invalid input: {name} must be a 2D list."
        try:
            flat = []
            for row in arr_2d:
                for val in row:
                    num = float(val)
                    if math.isnan(num) or math.isinf(num):
                        return f"Error: Invalid input: {name} contains non-finite values."
                    if not allow_negative and num < 0:
                        return f"Error: Invalid input: {name} must be non-negative."
                    flat.append(num)
            arr_np = np.array(arr_2d, dtype=float)
            return arr_np
        except Exception:
            return f"Error: Invalid input: {name} must contain numeric values."

    def validate_scalar_numeric(val, name, allow_negative=True):
        # Validates a scalar numeric value
        try:
            num = float(val)
        except Exception:
            return f"Error: Invalid input: {name} must be a number."
        if math.isnan(num) or math.isinf(num):
            return f"Error: Invalid input: {name} must be finite."
        if not allow_negative and num < 0:
            return f"Error: Invalid input: {name} must be non-negative."
        return num

    try:
        # Validate time
        time_arr = validate_numeric_2d(time, "time", allow_negative=False)
        if isinstance(time_arr, str):
            return time_arr
        if time_arr.ndim != 2 or time_arr.shape[1] != 1:
            return "Error: Invalid input: time must be a column vector (2D list with one column)."
        time_vec = time_arr.flatten()

        # Validate event
        event_arr = validate_numeric_2d(event, "event", allow_negative=False)
        if isinstance(event_arr, str):
            return event_arr
        if event_arr.ndim != 2 or event_arr.shape[1] != 1:
            return "Error: Invalid input: event must be a column vector (2D list with one column)."
        event_vec = event_arr.flatten()

        # Check time and event have same length
        if len(time_vec) != len(event_vec):
            return "Error: Invalid input: time and event must have the same length."

        # Validate event values are 0 or 1
        if not all((e == 0 or e == 1) for e in event_vec):
            return "Error: Invalid input: event values must be 0 (censored) or 1 (event occurred)."

        # Validate x
        x_arr = validate_numeric_2d(x, "x")
        if isinstance(x_arr, str):
            return x_arr
        if x_arr.ndim != 2:
            return "Error: Invalid input: x must be a 2D matrix."
        if x_arr.shape[0] != len(time_vec):
            return "Error: Invalid input: x must have the same number of rows as time and event."

        # Validate fit_intercept
        if not isinstance(fit_intercept, bool):
            return "Error: Invalid input: fit_intercept must be a boolean."

        # Validate alpha
        alpha_val = validate_scalar_numeric(alpha, "alpha", allow_negative=False)
        if isinstance(alpha_val, str):
            return alpha_val
        if alpha_val <= 0 or alpha_val >= 1:
            return "Error: Invalid input: alpha must be between 0 and 1."

        # Prepare data for statsmodels
        n_covariates = x_arr.shape[1]

        # Create column names for covariates
        covariate_names = [f"x{i+1}" for i in range(n_covariates)]

        # Create DataFrame with covariates
        data_dict = {name: x_arr[:, i] for i, name in enumerate(covariate_names)}
        data_dict["time"] = time_vec
        data_dict["event"] = event_vec
        df = pd.DataFrame(data_dict)

        # Add intercept if requested
        if fit_intercept:
            df["intercept"] = 1.0
            covariate_names = ["intercept"] + covariate_names

        # Fit Cox model
        try:
            # PHReg expects time as endog and status as a separate parameter
            endog = df["time"]
            exog = df[covariate_names]
            status = df["event"]
            model = statsmodels_phreg(endog, exog, status=status)
            result = model.fit()
        except Exception as exc:
            return f"Error: statsmodels.duration.hazard_regression.PHReg error: {exc}"

        # Extract results
        try:
            # Result attributes may be Series or arrays depending on statsmodels version
            coefficients = result.params if isinstance(result.params, np.ndarray) else result.params.values
            std_errors = result.bse if isinstance(result.bse, np.ndarray) else result.bse.values
            z_statistics = result.tvalues if isinstance(result.tvalues, np.ndarray) else result.tvalues.values
            p_values = result.pvalues if isinstance(result.pvalues, np.ndarray) else result.pvalues.values
            conf_int = result.conf_int(alpha=alpha_val)
            if isinstance(conf_int, np.ndarray):
                ci_lower = conf_int[:, 0]
                ci_upper = conf_int[:, 1]
            else:
                ci_lower = conf_int.iloc[:, 0].values
                ci_upper = conf_int.iloc[:, 1].values
            hazard_ratios = np.exp(coefficients)

            # Model statistics
            log_likelihood = result.llf

        except Exception as exc:
            return f"Error: Failed to extract results: {exc}"

        # Build output table
        output = [["parameter", "coefficient", "std_error", "z_statistic", "p_value", "ci_lower", "ci_upper", "hazard_ratio"]]

        for i, param_name in enumerate(covariate_names):
            output.append([
                param_name,
                float(coefficients[i]),
                float(std_errors[i]),
                float(z_statistics[i]),
                float(p_values[i]),
                float(ci_lower[i]),
                float(ci_upper[i]),
                float(hazard_ratios[i])
            ])

        # Add model statistics
        output.append(["log_likelihood", float(log_likelihood), "", "", "", "", "", ""])

        return output
    except Exception as e:
        return f"Error: {str(e)}"

Online Calculator

Column vector of non-negative time values representing time to event or censoring.
Column vector of event indicators where 1 indicates event occurred and 0 indicates censored.
Matrix of covariates (predictors) where each column is a different predictor.
Whether to fit an intercept term. Cox model typically does not include an intercept.
Significance level for confidence intervals. Must be between 0 and 1.