LOGIT_MODEL

Overview

The LOGIT_MODEL function fits a binary logistic regression model to predict binary outcomes (0 or 1) using maximum likelihood estimation (MLE). Logistic regression is one of the most widely used statistical methods for binary classification, commonly applied in credit scoring, medical diagnosis, marketing response prediction, and many other domains where the outcome variable is dichotomous.

This implementation uses the statsmodels library, specifically the Logit class from the discrete choice models module. For more background on discrete regression models, see the statsmodels documentation on regression with discrete dependent variables.

The logistic regression model relates the probability of the binary outcome to predictor variables through the logistic (sigmoid) function:

P(Y = 1 | X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k)}}
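As a minimal sketch of the formula above, the snippet below evaluates the predicted probability for a single observation. The helper name `logistic_probability` is hypothetical (it is not part of LOGIT_MODEL or statsmodels), and the coefficients are the rounded values reported in Example 1.

```python
import math

def logistic_probability(intercept, coefficients, predictors):
    """Return P(Y = 1 | X) for one observation under the logistic model."""
    # Linear predictor: beta_0 + beta_1 * X_1 + ... + beta_k * X_k
    linear = intercept + sum(b * x for b, x in zip(coefficients, predictors))
    # The logistic (sigmoid) transform maps the linear predictor into (0, 1)
    return 1.0 / (1.0 + math.exp(-linear))

# Rounded coefficients from Example 1: intercept -9.721, x1 2.591
p_low = logistic_probability(-9.721, [2.591], [2.0])
p_high = logistic_probability(-9.721, [2.591], [5.0])
print(round(p_low, 4), round(p_high, 4))
```

With these coefficients the predicted probability is close to 0 at x = 2 and close to 1 at x = 5, which matches the pattern of outcomes in the example data.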

The model parameters \beta are estimated by maximizing the log-likelihood function. The function returns coefficient estimates, standard errors, z-statistics, p-values, and confidence intervals for each predictor. It also computes odds ratios (e^\beta), which represent the multiplicative change in odds for a one-unit increase in the corresponding predictor.
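To illustrate what maximizing the log-likelihood involves, here is a rough pure-Python sketch of Newton-Raphson iteration for a model with an intercept and one predictor, applied to the Example 1 data. The function `fit_logit_one_predictor` is hypothetical and written only for illustration; it is not how LOGIT_MODEL is implemented, which delegates the fitting to statsmodels.

```python
import math

def fit_logit_one_predictor(x, y, iterations=30):
    """Newton-Raphson MLE for P(Y=1|x) = 1 / (1 + exp(-(b0 + b1 * x)))."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Accumulate the gradient (g0, g1) and the negative Hessian
        # [[h00, h01], [h01, h11]] of the log-likelihood
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve the 2x2 system (negative Hessian) * step = gradient
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Data from Example 1
x = [1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5]
y = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logit_one_predictor(x, y)
odds_ratio = math.exp(b1)  # multiplicative change in odds per unit of x
print(b0, b1, odds_ratio)
```

The estimates should land near the intercept and x1 coefficient listed in Example 1. The sketch has no convergence check or safeguard against perfectly separated data, both of which a production fit needs.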

Model fit is assessed using several statistics: the pseudo R-squared (McFadden’s R²), which compares the fitted model to a null model; the log-likelihood value; AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) for model comparison; and the likelihood ratio test p-value for overall model significance.
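The fit statistics can be reproduced by hand from the fitted and null log-likelihoods. The sketch below uses the standard textbook formulas and the rounded log-likelihood from Example 1; the helper `fit_statistics` is hypothetical, and the assumption that statsmodels counts parameters the same way is based only on the example outputs agreeing.

```python
import math

def fit_statistics(llf, llnull, n_params, n_obs):
    """Fit statistics from the fitted (llf) and null (llnull) log-likelihoods."""
    return {
        "pseudo_r_squared": 1.0 - llf / llnull,  # McFadden's R-squared
        "aic": -2.0 * llf + 2.0 * n_params,
        "bic": -2.0 * llf + n_params * math.log(n_obs),
        "lr_statistic": 2.0 * (llf - llnull),
    }

# Example 1 has 4 ones and 6 zeros, so the intercept-only (null) model
# predicts p = 0.4 for every observation
llnull = 4 * math.log(0.4) + 6 * math.log(0.6)
stats = fit_statistics(llf=-2.507, llnull=llnull, n_params=2, n_obs=10)

# With a single predictor the likelihood ratio statistic has 1 degree of
# freedom, where the chi-square upper tail equals erfc(sqrt(stat / 2))
llr_pvalue = math.erfc(math.sqrt(stats["lr_statistic"] / 2.0))
print(stats, llr_pvalue)
```

The `erfc` shortcut is valid only for one degree of freedom; models with more predictors need a general chi-square survival function.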

Key references for logistic regression and discrete choice models include Cameron and Trivedi’s Regression Analysis of Count Data (1998), Maddala’s Limited-Dependent and Qualitative Variables in Econometrics (1983), and Greene’s Econometric Analysis (2003). The source code is available on the statsmodels GitHub repository.

This example function is provided as-is without any representation of accuracy.

Excel Usage

=LOGIT_MODEL(y, x, fit_intercept, alpha)
  • y (list[list], required): Binary dependent variable (0 or 1) as a column vector
  • x (list[list], required): Independent variables (predictors) as a matrix where each column is a predictor
  • fit_intercept (bool, optional, default: true): If true, adds an intercept term to the model
  • alpha (float, optional, default: 0.05): Significance level for confidence intervals (strictly between 0 and 1); the intervals have confidence level 1 - alpha

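Returns (list[list]): 2D list whose first row is a header (parameter, coefficient, std_error, z_statistic, p_value, ci_lower, ci_upper, odds_ratio), followed by one row per model parameter and five rows of model statistics (pseudo_r_squared, log_likelihood, aic, bic, llr_pvalue). An error string is returned instead if input validation or model fitting fails.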
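When calling the function from Python, the returned table mixes coefficient rows and model-statistic rows. One possible way to separate them is sketched below; `split_logit_output` is a hypothetical helper, not part of the function, and it assumes statistic rows are padded with empty strings as in the source code further down. The sample table uses the rounded values from Example 1.

```python
def split_logit_output(table):
    """Split the 2D result into coefficient records and a statistics dict."""
    if isinstance(table, str):
        # The function signals failures by returning an error string
        raise ValueError(table)
    header, *rows = table
    coefficients, statistics = [], {}
    for row in rows:
        # Statistic rows hold a name, one value, then empty-string padding
        if all(cell == '' for cell in row[2:]):
            statistics[row[0]] = row[1]
        else:
            coefficients.append(dict(zip(header, row)))
    return coefficients, statistics

# Hand-built sample shaped like the Example 1 output (rounded values)
sample = [
    ['parameter', 'coefficient', 'std_error', 'z_statistic', 'p_value',
     'ci_lower', 'ci_upper', 'odds_ratio'],
    ['intercept', -9.721, 6.424, -1.513, 0.1302, -22.31, 2.87, 0.00006003],
    ['x1', 2.591, 1.69, 1.533, 0.1253, -0.7218, 5.904, 13.34],
    ['pseudo_r_squared', 0.6275, '', '', '', '', '', ''],
    ['log_likelihood', -2.507, '', '', '', '', '', ''],
    ['aic', 9.014, '', '', '', '', '', ''],
    ['bic', 9.619, '', '', '', '', '', ''],
    ['llr_pvalue', 0.003658, '', '', '', '', '', ''],
]
coefficients, statistics = split_logit_output(sample)
print(coefficients[1]['odds_ratio'], statistics['aic'])
```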

Examples

Example 1: Single predictor with default options

Inputs:

y x
0 1
0 1.5
0 2
0 2.5
0 3
1 3.5
0 4
1 4.5
1 5
1 5.5

Excel formula:

=LOGIT_MODEL({0;0;0;0;0;1;0;1;1;1}, {1;1.5;2;2.5;3;3.5;4;4.5;5;5.5})

Expected output:

parameter coefficient std_error z_statistic p_value ci_lower ci_upper odds_ratio
intercept -9.721 6.424 -1.513 0.1302 -22.31 2.87 0.00006003
x1 2.591 1.69 1.533 0.1253 -0.7218 5.904 13.34
pseudo_r_squared 0.6275
log_likelihood -2.507
aic 9.014
bic 9.619
llr_pvalue 0.003658

Example 2: Two predictors with default options

Inputs:

y x1 x2
0 1 5
0 1.5 4.5
0 2 6
0 1.8 5.5
0 2.2 4
0 1.2 6.5
1 3 7
0 2.5 5
1 3.5 8
0 2.8 6
1 4 9
1 4.2 8.5
1 3.8 9.5
1 4.5 10
0 3.2 7.5
1 5 11
1 5.2 10.5
1 4.8 11.5
1 5.5 12
1 5.8 12.5

Excel formula:

=LOGIT_MODEL({0;0;0;0;0;0;1;0;1;0;1;1;1;1;0;1;1;1;1;1}, {1,5;1.5,4.5;2,6;1.8,5.5;2.2,4;1.2,6.5;3,7;2.5,5;3.5,8;2.8,6;4,9;4.2,8.5;3.8,9.5;4.5,10;3.2,7.5;5,11;5.2,10.5;4.8,11.5;5.5,12;5.8,12.5})

Expected output:

parameter coefficient std_error z_statistic p_value ci_lower ci_upper odds_ratio
intercept -17.23 11.24 -1.532 0.1254 -39.27 4.808 3.287e-8
x1 3.174 7.915 0.401 0.6884 -12.34 18.69 23.9
x2 1.023 3.361 0.3044 0.7608 -5.564 7.61 2.782
pseudo_r_squared 0.8311
log_likelihood -2.324
aic 10.65
bic 13.64
llr_pvalue 0.00001077

Example 3: Single predictor without an intercept

Inputs:

y x fit_intercept
0 1 false
0 1.5
0 2
0 2.5
0 3
1 3.5
0 4
1 4.5
1 5
1 5.5

Excel formula:

=LOGIT_MODEL({0;0;0;0;0;1;0;1;1;1}, {1;1.5;2;2.5;3;3.5;4;4.5;5;5.5}, FALSE)

Expected output:

parameter coefficient std_error z_statistic p_value ci_lower ci_upper odds_ratio
x1 0.07189 0.1803 0.3988 0.69 -0.2814 0.4252 1.075
pseudo_r_squared -0.01795
log_likelihood -6.851
aic 15.7
bic 16
llr_pvalue

Example 4: Single predictor with 90% confidence intervals (alpha = 0.1)

Inputs:

y x alpha
0 1 0.1
0 1.5
0 2
0 2.5
0 3
1 3.5
0 4
1 4.5
1 5
1 5.5

Excel formula:

=LOGIT_MODEL({0;0;0;0;0;1;0;1;1;1}, {1;1.5;2;2.5;3;3.5;4;4.5;5;5.5}, TRUE, 0.1)

Expected output:

parameter coefficient std_error z_statistic p_value ci_lower ci_upper odds_ratio
intercept -9.721 6.424 -1.513 0.1302 -20.29 0.8461 0.00006003
x1 2.591 1.69 1.533 0.1253 -0.1892 5.371 13.34
pseudo_r_squared 0.6275
log_likelihood -2.507
aic 9.014
bic 9.619
llr_pvalue 0.003658
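The intervals in Example 4 are narrower than those in Example 1 because only alpha changes; the coefficients and standard errors are identical. A minimal sketch of the Wald interval and two-sided p-value, assuming the normal approximation (which appears consistent with the reported values), is shown below. The helper names are hypothetical, and the inputs are the rounded estimates from the examples, so results agree only to about three significant figures.

```python
from statistics import NormalDist

def wald_interval(coefficient, std_error, alpha=0.05):
    """Two-sided (1 - alpha) interval: coefficient +/- z_crit * std_error."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return coefficient - z_crit * std_error, coefficient + z_crit * std_error

def wald_p_value(coefficient, std_error):
    """Two-sided p-value for the z-statistic coefficient / std_error."""
    z = coefficient / std_error
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

# Intercept from Examples 1 and 4: coefficient -9.721, std_error 6.424
ci_95 = wald_interval(-9.721, 6.424, alpha=0.05)
ci_90 = wald_interval(-9.721, 6.424, alpha=0.1)
# x1 from the same examples: coefficient 2.591, std_error 1.69
p_x1 = wald_p_value(2.591, 1.69)
print(ci_95, ci_90, p_x1)
```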

Python Code

import math
import numpy as np
from statsmodels.discrete.discrete_model import Logit as statsmodels_logit

def logit_model(y, x, fit_intercept=True, alpha=0.05):
    """
    Fits a binary logistic regression model to predict binary outcomes using maximum likelihood estimation.

    See: https://www.statsmodels.org/stable/generated/statsmodels.discrete.discrete_model.Logit.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        y (list[list]): Binary dependent variable (0 or 1) as a column vector
        x (list[list]): Independent variables (predictors) as a matrix where each column is a predictor
        fit_intercept (bool, optional): If true, adds an intercept term to the model. Default is True.
        alpha (float, optional): Significance level for confidence intervals (between 0 and 1). Default is 0.05.

    Returns:
        list[list]: 2D list with logit results and statistics, or error string.
    """
    def to2d(val):
        return [[val]] if not isinstance(val, list) else val

    # Normalize inputs to 2D lists
    y_2d = to2d(y)
    x_2d = to2d(x)

    # Validate y (must be a column vector)
    if not isinstance(y_2d, list) or not all(isinstance(row, list) for row in y_2d):
        return "Error: Invalid input: y must be a 2D list (column vector)."

    if len(y_2d) == 0:
        return "Error: Invalid input: y must not be empty."

    # Extract y values (column vector)
    y_values = []
    for row in y_2d:
        if len(row) != 1:
            return "Error: Invalid input: y must be a column vector with one element per row."
        try:
            val = float(row[0])
            if math.isnan(val) or math.isinf(val):
                return "Error: Invalid input: y values must be finite."
            if val not in [0.0, 1.0]:
                return "Error: Invalid input: y values must be 0 or 1 for binary logistic regression."
            y_values.append(val)
        except (TypeError, ValueError):
            return "Error: Invalid input: y values must be numeric."

    n_obs = len(y_values)

    # Validate x (must be a matrix)
    if not isinstance(x_2d, list) or not all(isinstance(row, list) for row in x_2d):
        return "Error: Invalid input: x must be a 2D list (matrix)."

    if len(x_2d) != n_obs:
        return f"Error: Invalid input: x must have {n_obs} rows to match y."

    # Extract x values (matrix)
    x_values = []
    n_predictors = None
    for row in x_2d:
        if n_predictors is None:
            n_predictors = len(row)
        elif len(row) != n_predictors:
            return "Error: Invalid input: all rows of x must have the same number of columns."

        row_vals = []
        for val in row:
            try:
                num_val = float(val)
                if math.isnan(num_val) or math.isinf(num_val):
                    return "Error: Invalid input: x values must be finite."
                row_vals.append(num_val)
            except (TypeError, ValueError):
                return "Error: Invalid input: x values must be numeric."
        x_values.append(row_vals)

    # Validate n_obs vs n_params
    num_params = n_predictors + (1 if fit_intercept else 0)
    if n_obs < num_params:
        return "Error: Number of observations must be greater than or equal to number of parameters."

    # Validate alpha
    try:
        alpha_val = float(alpha)
        if math.isnan(alpha_val) or math.isinf(alpha_val):
            return "Error: Invalid input: alpha must be finite."
        if alpha_val <= 0 or alpha_val >= 1:
            return "Error: Invalid input: alpha must be between 0 and 1."
    except (TypeError, ValueError):
        return "Error: Invalid input: alpha must be numeric."

    # Convert to numpy arrays
    y_array = np.array(y_values)
    x_array = np.array(x_values)

    # Add intercept if requested
    if fit_intercept:
        x_array = np.column_stack([np.ones(n_obs), x_array])

    # Fit the logistic regression model
    try:
        model = statsmodels_logit(y_array, x_array)
        result = model.fit(disp=0)
    except Exception as e:
        return f"Error: statsmodels.discrete.discrete_model.Logit error: {e}"

    # Extract results
    params = result.params
    std_errors = result.bse
    z_stats = result.tvalues
    p_values = result.pvalues
    conf_int = result.conf_int(alpha=alpha_val)
    odds_ratios = np.exp(params)

    # Build output table
    output = [['parameter', 'coefficient', 'std_error', 'z_statistic', 'p_value', 'ci_lower', 'ci_upper', 'odds_ratio']]

    # Add parameter rows
    for i in range(len(params)):
        if fit_intercept and i == 0:
            param_name = 'intercept'
        else:
            predictor_idx = i if not fit_intercept else i - 1
            param_name = f'x{predictor_idx + 1}'

        output.append([
            param_name,
            float(params[i]),
            float(std_errors[i]),
            float(z_stats[i]),
            float(p_values[i]),
            float(conf_int[i, 0]),
            float(conf_int[i, 1]),
            float(odds_ratios[i])
        ])

    # Add model statistics
    # Handle NaN values by converting to empty string
    def safe_float(value):
        try:
            f_val = float(value)
            if math.isnan(f_val):
                return ''
            return f_val
        except (TypeError, ValueError):
            return ''

    output.append(['pseudo_r_squared', safe_float(result.prsquared), '', '', '', '', '', ''])
    output.append(['log_likelihood', safe_float(result.llf), '', '', '', '', '', ''])
    output.append(['aic', safe_float(result.aic), '', '', '', '', '', ''])
    output.append(['bic', safe_float(result.bic), '', '', '', '', '', ''])
    output.append(['llr_pvalue', safe_float(result.llr_pvalue), '', '', '', '', '', ''])

    return output
