LOGIT_MODEL

Overview

The LOGIT_MODEL function fits a binary logistic regression model to predict binary outcomes (0 or 1) using maximum likelihood estimation (MLE). Logistic regression is one of the most widely used statistical methods for binary classification, commonly applied in credit scoring, medical diagnosis, marketing response prediction, and many other domains where the outcome variable is dichotomous.

This implementation uses the statsmodels library, specifically the Logit class from the discrete choice models module. For more background on discrete regression models, see the statsmodels documentation on regression with discrete dependent variables.

The logistic regression model relates the probability of the binary outcome to predictor variables through the logistic (sigmoid) function:

P(Y = 1 | X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k)}}

The model parameters \beta are estimated by maximizing the log-likelihood function. The function returns coefficient estimates, standard errors, z-statistics, p-values, and confidence intervals for each predictor. It also computes odds ratios (e^\beta), which represent the multiplicative change in odds for a one-unit increase in the corresponding predictor.
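As a rough illustration of the formula and of the odds-ratio interpretation, the sketch below plugs the Example 1 estimates from this page (intercept -9.72064, x1 coefficient 2.59087) into the sigmoid. The helper names predict_probability and odds are hypothetical and are not part of LOGIT_MODEL or statsmodels.

```python
import math

# Coefficient estimates copied from Example 1 on this page.
BETA_0 = -9.72064   # intercept
BETA_1 = 2.59087    # coefficient on x1


def predict_probability(x1, beta_0=BETA_0, beta_1=BETA_1):
    """Hypothetical helper: P(Y = 1 | x1) under the fitted logistic model."""
    linear_predictor = beta_0 + beta_1 * x1
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def odds(p):
    """Convert a probability p into odds p / (1 - p)."""
    return p / (1.0 - p)


# The odds ratio reported for x1 is exp(beta_1).
odds_ratio_x1 = math.exp(BETA_1)

# Raising x1 by one unit multiplies the odds by exp(beta_1),
# regardless of the starting value of x1.
ratio_3_to_4 = odds(predict_probability(4.0)) / odds(predict_probability(3.0))

print(round(predict_probability(3.5), 4))
print(round(odds_ratio_x1, 4))
print(round(ratio_3_to_4, 4))
```

The last two printed values should agree, which is the sense in which the odds ratio is a multiplicative change in odds rather than a change in probability.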

Model fit is assessed using several statistics: the pseudo R-squared (McFadden’s R²), which compares the fitted model to a null model; the log-likelihood value; AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) for model comparison; and the likelihood ratio test p-value for overall model significance.
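The fit statistics can be reproduced by hand from the log-likelihood using their standard definitions. The sketch below does this for Example 1 on this page (10 observations, 4 of them equal to 1, 2 estimated parameters, log-likelihood -2.5069). It assumes a single predictor, so the likelihood ratio test has 1 degree of freedom and its p-value can be written with math.erfc; the variable names are illustrative only.

```python
import math

# Values taken from Example 1 on this page.
n_obs = 10                 # number of observations
n_events = 4               # rows with y = 1
n_params = 2               # intercept + one predictor
log_likelihood = -2.5069   # fitted model

# Log-likelihood of the intercept-only (null) model, which predicts the
# sample proportion of ones for every observation.
p_bar = n_events / n_obs
null_log_likelihood = (
    n_events * math.log(p_bar) + (n_obs - n_events) * math.log(1 - p_bar)
)

# McFadden's pseudo R-squared compares the fitted model to the null model.
pseudo_r_squared = 1 - log_likelihood / null_log_likelihood

# Information criteria penalize the log-likelihood by the parameter count.
aic = 2 * n_params - 2 * log_likelihood
bic = n_params * math.log(n_obs) - 2 * log_likelihood

# Likelihood ratio statistic. With one predictor it is compared to a
# chi-squared distribution with 1 degree of freedom, whose upper tail
# probability is erfc(sqrt(x / 2)).
lr_statistic = 2 * (log_likelihood - null_log_likelihood)
llr_pvalue = math.erfc(math.sqrt(lr_statistic / 2))

print(pseudo_r_squared, aic, bic, llr_pvalue)
```

Up to rounding of the inputs, these should match the pseudo_r_squared, aic, bic, and llr_pvalue rows shown for Example 1.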

Key references for logistic regression and discrete choice models include Cameron and Trivedi’s Regression Analysis of Count Data (1998), Maddala’s Limited-Dependent and Qualitative Variables in Econometrics (1983), and Greene’s Econometric Analysis (2003). The source code is available on the statsmodels GitHub repository.

This example function is provided as-is without any representation of accuracy.

Excel Usage

=LOGIT_MODEL(y, x, fit_intercept, alpha)

y (list[list], required): Binary dependent variable (0 or 1) as a column vector
x (list[list], required): Independent variables (predictors) as a matrix where each column is a predictor
fit_intercept (bool, optional, default: true): If true, adds an intercept term to the model
alpha (float, optional, default: 0.05): Significance level for confidence intervals, strictly between 0 and 1. The intervals have confidence level 1 - alpha, so 0.05 gives 95% intervals

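Returns (list[list]): 2D list containing a header row, one row per parameter, and five model-statistic rows (pseudo_r_squared, log_likelihood, aic, bic, llr_pvalue), or an error string if input validation or model fitting fails.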
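When the function is called from Python rather than Excel, the returned table mixes coefficient rows and model-statistic rows, so it can help to split them. The sketch below is one possible way to do that, using the Example 1 output from this page as a literal; split_logit_output and STAT_NAMES are hypothetical names introduced for illustration and are not part of the function.

```python
# Output of Example 1 on this page, written as the 2D list the function
# returns. Unused cells in the statistic rows are empty strings.
result = [
    ["parameter", "coefficient", "std_error", "z_statistic",
     "p_value", "ci_lower", "ci_upper", "odds_ratio"],
    ["intercept", -9.72064, 6.42414, -1.51314,
     0.130243, -22.3117, 2.87043, 0.0000600313],
    ["x1", 2.59087, 1.69018, 1.53289,
     0.125302, -0.721827, 5.90358, 13.3414],
    ["pseudo_r_squared", 0.627511, "", "", "", "", "", ""],
    ["log_likelihood", -2.5069, "", "", "", "", "", ""],
    ["aic", 9.01379, "", "", "", "", "", ""],
    ["bic", 9.61896, "", "", "", "", "", ""],
    ["llr_pvalue", 0.00365758, "", "", "", "", "", ""],
]

# Names of the rows that hold model-level statistics.
STAT_NAMES = {"pseudo_r_squared", "log_likelihood", "aic", "bic", "llr_pvalue"}


def split_logit_output(table):
    """Hypothetical helper: separate coefficient rows from statistic rows."""
    if isinstance(table, str):
        # The function signals failure by returning an error string.
        raise ValueError(table)
    header, *rows = table
    coefficients = {}
    statistics = {}
    for row in rows:
        name = row[0]
        if name in STAT_NAMES:
            statistics[name] = row[1]
        else:
            # Map each column name (after 'parameter') to its value.
            coefficients[name] = dict(zip(header[1:], row[1:]))
    return coefficients, statistics


coefficients, statistics = split_logit_output(result)
print(coefficients["x1"]["odds_ratio"])
print(statistics["aic"])
```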

Example 1: Binary logit with single predictor

Inputs:

y	x
0	1
0	1.5
0	2
0	2.5
0	3
1	3.5
0	4
1	4.5
1	5
1	5.5

Excel formula:

=LOGIT_MODEL({0;0;0;0;0;1;0;1;1;1}, {1;1.5;2;2.5;3;3.5;4;4.5;5;5.5})

Expected output:

parameter	coefficient	std_error	z_statistic	p_value	ci_lower	ci_upper	odds_ratio
intercept	-9.72064	6.42414	-1.51314	0.130243	-22.3117	2.87043	0.0000600313
x1	2.59087	1.69018	1.53289	0.125302	-0.721827	5.90358	13.3414
pseudo_r_squared	0.627511
log_likelihood	-2.5069
aic	9.01379
bic	9.61896
llr_pvalue	0.00365758

Example 2: Binary logit with multiple predictors

Inputs:

y	x1	x2
0	1	5
0	1.5	4.5
0	2	6
0	1.8	5.5
0	2.2	4
0	1.2	6.5
1	3	7
0	2.5	5
1	3.5	8
0	2.8	6
1	4	9
1	4.2	8.5
1	3.8	9.5
1	4.5	10
0	3.2	7.5
1	5	11
1	5.2	10.5
1	4.8	11.5
1	5.5	12
1	5.8	12.5

Excel formula:

=LOGIT_MODEL({0;0;0;0;0;0;1;0;1;0;1;1;1;1;0;1;1;1;1;1}, {1,5;1.5,4.5;2,6;1.8,5.5;2.2,4;1.2,6.5;3,7;2.5,5;3.5,8;2.8,6;4,9;4.2,8.5;3.8,9.5;4.5,10;3.2,7.5;5,11;5.2,10.5;4.8,11.5;5.5,12;5.8,12.5})

Expected output:

parameter	coefficient	std_error	z_statistic	p_value	ci_lower	ci_upper	odds_ratio
intercept	-17.2308	11.2443	-1.53241	0.125422	-39.2692	4.80757	3.28668e-8
x1	3.174	7.91493	0.401014	0.68841	-12.339	18.687	23.9028
x2	1.02309	3.36071	0.304428	0.760802	-5.56377	7.60995	2.78178
pseudo_r_squared	0.831129
log_likelihood	-2.32413
aic	10.6483
bic	13.6355
llr_pvalue	0.0000107711

Example 3: Binary logit without intercept

Inputs:

y	x	fit_intercept
0	1	false
0	1.5
0	2
0	2.5
0	3
1	3.5
0	4
1	4.5
1	5
1	5.5

Excel formula:

=LOGIT_MODEL({0;0;0;0;0;1;0;1;1;1}, {1;1.5;2;2.5;3;3.5;4;4.5;5;5.5}, FALSE)

Expected output:

parameter	coefficient	std_error	z_statistic	p_value	ci_lower	ci_upper	odds_ratio
x1	0.0718926	0.180271	0.398803	0.690039	-0.281432	0.425217	1.07454
pseudo_r_squared	-0.0179519
log_likelihood	-6.85093
aic	15.7019
bic	16.0045
llr_pvalue

Example 4: Binary logit with custom confidence level

Inputs:

y	x	alpha
0	1	0.1
0	1.5
0	2
0	2.5
0	3
1	3.5
0	4
1	4.5
1	5
1	5.5

Excel formula:

=LOGIT_MODEL({0;0;0;0;0;1;0;1;1;1}, {1;1.5;2;2.5;3;3.5;4;4.5;5;5.5}, TRUE, 0.1)

Expected output:

parameter	coefficient	std_error	z_statistic	p_value	ci_lower	ci_upper	odds_ratio
intercept	-9.72064	6.42414	-1.51314	0.130243	-20.2874	0.846121	0.0000600313
x1	2.59087	1.69018	1.53289	0.125302	-0.189233	5.37098	13.3414
pseudo_r_squared	0.627511
log_likelihood	-2.5069
aic	9.01379
bic	9.61896
llr_pvalue	0.00365758
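Examples 1 and 4 differ only in alpha, which changes the confidence bounds but not the coefficients, z-statistics, or p-values. The sketch below recomputes the x1 row from its coefficient and standard error using a standard normal reference distribution, which is what the reported intervals appear to be based on. The helper wald_summary is a hypothetical name, and only the Python standard library is used.

```python
from statistics import NormalDist


def wald_summary(coefficient, std_error, alpha=0.05):
    """Hypothetical helper: z-statistic, two-sided p-value, and CI bounds."""
    standard_normal = NormalDist()
    z_statistic = coefficient / std_error
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - standard_normal.cdf(abs(z_statistic)))
    # Critical value for a (1 - alpha) confidence interval.
    critical_value = standard_normal.inv_cdf(1 - alpha / 2)
    margin = critical_value * std_error
    return z_statistic, p_value, coefficient - margin, coefficient + margin


# x1 row from Examples 1 and 4 on this page.
x1_coefficient = 2.59087
x1_std_error = 1.69018

summary_95 = wald_summary(x1_coefficient, x1_std_error, alpha=0.05)
summary_90 = wald_summary(x1_coefficient, x1_std_error, alpha=0.1)

print(summary_95)
print(summary_90)
```

The z-statistic and p-value are the same in both calls; only the last two entries (the bounds) narrow when alpha rises from 0.05 to 0.1.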

Python Code

import math
import numpy as np
from statsmodels.discrete.discrete_model import Logit as statsmodels_logit

def logit_model(y, x, fit_intercept=True, alpha=0.05):
    """
    Fits a binary logistic regression model to predict binary outcomes using maximum likelihood estimation.

    See: https://www.statsmodels.org/stable/generated/statsmodels.discrete.discrete_model.Logit.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        y (list[list]): Binary dependent variable (0 or 1) as a column vector
        x (list[list]): Independent variables (predictors) as a matrix where each column is a predictor
        fit_intercept (bool, optional): If true, adds an intercept term to the model. Default is True.
        alpha (float, optional): Significance level for confidence intervals (strictly between 0 and 1). Default is 0.05.

    Returns:
        list[list]: 2D list with logit results and statistics, or error string.
    """
    def to2d(val):
        return [[val]] if not isinstance(val, list) else val

    try:
      # Normalize inputs to 2D lists
      y_2d = to2d(y)
      x_2d = to2d(x)

      # Validate y (must be a column vector)
      if not isinstance(y_2d, list) or not all(isinstance(row, list) for row in y_2d):
        return "Error: Invalid input: y must be a 2D list (column vector)."

      if len(y_2d) == 0:
        return "Error: Invalid input: y must not be empty."

      # Extract y values (column vector)
      y_values = []
      for row in y_2d:
        if len(row) != 1:
          return "Error: Invalid input: y must be a column vector with one element per row."
        try:
          val = float(row[0])
          if math.isnan(val) or math.isinf(val):
            return "Error: Invalid input: y values must be finite."
          if val not in [0.0, 1.0]:
            return "Error: Invalid input: y values must be 0 or 1 for binary logistic regression."
          y_values.append(val)
        except (TypeError, ValueError):
          return "Error: Invalid input: y values must be numeric."

      n_obs = len(y_values)

      # Validate x (must be a matrix)
      if not isinstance(x_2d, list) or not all(isinstance(row, list) for row in x_2d):
        return "Error: Invalid input: x must be a 2D list (matrix)."

      if len(x_2d) != n_obs:
        return f"Error: Invalid input: x must have {n_obs} rows to match y."

      # Extract x values (matrix)
      x_values = []
      n_predictors = None
      for i, row in enumerate(x_2d):
        if n_predictors is None:
          n_predictors = len(row)
        elif len(row) != n_predictors:
          return "Error: Invalid input: all rows of x must have the same number of columns."

        row_vals = []
        for val in row:
          try:
            num_val = float(val)
            if math.isnan(num_val) or math.isinf(num_val):
              return "Error: Invalid input: x values must be finite."
            row_vals.append(num_val)
          except (TypeError, ValueError):
            return "Error: Invalid input: x values must be numeric."
        x_values.append(row_vals)

      # Validate n_obs vs n_params
      num_params = n_predictors + (1 if fit_intercept else 0)
      if n_obs < num_params:
        return "Error: Number of observations must be greater than or equal to number of parameters."

      # Validate alpha
      try:
        alpha_val = float(alpha)
        if math.isnan(alpha_val) or math.isinf(alpha_val):
          return "Error: Invalid input: alpha must be finite."
        if alpha_val <= 0 or alpha_val >= 1:
          return "Error: Invalid input: alpha must be between 0 and 1."
      except (TypeError, ValueError):
        return "Error: Invalid input: alpha must be numeric."

      # Convert to numpy arrays
      y_array = np.array(y_values)
      x_array = np.array(x_values)

      # Add intercept if requested
      if fit_intercept:
        x_array = np.column_stack([np.ones(n_obs), x_array])

      # Fit the logistic regression model
      try:
        model = statsmodels_logit(y_array, x_array)
        result = model.fit(disp=0)
      except Exception as e:
        return f"Error: statsmodels.discrete.discrete_model.Logit error: {e}"

      # Extract results
      params = result.params
      std_errors = result.bse
      z_stats = result.tvalues
      p_values = result.pvalues
      conf_int = result.conf_int(alpha=alpha_val)
      odds_ratios = np.exp(params)

      # Build output table
      output = [['parameter', 'coefficient', 'std_error', 'z_statistic', 'p_value', 'ci_lower', 'ci_upper', 'odds_ratio']]

      # Add parameter rows
      for i in range(len(params)):
        if fit_intercept and i == 0:
          param_name = 'intercept'
        else:
          predictor_idx = i if not fit_intercept else i - 1
          param_name = f'x{predictor_idx + 1}'

        output.append([
          param_name,
          float(params[i]),
          float(std_errors[i]),
          float(z_stats[i]),
          float(p_values[i]),
          float(conf_int[i, 0]),
          float(conf_int[i, 1]),
          float(odds_ratios[i])
        ])

      # Add model statistics
      # Handle NaN values by converting to empty string
      def safe_float(value):
        try:
          f_val = float(value)
          if math.isnan(f_val):
            return ''
          return f_val
        except (TypeError, ValueError):
          return ''

      output.append(['pseudo_r_squared', safe_float(result.prsquared), '', '', '', '', '', ''])
      output.append(['log_likelihood', safe_float(result.llf), '', '', '', '', '', ''])
      output.append(['aic', safe_float(result.aic), '', '', '', '', '', ''])
      output.append(['bic', safe_float(result.bic), '', '', '', '', '', ''])
      output.append(['llr_pvalue', safe_float(result.llr_pvalue), '', '', '', '', '', ''])

      return output
    except Exception as e:
      return f"Error: {str(e)}"
