LOGIT_MODEL
Overview
The LOGIT_MODEL function fits a binary logistic regression model to predict binary outcomes (0 or 1) using maximum likelihood estimation (MLE). Logistic regression is one of the most widely used statistical methods for binary classification, commonly applied in credit scoring, medical diagnosis, marketing response prediction, and many other domains where the outcome variable is dichotomous.
This implementation uses the statsmodels library, specifically the Logit class from the discrete choice models module. For more background on discrete regression models, see the statsmodels documentation on regression with discrete dependent variables.
The logistic regression model relates the probability of the binary outcome to predictor variables through the logistic (sigmoid) function:
$$P(Y = 1 \mid X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k)}}$$
The model parameters $\beta$ are estimated by maximizing the log-likelihood function. The function returns coefficient estimates, standard errors, z-statistics, p-values, and confidence intervals for each parameter. It also computes odds ratios ($e^{\beta}$), each of which is the multiplicative change in the odds of Y = 1 for a one-unit increase in the corresponding predictor, holding the other predictors fixed.
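To make the sigmoid formula and the odds-ratio interpretation concrete, here is a minimal sketch in plain Python. The helper names (`sigmoid`, `predict_probability`, `odds`) are illustrative and not part of LOGIT_MODEL. The coefficients are the rounded values shown in Example 1.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(intercept, coefs, xs):
    """P(Y = 1 | X) for one observation, given fitted coefficients."""
    linear = intercept + sum(b * x for b, x in zip(coefs, xs))
    return sigmoid(linear)

def odds(p):
    """Convert a probability into odds p / (1 - p)."""
    return p / (1.0 - p)

# Rounded coefficients copied from Example 1 (intercept and x1)
b0, b1 = -9.721, 2.591

p_at_3 = predict_probability(b0, [b1], [3.0])
p_at_4 = predict_probability(b0, [b1], [4.0])

# Raising x by one unit multiplies the odds by exp(b1), the odds ratio
odds_ratio = math.exp(b1)
observed_ratio = odds(p_at_4) / odds(p_at_3)
```

Here `observed_ratio` and `odds_ratio` agree up to floating-point error, which is exactly the "multiplicative change in odds" reading of the coefficient.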
Model fit is assessed using several statistics: the pseudo R-squared (McFadden’s R²), which compares the fitted model to a null model; the log-likelihood value; AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) for model comparison; and the likelihood ratio test p-value for overall model significance.
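The fit statistics can be reproduced by hand from the log-likelihood. The sketch below uses the textbook definitions (McFadden's R² as 1 - llf/llnull, AIC as -2·llf + 2k, BIC as -2·llf + k·ln n). The function names are hypothetical, and the shortcut for the likelihood ratio p-value assumes exactly one predictor besides the intercept (one degree of freedom).

```python
import math

def null_log_likelihood(y):
    """Log-likelihood of an intercept-only model, where every row gets p = mean(y).

    Assumes y contains both 0s and 1s, so that 0 < p < 1.
    """
    n = len(y)
    n1 = sum(y)
    p = n1 / n
    return n1 * math.log(p) + (n - n1) * math.log(1 - p)

def fit_statistics(llf, llnull, k, n):
    """McFadden pseudo R-squared, AIC and BIC for k estimated parameters, n rows."""
    return {
        "pseudo_r_squared": 1 - llf / llnull,
        "aic": -2 * llf + 2 * k,
        "bic": -2 * llf + k * math.log(n),
    }

def llr_pvalue_df1(llf, llnull):
    """Likelihood ratio test p-value for ONE degree of freedom only.

    Uses the identity P(chi2_1 > x) = erfc(sqrt(x / 2)); assumes llf >= llnull.
    """
    lr = 2 * (llf - llnull)
    return math.erfc(math.sqrt(lr / 2))

# y and the rounded log-likelihood are taken from Example 1
y = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]
llnull = null_log_likelihood(y)
stats_ex1 = fit_statistics(llf=-2.507, llnull=llnull, k=2, n=len(y))
p_ex1 = llr_pvalue_df1(-2.507, llnull)

# Example 3 drops the intercept (k = 1); its pseudo R-squared comes out negative
stats_ex3 = fit_statistics(llf=-6.851, llnull=llnull, k=1, n=len(y))
```

Up to rounding of the displayed log-likelihoods, these values match the statistics rows in Examples 1 and 3. The negative pseudo R² in Example 3 reflects that the no-intercept model fits slightly worse than the intercept-only null model it is compared against.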
Key references for logistic regression and discrete choice models include Cameron and Trivedi’s Regression Analysis of Count Data (1998), Maddala’s Limited-Dependent and Qualitative Variables in Econometrics (1983), and Greene’s Econometric Analysis (2003). The source code is available on the statsmodels GitHub repository.
This example function is provided as-is without any representation of accuracy.
Excel Usage
=LOGIT_MODEL(y, x, fit_intercept, alpha)
y (list[list], required): Binary dependent variable (0 or 1) as a column vector
x (list[list], required): Independent variables (predictors) as a matrix where each column is a predictor
fit_intercept (bool, optional, default: true): If true, adds an intercept term to the model
alpha (float, optional, default: 0.05): Significance level for confidence intervals (between 0 and 1)
Returns (list[list]): 2D list with logit results and statistics, or error string.
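The list[list] type means each argument is a list of rows, even when it has a single column. As a rough illustration, the hypothetical helper below converts an Excel array constant into that shape, assuming US-style separators (semicolons between rows, commas between columns) and plain numeric cells. It is not part of LOGIT_MODEL.

```python
def parse_array_constant(text):
    """Convert an Excel array constant such as "{1,5;1.5,4.5}" into a 2D list.

    Hypothetical helper for illustration; handles plain numbers only.
    """
    inner = text.strip()
    if not (inner.startswith("{") and inner.endswith("}")):
        raise ValueError("array constant must be wrapped in braces")
    rows = inner[1:-1].split(";")          # semicolons separate rows
    return [[float(cell) for cell in row.split(",")] for row in rows]

# A column vector: one element per row
y = parse_array_constant("{0;0;1}")

# A matrix with two predictor columns
x = parse_array_constant("{1,5;1.5,4.5;2,6}")
```

So `{0;0;1}` becomes three one-element rows, which is the column-vector shape expected for y, and each row of x holds one value per predictor.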
Examples
Example 1: Single predictor, default settings
Inputs:
| y | x |
|---|---|
| 0 | 1 |
| 0 | 1.5 |
| 0 | 2 |
| 0 | 2.5 |
| 0 | 3 |
| 1 | 3.5 |
| 0 | 4 |
| 1 | 4.5 |
| 1 | 5 |
| 1 | 5.5 |
Excel formula:
=LOGIT_MODEL({0;0;0;0;0;1;0;1;1;1}, {1;1.5;2;2.5;3;3.5;4;4.5;5;5.5})
Expected output:
| parameter | coefficient | std_error | z_statistic | p_value | ci_lower | ci_upper | odds_ratio |
|---|---|---|---|---|---|---|---|
| intercept | -9.721 | 6.424 | -1.513 | 0.1302 | -22.31 | 2.87 | 0.00006003 |
| x1 | 2.591 | 1.69 | 1.533 | 0.1253 | -0.7218 | 5.904 | 13.34 |
| pseudo_r_squared | 0.6275 | | | | | | |
| log_likelihood | -2.507 | | | | | | |
| aic | 9.014 | | | | | | |
| bic | 9.619 | | | | | | |
| llr_pvalue | 0.003658 | | | | | | |
Example 2: Two predictors
Inputs:
| y | x1 | x2 |
|---|---|---|
| 0 | 1 | 5 |
| 0 | 1.5 | 4.5 |
| 0 | 2 | 6 |
| 0 | 1.8 | 5.5 |
| 0 | 2.2 | 4 |
| 0 | 1.2 | 6.5 |
| 1 | 3 | 7 |
| 0 | 2.5 | 5 |
| 1 | 3.5 | 8 |
| 0 | 2.8 | 6 |
| 1 | 4 | 9 |
| 1 | 4.2 | 8.5 |
| 1 | 3.8 | 9.5 |
| 1 | 4.5 | 10 |
| 0 | 3.2 | 7.5 |
| 1 | 5 | 11 |
| 1 | 5.2 | 10.5 |
| 1 | 4.8 | 11.5 |
| 1 | 5.5 | 12 |
| 1 | 5.8 | 12.5 |
Excel formula:
=LOGIT_MODEL({0;0;0;0;0;0;1;0;1;0;1;1;1;1;0;1;1;1;1;1}, {1,5;1.5,4.5;2,6;1.8,5.5;2.2,4;1.2,6.5;3,7;2.5,5;3.5,8;2.8,6;4,9;4.2,8.5;3.8,9.5;4.5,10;3.2,7.5;5,11;5.2,10.5;4.8,11.5;5.5,12;5.8,12.5})
Expected output:
| parameter | coefficient | std_error | z_statistic | p_value | ci_lower | ci_upper | odds_ratio |
|---|---|---|---|---|---|---|---|
| intercept | -17.23 | 11.24 | -1.532 | 0.1254 | -39.27 | 4.808 | 3.287e-8 |
| x1 | 3.174 | 7.915 | 0.401 | 0.6884 | -12.34 | 18.69 | 23.9 |
| x2 | 1.023 | 3.361 | 0.3044 | 0.7608 | -5.564 | 7.61 | 2.782 |
| pseudo_r_squared | 0.8311 | | | | | | |
| log_likelihood | -2.324 | | | | | | |
| aic | 10.65 | | | | | | |
| bic | 13.64 | | | | | | |
| llr_pvalue | 0.00001077 | | | | | | |
Example 3: Single predictor without an intercept
Inputs:
| y | x | fit_intercept |
|---|---|---|
| 0 | 1 | false |
| 0 | 1.5 | |
| 0 | 2 | |
| 0 | 2.5 | |
| 0 | 3 | |
| 1 | 3.5 | |
| 0 | 4 | |
| 1 | 4.5 | |
| 1 | 5 | |
| 1 | 5.5 | |
Excel formula:
=LOGIT_MODEL({0;0;0;0;0;1;0;1;1;1}, {1;1.5;2;2.5;3;3.5;4;4.5;5;5.5}, FALSE)
Expected output:
| parameter | coefficient | std_error | z_statistic | p_value | ci_lower | ci_upper | odds_ratio |
|---|---|---|---|---|---|---|---|
| x1 | 0.07189 | 0.1803 | 0.3988 | 0.69 | -0.2814 | 0.4252 | 1.075 |
| pseudo_r_squared | -0.01795 | | | | | | |
| log_likelihood | -6.851 | | | | | | |
| aic | 15.7 | | | | | | |
| bic | 16 | | | | | | |
| llr_pvalue | | | | | | | |
Example 4: Single predictor with 90% confidence intervals
Inputs:
| y | x | alpha |
|---|---|---|
| 0 | 1 | 0.1 |
| 0 | 1.5 | |
| 0 | 2 | |
| 0 | 2.5 | |
| 0 | 3 | |
| 1 | 3.5 | |
| 0 | 4 | |
| 1 | 4.5 | |
| 1 | 5 | |
| 1 | 5.5 | |
Excel formula:
=LOGIT_MODEL({0;0;0;0;0;1;0;1;1;1}, {1;1.5;2;2.5;3;3.5;4;4.5;5;5.5}, TRUE, 0.1)
Expected output:
| parameter | coefficient | std_error | z_statistic | p_value | ci_lower | ci_upper | odds_ratio |
|---|---|---|---|---|---|---|---|
| intercept | -9.721 | 6.424 | -1.513 | 0.1302 | -20.29 | 0.8461 | 0.00006003 |
| x1 | 2.591 | 1.69 | 1.533 | 0.1253 | -0.1892 | 5.371 | 13.34 |
| pseudo_r_squared | 0.6275 | | | | | | |
| log_likelihood | -2.507 | | | | | | |
| aic | 9.014 | | | | | | |
| bic | 9.619 | | | | | | |
| llr_pvalue | 0.003658 | | | | | | |
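Examples 1 and 4 differ only in alpha, so only the confidence bounds change. The per-parameter columns in those tables are consistent with the usual Wald formulas (z = coefficient / standard error, a two-sided normal p-value, and coefficient ± critical value × standard error). The sketch below assumes those formulas. `wald_summary` is a hypothetical name, and the inputs are the rounded x1 values from the tables.

```python
from statistics import NormalDist

def wald_summary(coef, std_error, alpha=0.05):
    """Wald z-statistic, two-sided p-value and (1 - alpha) confidence interval."""
    std_normal = NormalDist()               # mean 0, standard deviation 1
    z = coef / std_error
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    return {
        "z_statistic": z,
        "p_value": p_value,
        "ci_lower": coef - z_crit * std_error,
        "ci_upper": coef + z_crit * std_error,
    }

# x1 row: coefficient 2.591, standard error 1.69 (rounded as displayed)
r95 = wald_summary(2.591, 1.69, alpha=0.05)   # Example 1 settings
r90 = wald_summary(2.591, 1.69, alpha=0.10)   # Example 4 settings
```

The z-statistic and p-value are identical in both calls. Raising alpha from 0.05 to 0.1 only narrows the interval, as in the Example 4 table.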
Python Code
import math

import numpy as np
from statsmodels.discrete.discrete_model import Logit as statsmodels_logit


def logit_model(y, x, fit_intercept=True, alpha=0.05):
    """
    Fits a binary logistic regression model to predict binary outcomes using maximum likelihood estimation.

    See: https://www.statsmodels.org/stable/generated/statsmodels.discrete.discrete_model.Logit.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        y (list[list]): Binary dependent variable (0 or 1) as a column vector.
        x (list[list]): Independent variables (predictors) as a matrix where each column is a predictor.
        fit_intercept (bool, optional): If true, adds an intercept term to the model. Default is True.
        alpha (float, optional): Significance level for confidence intervals (between 0 and 1). Default is 0.05.

    Returns:
        list[list]: 2D list with logit results and statistics, or error string.
    """
    def to2d(val):
        return [[val]] if not isinstance(val, list) else val

    # Normalize inputs to 2D lists
    y_2d = to2d(y)
    x_2d = to2d(x)

    # Validate y (must be a column vector)
    if not isinstance(y_2d, list) or not all(isinstance(row, list) for row in y_2d):
        return "Error: Invalid input: y must be a 2D list (column vector)."
    if len(y_2d) == 0:
        return "Error: Invalid input: y must not be empty."

    # Extract y values (column vector)
    y_values = []
    for row in y_2d:
        if len(row) != 1:
            return "Error: Invalid input: y must be a column vector with one element per row."
        try:
            val = float(row[0])
            if math.isnan(val) or math.isinf(val):
                return "Error: Invalid input: y values must be finite."
            if val not in [0.0, 1.0]:
                return "Error: Invalid input: y values must be 0 or 1 for binary logistic regression."
            y_values.append(val)
        except (TypeError, ValueError):
            return "Error: Invalid input: y values must be numeric."
    n_obs = len(y_values)

    # Validate x (must be a matrix)
    if not isinstance(x_2d, list) or not all(isinstance(row, list) for row in x_2d):
        return "Error: Invalid input: x must be a 2D list (matrix)."
    if len(x_2d) != n_obs:
        return f"Error: Invalid input: x must have {n_obs} rows to match y."

    # Extract x values (matrix)
    x_values = []
    n_predictors = None
    for i, row in enumerate(x_2d):
        if n_predictors is None:
            n_predictors = len(row)
        elif len(row) != n_predictors:
            return "Error: Invalid input: all rows of x must have the same number of columns."
        row_vals = []
        for val in row:
            try:
                num_val = float(val)
                if math.isnan(num_val) or math.isinf(num_val):
                    return "Error: Invalid input: x values must be finite."
                row_vals.append(num_val)
            except (TypeError, ValueError):
                return "Error: Invalid input: x values must be numeric."
        x_values.append(row_vals)

    # Validate n_obs vs n_params
    num_params = n_predictors + (1 if fit_intercept else 0)
    if n_obs < num_params:
        return "Error: Number of observations must be greater than or equal to number of parameters."

    # Validate alpha
    try:
        alpha_val = float(alpha)
        if math.isnan(alpha_val) or math.isinf(alpha_val):
            return "Error: Invalid input: alpha must be finite."
        if alpha_val <= 0 or alpha_val >= 1:
            return "Error: Invalid input: alpha must be between 0 and 1."
    except (TypeError, ValueError):
        return "Error: Invalid input: alpha must be numeric."

    # Convert to numpy arrays
    y_array = np.array(y_values)
    x_array = np.array(x_values)

    # Add intercept if requested
    if fit_intercept:
        x_array = np.column_stack([np.ones(n_obs), x_array])

    # Fit the logistic regression model
    try:
        model = statsmodels_logit(y_array, x_array)
        result = model.fit(disp=0)
    except Exception as e:
        return f"Error: statsmodels.discrete.discrete_model.Logit error: {e}"

    # Extract results
    params = result.params
    std_errors = result.bse
    z_stats = result.tvalues
    p_values = result.pvalues
    conf_int = result.conf_int(alpha=alpha_val)
    odds_ratios = np.exp(params)

    # Build output table
    output = [['parameter', 'coefficient', 'std_error', 'z_statistic', 'p_value', 'ci_lower', 'ci_upper', 'odds_ratio']]

    # Add parameter rows
    for i in range(len(params)):
        if fit_intercept and i == 0:
            param_name = 'intercept'
        else:
            predictor_idx = i if not fit_intercept else i - 1
            param_name = f'x{predictor_idx + 1}'
        output.append([
            param_name,
            float(params[i]),
            float(std_errors[i]),
            float(z_stats[i]),
            float(p_values[i]),
            float(conf_int[i, 0]),
            float(conf_int[i, 1]),
            float(odds_ratios[i])
        ])

    # Add model statistics
    # Handle NaN values by converting to empty string
    def safe_float(value):
        try:
            f_val = float(value)
            if math.isnan(f_val):
                return ''
            return f_val
        except (TypeError, ValueError):
            return ''

    output.append(['pseudo_r_squared', safe_float(result.prsquared), '', '', '', '', '', ''])
    output.append(['log_likelihood', safe_float(result.llf), '', '', '', '', '', ''])
    output.append(['aic', safe_float(result.aic), '', '', '', '', '', ''])
    output.append(['bic', safe_float(result.bic), '', '', '', '', '', ''])
    output.append(['llr_pvalue', safe_float(result.llr_pvalue), '', '', '', '', '', ''])

    return output
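For readers curious about what the maximum likelihood step amounts to, the following is a dependency-free sketch of Newton-Raphson for the special case of one predictor plus an intercept. It is an illustration only, not necessarily the routine statsmodels runs internally. The function names are hypothetical, and the sketch assumes the data are not perfectly separable, so a finite maximum exists.

```python
import math

def score_and_information(b0, b1, xs, ys):
    """Gradient of the log-likelihood and the 2x2 information matrix entries."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        w = p * (1.0 - p)
        g0 += y - p
        g1 += (y - p) * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return g0, g1, h00, h01, h11

def fit_logit_one_predictor(xs, ys, max_iter=50, tol=1e-10):
    """Newton-Raphson estimate of (b0, b1) with their standard errors."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0, g1, h00, h01, h11 = score_and_information(b0, b1, xs, ys)
        det = h00 * h11 - h01 * h01
        # Newton step: solve information * step = gradient for the 2x2 case
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) < tol and abs(step1) < tol:
            break
    # Standard errors: square roots of the diagonal of the inverse information
    _, _, h00, h01, h11 = score_and_information(b0, b1, xs, ys)
    det = h00 * h11 - h01 * h01
    return b0, b1, math.sqrt(h11 / det), math.sqrt(h00 / det)

# Data from Example 1
xs = [1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5]
ys = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]
b0, b1, se0, se1 = fit_logit_one_predictor(xs, ys)
```

On the Example 1 data this should land close to the intercept and x1 rows of that example's output table (coefficients and standard errors).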