GLM_BINOMIAL

Overview

The GLM_BINOMIAL function fits a Generalized Linear Model (GLM) with a binomial family distribution, designed for modeling binary outcomes (0/1) or proportion data (values between 0 and 1). This type of model is fundamental in fields such as epidemiology, marketing, and social sciences where the response variable represents a probability or binary classification.

GLMs extend ordinary linear regression by allowing the response variable to follow distributions from the exponential family—including binomial, Poisson, and gamma distributions. For binomial data, the model relates the expected probability \mu to the linear predictor \eta = X\beta through a link function g:

g(\mu) = X\beta \quad \text{or equivalently} \quad \mu = g^{-1}(X\beta)

This implementation supports multiple link functions. The default logit link is the most common choice for binomial regression (logistic regression):

\text{logit}(\mu) = \log\left(\frac{\mu}{1-\mu}\right)

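Other supported links are probit (the inverse of the standard normal CDF), cloglog (complementary log-log), log, and cauchy (the inverse of the standard Cauchy CDF). Each link maps the probability scale to the linear predictor differently: logit, probit, and cauchy are symmetric around \mu = 0.5, with cauchy having the heaviest tails; cloglog is asymmetric; and the log link gives coefficients a relative-risk interpretation but does not constrain fitted values to stay below 1.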
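
To make the difference between the links concrete, the sketch below maps a linear predictor value back to the probability scale under each of the five links. It uses only the Python standard library and the textbook definition of each inverse link; the helper name `inverse_link` is hypothetical and is not part of GLM_BINOMIAL or statsmodels.

```python
import math
from statistics import NormalDist

def inverse_link(eta, link="logit"):
    """Map a linear predictor value eta to a mean mu (hypothetical helper)."""
    if link == "logit":
        return 1.0 / (1.0 + math.exp(-eta))
    if link == "probit":
        return NormalDist().cdf(eta)           # standard normal CDF
    if link == "cloglog":
        return 1.0 - math.exp(-math.exp(eta))
    if link == "log":
        return math.exp(eta)                   # exceeds 1 whenever eta > 0
    if link == "cauchy":
        return 0.5 + math.atan(eta) / math.pi  # standard Cauchy CDF
    raise ValueError(f"unknown link: {link}")

# Compare the links at a few values of the linear predictor.
for name in ("logit", "probit", "cloglog", "log", "cauchy"):
    print(name, [round(inverse_link(eta, name), 4) for eta in (-2.0, 0.0, 1.0)])
```

At eta = 0 the three symmetric links all return 0.5, while cloglog returns 1 - exp(-1) (about 0.632) and log returns 1, which shows why the same coefficients cannot be compared across links.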

Model parameters are estimated by maximum likelihood using Iteratively Reweighted Least Squares (IRLS). The function returns coefficient estimates, standard errors, z-statistics, p-values, and confidence intervals. For the logit link it also returns exp(coefficient) for each term: for a predictor this is the odds ratio, the multiplicative change in the odds for a one-unit increase in that predictor with the others held fixed, and for the intercept it is the baseline odds when all predictors are zero. Model fit statistics include deviance, Pearson chi-squared, AIC, BIC (computed from the log-likelihood), and log-likelihood.
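
As an illustration of what IRLS does, here is a minimal pure-Python sketch for the special case of a logit link with an intercept and a single predictor. It is not the statsmodels implementation: the helper name `irls_logit`, the zero starting values, and the stopping rule are assumptions made for the example. Each pass computes the weights and working response at the current estimate and then solves the 2x2 weighted least-squares system in closed form.

```python
import math

def irls_logit(x, y, max_iter=50, tol=1e-10):
    """IRLS for logit(mu) = b0 + b1 * x (hypothetical helper, one predictor)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        # Accumulate the entries of X'WX and X'Wz at the current estimate.
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = mu * (1.0 - mu)        # IRLS weight under the logit link
            z = eta + (yi - mu) / w    # working response
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        # Solve the 2x2 weighted normal equations in closed form.
        det = s_w * s_wxx - s_wx * s_wx
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        step = abs(new_b0 - b0) + abs(new_b1 - b1)
        b0, b1 = new_b0, new_b1
        if step < tol:
            break
    # Standard errors: square roots of the diagonal of (X'WX)^-1, using the
    # weights from the final pass through the loop.
    se = (math.sqrt(s_wxx / det), math.sqrt(s_w / det))
    return (b0, b1), se

# Data from Example 1.
x = [1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5]
y = [0, 0, 0, 1, 0, 1, 1, 1]
coef, se = irls_logit(x, y)
print("coefficients:", coef)
print("std errors:  ", se)
```

On the Example 1 data this should closely reproduce the intercept and x1 rows of that example's expected output. The sketch has no safeguards for perfectly separated data, where the weights collapse toward zero and the maximum likelihood estimate does not exist.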

This implementation uses the statsmodels library. For more details, see the GLM documentation and binomial family reference. The statsmodels GitHub repository provides source code and additional examples.

This example function is provided as-is without any representation of accuracy.

Excel Usage

=GLM_BINOMIAL(y, x, glm_binomial_link, fit_intercept, alpha)

y (list[list], required): Dependent variable as a column vector. For binary data, values should be 0 or 1. For proportion data, values should be between 0 and 1.
x (list[list], required): Independent variables (predictors) as a matrix. Each column is a predictor variable, and each row corresponds to an observation.
glm_binomial_link (str, optional, default: "logit"): Link function to use for the binomial GLM. Valid options (lowercase): logit, probit, cloglog, log, cauchy.
fit_intercept (bool, optional, default: true): If True, includes an intercept term in the model.
alpha (float, optional, default: 0.05): Significance level for confidence intervals (between 0 and 1).

Returns (list[list]): 2D list with GLM results and statistics, or error string.
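
Because the return value is either a 2D list or an error string, calling code has to check the type before reading rows. The sketch below shows one way to split the table into per-parameter values and fit statistics; `split_results` is a hypothetical helper, and the sample table is the Example 1 output padded with empty strings in the unused cells.

```python
def split_results(table):
    """Split a GLM_BINOMIAL-style result into coefficients and fit statistics."""
    if isinstance(table, str):
        raise ValueError(table)  # failures are reported as a plain string
    header, *rows = table
    stat_names = {"deviance", "pearson_chi2", "aic", "bic", "log_likelihood"}
    coefficients, statistics = {}, {}
    for row in rows:
        name = row[0]
        if name in stat_names:
            statistics[name] = row[1]  # statistic value sits in the second cell
        else:
            coefficients[name] = dict(zip(header[1:], row[1:]))
    return coefficients, statistics

example_output = [
    ["parameter", "coefficient", "std_error", "z_statistic", "p_value",
     "ci_lower", "ci_upper", "odds_ratio"],
    ["intercept", -7.05261, 4.86734, -1.44897, 0.147347, -16.5924, 2.48719, 0.000865145],
    ["x1", 2.56459, 1.72082, 1.49032, 0.136139, -0.808168, 5.93734, 12.9953],
    ["deviance", 5.0061, "", "", "", "", "", ""],
    ["pearson_chi2", 4.19306, "", "", "", "", "", ""],
    ["aic", 9.0061, "", "", "", "", "", ""],
    ["bic", 9.16498, "", "", "", "", "", ""],
    ["log_likelihood", -2.50305, "", "", "", "", "", ""],
]

coefficients, statistics = split_results(example_output)
print(coefficients["x1"]["coefficient"], statistics["aic"])
```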

Example 1: Logit with single predictor

Inputs:

y	x
0	1
0	1.5
0	2
1	2.5
0	3
1	3.5
1	4
1	4.5

Excel formula:

=GLM_BINOMIAL({0;0;0;1;0;1;1;1}, {1;1.5;2;2.5;3;3.5;4;4.5})

Expected output:

parameter	coefficient	std_error	z_statistic	p_value	ci_lower	ci_upper	odds_ratio
intercept	-7.05261	4.86734	-1.44897	0.147347	-16.5924	2.48719	0.000865145
x1	2.56459	1.72082	1.49032	0.136139	-0.808168	5.93734	12.9953
deviance	5.0061
pearson_chi2	4.19306
aic	9.0061
bic	9.16498
log_likelihood	-2.50305
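
The columns in this table are related by simple formulas, which the following sketch checks using the standard library only. It assumes Wald z-tests against the standard normal distribution and the log-likelihood forms of AIC and BIC; the input numbers are copied from the x1 and log_likelihood rows above, so the results agree only up to the rounding of those inputs.

```python
import math
from statistics import NormalDist

# Values copied from the Example 1 output.
coef, se = 2.56459, 1.72082     # x1 row
log_likelihood = -2.50305
n_obs, n_params = 8, 2          # 8 observations; intercept + x1

z = coef / se                                        # z_statistic
p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))     # two-sided p_value
odds_ratio = math.exp(coef)                          # logit link only
aic = -2.0 * log_likelihood + 2.0 * n_params
bic = -2.0 * log_likelihood + n_params * math.log(n_obs)

print(f"z={z:.4f} p={p_value:.4f} odds_ratio={odds_ratio:.3f}")
print(f"aic={aic:.4f} bic={bic:.4f}")
```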

Example 2: Logit with proportions and two predictors

Inputs:

y	x
0.1	1	2
0.2	1.5	2.5
0.35	2	3
0.45	2.5	2.5
0.55	3	3.5
0.65	3.5	3
0.75	4	4
0.85	4.5	4.5

Excel formula:

=GLM_BINOMIAL({0.1;0.2;0.35;0.45;0.55;0.65;0.75;0.85}, {1,2;1.5,2.5;2,3;2.5,2.5;3,3.5;3.5,3;4,4;4.5,4.5})

Expected output:

parameter	coefficient	std_error	z_statistic	p_value	ci_lower	ci_upper	odds_ratio
intercept	-2.95874	4.18707	-0.706638	0.479791	-11.1652	5.24776	0.0518842
x1	0.978058	1.59803	0.612039	0.540512	-2.15403	4.11014	2.65929
x2	0.0655783	2.34379	0.0279796	0.977678	-4.52817	4.65932	1.06778
deviance	0.0286748
pearson_chi2	0.0279973
aic	12.0468
bic	12.2852
log_likelihood	-3.02342

Example 3: Probit link with single predictor

Inputs:

y	x	glm_binomial_link
0	1	probit
0	1.5
0	2
1	2.5
0	3
1	3.5
1	4
1	4.5

Excel formula:

=GLM_BINOMIAL({0;0;0;1;0;1;1;1}, {1;1.5;2;2.5;3;3.5;4;4.5}, "probit")

Expected output:

parameter	coefficient	std_error	z_statistic	p_value	ci_lower	ci_upper
intercept	-4.28774	2.67005	-1.60587	0.108303	-9.52094	0.945455
x1	1.55918	0.940992	1.65695	0.0975289	-0.28513	3.40349
deviance	4.85138
pearson_chi2	4.07611
aic	8.85138
bic	9.01027
log_likelihood	-2.42569

Example 4: Logit without intercept and custom alpha

Inputs:

y	x	fit_intercept	alpha
0	1	false	0.1
0	1.5
0	2
1	2.5
0	3
1	3.5
1	4
1	4.5

Excel formula:

=GLM_BINOMIAL({0;0;0;1;0;1;1;1}, {1;1.5;2;2.5;3;3.5;4;4.5}, "logit", FALSE, 0.1)

Expected output:

parameter	coefficient	std_error	z_statistic	p_value	ci_lower	ci_upper	odds_ratio
x1	0.206454	0.254336	0.811737	0.416943	-0.211891	0.624799	1.22931
deviance	10.3844
pearson_chi2	7.87637
aic	12.3844
bic	12.4638
log_likelihood	-5.19219
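
The effect of alpha on the interval can be reproduced by hand. The sketch below assumes a Wald interval, coefficient plus or minus a standard normal critical value times the standard error; `wald_interval` is a hypothetical helper, and the inputs are the rounded x1 values from the table above.

```python
from statistics import NormalDist

def wald_interval(coef, se, alpha=0.05):
    """Two-sided (1 - alpha) Wald interval using the standard normal quantile."""
    crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return coef - crit * se, coef + crit * se

coef, se = 0.206454, 0.254336  # x1 row of Example 4

ci_90 = wald_interval(coef, se, alpha=0.1)   # matches the example's alpha
ci_95 = wald_interval(coef, se, alpha=0.05)  # the default, for comparison
print("90% interval:", ci_90)
print("95% interval:", ci_95)
```

A larger alpha gives a smaller critical value and therefore a narrower interval, so the 90% interval sits inside the 95% one.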

Python Code

import math
import statsmodels.api as sm
from statsmodels.genmod.families import Binomial as sm_Binomial
from statsmodels.genmod.generalized_linear_model import SET_USE_BIC_LLF

SET_USE_BIC_LLF(True)

def glm_binomial(y, x, glm_binomial_link='logit', fit_intercept=True, alpha=0.05):
    """
    Fits a Generalized Linear Model with binomial family for binary or proportion data.

    See: https://www.statsmodels.org/stable/generated/statsmodels.genmod.generalized_linear_model.GLM.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        y (list[list]): Dependent variable as a column vector. For binary data, values should be 0 or 1. For proportion data, values should be between 0 and 1.
        x (list[list]): Independent variables (predictors) as a matrix. Each column is a predictor variable, and each row corresponds to an observation.
        glm_binomial_link (str, optional): Link function to use for the binomial GLM. Valid options (case-sensitive): 'logit', 'probit', 'cloglog', 'log', 'cauchy'. Default is 'logit'.
        fit_intercept (bool, optional): If True, includes an intercept term in the model. Default is True.
        alpha (float, optional): Significance level for confidence intervals (between 0 and 1). Default is 0.05.

    Returns:
        list[list]: 2D list with GLM results and statistics, or error string.
    """
    def to2d(val):
        return [[val]] if not isinstance(val, list) else val

    def validate_numeric_2d(data, name):
        if not isinstance(data, list):
            return f"Error: Invalid input: {name} must be a 2D list."
        if len(data) == 0:
            return f"Error: Invalid input: {name} cannot be empty."
        for i, row in enumerate(data):
            if not isinstance(row, list):
                return f"Error: Invalid input: {name} must be a 2D list."
            if len(row) == 0:
                return f"Error: Invalid input: {name} rows cannot be empty."
            for j, val in enumerate(row):
                if not isinstance(val, (int, float)):
                    return f"Error: Invalid input: {name}[{i}][{j}] must be numeric."
                if math.isnan(val) or math.isinf(val):
                    return f"Error: Invalid input: {name}[{i}][{j}] must be finite."
        return None

    try:
        # Normalize inputs
        y = to2d(y)
        x = to2d(x)

        # Validate inputs
        err = validate_numeric_2d(y, 'y')
        if err:
            return err
        err = validate_numeric_2d(x, 'x')
        if err:
            return err

        # Check y is a column vector (every row must hold exactly one value)
        if any(len(row) != 1 for row in y):
            return "Error: Invalid input: y must be a column vector (single column)."

        # Check dimensions match
        n_obs_y = len(y)
        n_obs_x = len(x)
        if n_obs_y != n_obs_x:
            return "Error: Invalid input: y and x must have the same number of rows."

        # Validate alpha
        if not isinstance(alpha, (int, float)):
            return "Error: Invalid input: alpha must be numeric."
        if math.isnan(alpha) or math.isinf(alpha):
            return "Error: Invalid input: alpha must be finite."
        if alpha <= 0 or alpha >= 1:
            return "Error: Invalid input: alpha must be between 0 and 1."

        # Validate link function
        valid_links = ['logit', 'probit', 'cloglog', 'log', 'cauchy']
        if not isinstance(glm_binomial_link, str):
            return "Error: Invalid input: glm_binomial_link must be a string."
        if glm_binomial_link not in valid_links:
            return f"Error: Invalid input: glm_binomial_link must be one of {valid_links}."

        # Convert to flat list for y
        y_flat = [row[0] for row in y]

        # Check y values are in valid range [0, 1]
        for i, val in enumerate(y_flat):
            if val < 0 or val > 1:
                return f"Error: Invalid input: y[{i}] = {val} must be between 0 and 1."

        # Get number of columns in x
        n_cols = len(x[0])
        for row in x:
            if len(row) != n_cols:
                return "Error: Invalid input: all rows in x must have the same number of columns."

        # Convert x to list of columns
        x_data = []
        for col_idx in range(n_cols):
            x_data.append([x[row_idx][col_idx] for row_idx in range(n_obs_x)])

        # Add intercept if requested
        if fit_intercept:
            x_data.insert(0, [1.0] * n_obs_x)

        # Transpose to get design matrix
        design_matrix = []
        for row_idx in range(n_obs_x):
            design_matrix.append([col[row_idx] for col in x_data])

        # Create link object (the link name was validated above)
        try:
            link_classes = {
                'logit': sm.families.links.Logit,
                'probit': sm.families.links.Probit,
                'cloglog': sm.families.links.CLogLog,
                'log': sm.families.links.Log,
                'cauchy': sm.families.links.Cauchy,
            }
            link = link_classes[glm_binomial_link]()
        except Exception as e:
            return f"Error: Invalid input: unable to create link function: {e}"

        # Fit GLM
        try:
            family = sm_Binomial(link=link)
            model = sm.GLM(y_flat, design_matrix, family=family)
            result = model.fit()
        except Exception as e:
            return f"Error: statsmodels.GLM error: {e}"

        # Extract results
        try:
            params = result.params
            bse = result.bse
            tvalues = result.tvalues
            pvalues = result.pvalues
            conf_int = result.conf_int(alpha=alpha)

            # Build parameter names
            param_names = []
            if fit_intercept:
                param_names.append('intercept')
            for i in range(n_cols):
                param_names.append(f'x{i+1}')

            # Build results table
            results = [['parameter', 'coefficient', 'std_error', 'z_statistic', 'p_value', 'ci_lower', 'ci_upper', 'odds_ratio']]

            for i, name in enumerate(param_names):
                coef = float(params[i])
                stderr = float(bse[i])
                zstat = float(tvalues[i])
                pval = float(pvalues[i])
                ci_low = float(conf_int[i][0])
                ci_high = float(conf_int[i][1])
                odds_ratio = math.exp(coef) if glm_binomial_link == 'logit' else None

                results.append([
                    name,
                    coef,
                    stderr,
                    zstat,
                    pval,
                    ci_low,
                    ci_high,
                    odds_ratio if odds_ratio is not None else ''
                ])

            # Add model statistics
            results.append(['deviance', float(result.deviance), '', '', '', '', '', ''])
            results.append(['pearson_chi2', float(result.pearson_chi2), '', '', '', '', '', ''])
            results.append(['aic', float(result.aic), '', '', '', '', '', ''])
            results.append(['bic', float(result.bic_llf), '', '', '', '', '', ''])
            results.append(['log_likelihood', float(result.llf), '', '', '', '', '', ''])

            return results

        except Exception as e:
            return f"Error: statsmodels.GLM error: unable to extract results: {e}"
    except Exception as e:
        return f"Error: {str(e)}"
