GLM_POISSON
Overview
The GLM_POISSON function fits a Generalized Linear Model (GLM) with a Poisson distribution family, designed specifically for modeling count data. Poisson regression is appropriate when the response variable represents counts of events—such as the number of customer arrivals, defect occurrences, or species observations—where values are non-negative integers.
This implementation uses the statsmodels library’s GLM framework. For complete documentation, see the statsmodels GLM reference and the GLM class API. The source code is available on GitHub.
The Poisson distribution assumes that the mean and variance of the response variable are equal. The model relates the expected count \mu to predictor variables through a link function. The canonical link for Poisson regression is the log link:
\log(\mu_i) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip}
This function supports three link functions: log (default), identity, and sqrt. The log link ensures predicted counts are always positive and provides easily interpretable coefficients—each unit increase in a predictor multiplies the expected count by e^{\beta}, known as the incidence rate ratio (IRR).
Parameters are estimated via maximum likelihood using iteratively reweighted least squares (IRLS). The function returns coefficient estimates, standard errors, z-statistics, p-values, and confidence intervals. Model fit is assessed using deviance, Pearson chi-squared, AIC, BIC, and log-likelihood statistics. The incidence rate ratio for each coefficient is also computed as \text{IRR} = e^{\beta}.
A key assumption of Poisson regression is equidispersion—that the variance equals the mean. When observed variance exceeds the mean (overdispersion), consider using negative binomial regression or quasi-Poisson methods instead. The Pearson chi-squared statistic divided by residual degrees of freedom provides a rough overdispersion diagnostic; values substantially greater than 1 suggest overdispersion.
This example function is provided as-is without any representation of accuracy.
Excel Usage
=GLM_POISSON(y, x, glm_poisson_link, fit_intercept, alpha)
y(list[list], required): Column vector of count data (non-negative values). Each element represents an observation of the dependent variable.x(list[list], required): Matrix of predictor variables. Each column represents a different predictor variable; each row represents an observation.glm_poisson_link(str, optional, default: “log”): The link function to use for the Poisson GLM model.fit_intercept(bool, optional, default: true): If True, adds an intercept term to the model.alpha(float, optional, default: 0.05): Significance level for confidence intervals. Must be between 0 and 1.
Returns (list[list]): 2D list with GLM results and statistics, or error string.
Examples
Example 1: Demo case 1
Inputs:
| y | x |
|---|---|
| 2 | 1 |
| 3 | 2 |
| 5 | 3 |
| 7 | 4 |
| 11 | 5 |
Excel formula:
=GLM_POISSON({2;3;5;7;11}, {1;2;3;4;5})
Expected output:
| parameter | coefficient | std_error | z_statistic | p_value | ci_lower | ci_upper | incidence_rate_ratio |
|---|---|---|---|---|---|---|---|
| intercept | 0.2836 | 0.5951 | 0.4766 | 0.6337 | -0.8828 | 1.45 | 1.328 |
| x1 | 0.4224 | 0.1491 | 2.833 | 0.004604 | 0.1302 | 0.7146 | 1.526 |
| deviance | 0.0252 | ||||||
| pearson_chi2 | 0.02546 | ||||||
| aic | 21.17 | ||||||
| bic | 20.39 | ||||||
| log_likelihood | -8.585 |
Example 2: Demo case 2
Inputs:
| y | x | glm_poisson_link |
|---|---|---|
| 5 | 1 | identity |
| 8 | 2 | |
| 10 | 3 | |
| 15 | 4 |
Excel formula:
=GLM_POISSON({5;8;10;15}, {1;2;3;4}, "identity")
Expected output:
| parameter | coefficient | std_error | z_statistic | p_value | ci_lower | ci_upper | incidence_rate_ratio |
|---|---|---|---|---|---|---|---|
| intercept | 1.775 | 3.055 | 0.581 | 0.5612 | -4.212 | 7.762 | 5.9 |
| x1 | 3.09 | 1.316 | 2.349 | 0.01885 | 0.5113 | 5.669 | 21.98 |
| deviance | 0.158 | ||||||
| pearson_chi2 | 0.1558 | ||||||
| aic | 20.29 | ||||||
| bic | 19.06 | ||||||
| log_likelihood | -8.145 |
Example 3: Demo case 3
Inputs:
| y | x | glm_poisson_link |
|---|---|---|
| 3 | 1 | sqrt |
| 6 | 2 | |
| 9 | 3 | |
| 12 | 4 |
Excel formula:
=GLM_POISSON({3;6;9;12}, {1;2;3;4}, "sqrt")
Expected output:
| parameter | coefficient | std_error | z_statistic | p_value | ci_lower | ci_upper | incidence_rate_ratio |
|---|---|---|---|---|---|---|---|
| intercept | 1.226 | 0.6124 | 2.003 | 0.04521 | 0.02616 | 2.427 | 3.409 |
| x1 | 0.5744 | 0.2236 | 2.569 | 0.01021 | 0.1361 | 1.013 | 1.776 |
| deviance | 0.06545 | ||||||
| pearson_chi2 | 0.06541 | ||||||
| aic | 19.1 | ||||||
| bic | 17.88 | ||||||
| log_likelihood | -7.552 |
Example 4: Demo case 4
Inputs:
| y | x | glm_poisson_link | fit_intercept | alpha |
|---|---|---|---|---|
| 4 | 2 | log | false | 0.1 |
| 6 | 3 | |||
| 8 | 4 | |||
| 10 | 5 |
Excel formula:
=GLM_POISSON({4;6;8;10}, {2;3;4;5}, "log", FALSE, 0.1)
Expected output:
| parameter | coefficient | std_error | z_statistic | p_value | ci_lower | ci_upper | incidence_rate_ratio |
|---|---|---|---|---|---|---|---|
| x1 | 0.4971 | 0.04623 | 10.75 | 5.703e-27 | 0.4211 | 0.5732 | 1.644 |
| deviance | 1.453 | ||||||
| pearson_chi2 | 1.57 | ||||||
| aic | 18.47 | ||||||
| bic | 17.86 | ||||||
| log_likelihood | -8.236 |
Python Code
import math
import statsmodels.api as sm
from statsmodels.genmod.families import Poisson as sm_Poisson
from statsmodels.genmod.families.links import Log as sm_Log, Identity as sm_Identity, Sqrt as sm_Sqrt
def glm_poisson(y, x, glm_poisson_link='log', fit_intercept=True, alpha=0.05):
"""
Fits a Generalized Linear Model with Poisson family for count data.
See: https://www.statsmodels.org/stable/generated/statsmodels.genmod.generalized_linear_model.GLM.html
This example function is provided as-is without any representation of accuracy.
Args:
y (list[list]): Column vector of count data (non-negative values). Each element represents an observation of the dependent variable.
x (list[list]): Matrix of predictor variables. Each column represents a different predictor variable; each row represents an observation.
glm_poisson_link (str, optional): The link function to use for the Poisson GLM model. Valid options: log, identity, sqrt. Default is 'log'.
fit_intercept (bool, optional): If True, adds an intercept term to the model. Default is True.
alpha (float, optional): Significance level for confidence intervals. Must be between 0 and 1. Default is 0.05.
Returns:
list[list]: 2D list with GLM results and statistics, or error string.
"""
def to2d(value):
"""Convert scalar or list to 2D list."""
return [[value]] if not isinstance(value, list) else value
# Normalize inputs
y = to2d(y)
x = to2d(x)
# Validate y is a column vector
if not isinstance(y, list) or len(y) == 0:
return "Invalid input: y must be a non-empty 2D list."
for row in y:
if not isinstance(row, list) or len(row) != 1:
return "Invalid input: y must be a column vector (2D list with single column)."
# Validate x is a 2D matrix
if not isinstance(x, list) or len(x) == 0:
return "Invalid input: x must be a non-empty 2D list."
num_rows = len(x)
num_cols = len(x[0]) if isinstance(x[0], list) else 0
if num_cols == 0:
return "Invalid input: x must be a 2D list with at least one column."
for row in x:
if not isinstance(row, list) or len(row) != num_cols:
return "Invalid input: x must have consistent number of columns."
# Check dimensions match
if len(y) != num_rows:
return "Invalid input: y and x must have the same number of rows."
# Validate link function
valid_links = ['log', 'identity', 'sqrt']
if not isinstance(glm_poisson_link, str):
return "Invalid input: glm_poisson_link must be a string."
if glm_poisson_link not in valid_links:
return f"Invalid input: glm_poisson_link must be one of {valid_links}."
# Validate fit_intercept
if not isinstance(fit_intercept, bool):
return "Invalid input: fit_intercept must be a boolean."
# Validate alpha
if not isinstance(alpha, (int, float)):
return "Invalid input: alpha must be a number."
if math.isnan(alpha) or math.isinf(alpha):
return "Invalid input: alpha must be finite."
if alpha <= 0 or alpha >= 1:
return "Invalid input: alpha must be between 0 and 1."
# Convert y and x to numeric values
try:
y_values = [float(row[0]) for row in y]
except (ValueError, TypeError):
return "Invalid input: y must contain numeric values."
try:
x_values = [[float(val) for val in row] for row in x]
except (ValueError, TypeError):
return "Invalid input: x must contain numeric values."
# Check for non-finite values
for val in y_values:
if math.isnan(val) or math.isinf(val):
return "Invalid input: y must contain finite values."
for row in x_values:
for val in row:
if math.isnan(val) or math.isinf(val):
return "Invalid input: x must contain finite values."
# Check that y values are non-negative (count data)
for val in y_values:
if val < 0:
return "Invalid input: y must contain non-negative values (count data)."
# Add intercept if requested
if fit_intercept:
x_values = [[1.0] + row for row in x_values]
# Select link function
if glm_poisson_link == 'log':
link = sm_Log()
elif glm_poisson_link == 'identity':
link = sm_Identity()
elif glm_poisson_link == 'sqrt':
link = sm_Sqrt()
# Create Poisson family
family = sm_Poisson(link=link)
# Fit GLM
try:
model = sm.GLM(y_values, x_values, family=family)
results = model.fit()
except Exception as e:
return f"statsmodels.genmod.generalized_linear_model.GLM error: {e}"
# Extract results
params = results.params
std_errors = results.bse
z_stats = results.tvalues
p_values = results.pvalues
conf_int = results.conf_int(alpha=alpha)
# Build results table
output = [['parameter', 'coefficient', 'std_error', 'z_statistic', 'p_value', 'ci_lower', 'ci_upper', 'incidence_rate_ratio']]
# Add parameter results
for i in range(len(params)):
if fit_intercept and i == 0:
param_name = 'intercept'
else:
predictor_idx = i if not fit_intercept else i - 1
param_name = f'x{predictor_idx + 1}'
coef = float(params[i])
std_err = float(std_errors[i])
z_stat = float(z_stats[i])
p_val = float(p_values[i])
ci_low = float(conf_int[i, 0])
ci_high = float(conf_int[i, 1])
irr = math.exp(coef)
output.append([param_name, coef, std_err, z_stat, p_val, ci_low, ci_high, irr])
# Add model statistics
output.append(['deviance', float(results.deviance), '', '', '', '', '', ''])
output.append(['pearson_chi2', float(results.pearson_chi2), '', '', '', '', '', ''])
output.append(['aic', float(results.aic), '', '', '', '', '', ''])
output.append(['bic', float(results.bic_llf), '', '', '', '', '', ''])
output.append(['log_likelihood', float(results.llf), '', '', '', '', '', ''])
return output