GLM_GAMMA
Overview
The GLM_GAMMA function fits a Generalized Linear Model (GLM) with the Gamma distribution family, designed for modeling positive continuous response variables. This model is particularly well-suited for data where the variance increases with the mean, such as insurance claims, survival times, income, and other right-skewed positive quantities.
Generalized Linear Models extend ordinary linear regression by allowing the response variable to follow a distribution from the exponential family and relating the mean response to the linear predictor through a link function. The Gamma GLM assumes responses follow a Gamma distribution, where the variance is proportional to the square of the mean:
\text{Var}(Y) = \phi \mu^2
where \mu is the mean and \phi is the dispersion (scale) parameter.
This implementation uses the statsmodels library’s GLM class with the Gamma family. The function supports three link functions:
- Inverse (canonical link): g(\mu) = 1/\mu, the default and theoretically optimal for Gamma
- Log: g(\mu) = \log(\mu), ensures positive fitted values and is commonly used
- Identity: g(\mu) = \mu, provides direct interpretation but may produce negative predictions
The model is fitted using Iteratively Reweighted Least Squares (IRLS), an efficient algorithm for maximum likelihood estimation in GLMs. The output includes coefficient estimates, standard errors, z-statistics, p-values, and confidence intervals for each predictor, along with model fit statistics including deviance, Pearson chi-squared, AIC, BIC, and log-likelihood.
For additional background, see McCullagh & Nelder’s foundational text Generalized Linear Models (1989) and the statsmodels GLM documentation. The source code is available on the statsmodels GitHub repository.
This example function is provided as-is without any representation of accuracy.
Excel Usage
=GLM_GAMMA(y, x, glm_gamma_link, fit_intercept, alpha)
y(list[list], required): Dependent variable as a column vector. Must contain positive values.x(list[list], required): Independent variables (predictors). Each column represents a predictor.glm_gamma_link(str, optional, default: “inverse”): Link function to use for the Gamma GLM.fit_intercept(bool, optional, default: true): If True, includes an intercept term in the model.alpha(float, optional, default: 0.05): Significance level for confidence intervals (between 0 and 1).
Returns (list[list]): 2D list with GLM results and statistics, or error string.
Examples
Example 1: Demo case 1
Inputs:
| y | x |
|---|---|
| 2.5 | 1 |
| 3.2 | 2 |
| 4.1 | 3 |
| 5.3 | 4 |
| 6.7 | 5 |
Excel formula:
=GLM_GAMMA({2.5;3.2;4.1;5.3;6.7}, {1;2;3;4;5})
Expected output:
"non-error"
Example 2: Demo case 2
Inputs:
| y | x | glm_gamma_link | |
|---|---|---|---|
| 1.2 | 1 | 2 | log |
| 2.3 | 1.5 | 2.5 | |
| 3.4 | 2 | 3 | |
| 4.5 | 2.5 | 3.5 | |
| 5.6 | 3 | 4 | |
| 6.7 | 3.5 | 4.5 |
Excel formula:
=GLM_GAMMA({1.2;2.3;3.4;4.5;5.6;6.7}, {1,2;1.5,2.5;2,3;2.5,3.5;3,4;3.5,4.5}, "log")
Expected output:
"non-error"
Example 3: Demo case 3
Inputs:
| y | x | glm_gamma_link | fit_intercept |
|---|---|---|---|
| 1.5 | 1 | identity | false |
| 3 | 2 | ||
| 4.5 | 3 | ||
| 6 | 4 |
Excel formula:
=GLM_GAMMA({1.5;3;4.5;6}, {1;2;3;4}, "identity", FALSE)
Expected output:
"non-error"
Example 4: Demo case 4
Inputs:
| y | x | alpha | |
|---|---|---|---|
| 2.1 | 1 | 0.5 | 0.1 |
| 3.2 | 1.5 | 1 | |
| 4.3 | 2 | 1.5 | |
| 5.4 | 2.5 | 2 | |
| 6.5 | 3 | 2.5 | |
| 7.6 | 3.5 | 3 | |
| 8.7 | 4 | 3.5 |
Excel formula:
=GLM_GAMMA({2.1;3.2;4.3;5.4;6.5;7.6;8.7}, {1,0.5;1.5,1;2,1.5;2.5,2;3,2.5;3.5,3;4,3.5}, 0.1)
Expected output:
"non-error"
Python Code
import math
import warnings
from statsmodels.genmod.generalized_linear_model import GLM as statsmodels_GLM, SET_USE_BIC_LLF
from statsmodels.genmod.families import Gamma as statsmodels_Gamma
from statsmodels.genmod.families.links import InversePower, Log, Identity
def glm_gamma(y, x, glm_gamma_link='inverse', fit_intercept=True, alpha=0.05):
"""
Fit a Generalized Linear Model with Gamma family for positive continuous data.
See: https://www.statsmodels.org/stable/generated/statsmodels.genmod.generalized_linear_model.GLM.html
This example function is provided as-is without any representation of accuracy.
Args:
y (list[list]): Dependent variable as a column vector. Must contain positive values.
x (list[list]): Independent variables (predictors). Each column represents a predictor.
glm_gamma_link (str, optional): Link function to use for the Gamma GLM. Valid options: Inverse, Log, Identity. Default is 'inverse'.
fit_intercept (bool, optional): If True, includes an intercept term in the model. Default is True.
alpha (float, optional): Significance level for confidence intervals (between 0 and 1). Default is 0.05.
Returns:
list[list]: 2D list with GLM results and statistics, or error string.
"""
def to2d(arr):
return [[arr]] if not isinstance(arr, list) else arr
def validate_numeric(val, name):
try:
num = float(val)
except Exception:
return f"Invalid input: {name} must be numeric."
if math.isnan(num) or math.isinf(num):
return f"Invalid input: {name} must be finite."
return num
# Normalize inputs
y_2d = to2d(y)
x_2d = to2d(x)
# Validate y is a column vector
if not isinstance(y_2d, list) or len(y_2d) == 0:
return "Invalid input: y must be a non-empty 2D list."
# Extract y values
y_values = []
for row in y_2d:
if not isinstance(row, list) or len(row) != 1:
return "Invalid input: y must be a column vector (single column)."
val = validate_numeric(row[0], "y")
if isinstance(val, str):
return val
if val <= 0:
return "Invalid input: y values must be positive for Gamma family."
y_values.append(val)
# Validate x is a matrix
if not isinstance(x_2d, list) or len(x_2d) == 0:
return "Invalid input: x must be a non-empty 2D list."
n_rows = len(x_2d)
if n_rows != len(y_values):
return "Invalid input: y and x must have the same number of rows."
# Extract x values
n_cols = None
x_values = []
for i, row in enumerate(x_2d):
if not isinstance(row, list):
return "Invalid input: x must be a 2D list."
if n_cols is None:
n_cols = len(row)
elif len(row) != n_cols:
return "Invalid input: all rows in x must have the same number of columns."
row_vals = []
for val in row:
num = validate_numeric(val, "x")
if isinstance(num, str):
return num
row_vals.append(num)
x_values.append(row_vals)
# Validate link function
valid_links = ['inverse', 'log', 'identity']
if not isinstance(glm_gamma_link, str) or glm_gamma_link not in valid_links:
return f"Invalid input: glm_gamma_link must be one of {valid_links}."
# Validate fit_intercept
if not isinstance(fit_intercept, bool):
return "Invalid input: fit_intercept must be a boolean."
# Validate alpha
alpha_num = validate_numeric(alpha, "alpha")
if isinstance(alpha_num, str):
return alpha_num
if alpha_num <= 0 or alpha_num >= 1:
return "Invalid input: alpha must be between 0 and 1."
# Add intercept if needed
if fit_intercept:
x_with_intercept = [[1.0] + row for row in x_values]
else:
x_with_intercept = x_values
# Create link function
try:
if glm_gamma_link == 'inverse':
link = InversePower()
elif glm_gamma_link == 'log':
link = Log()
else: # identity
link = Identity()
family = statsmodels_Gamma(link=link)
except Exception as exc:
return f"Error creating Gamma family: {exc}"
# Fit the model
try:
SET_USE_BIC_LLF(True)
with warnings.catch_warnings():
warnings.filterwarnings('ignore', category=FutureWarning)
warnings.filterwarnings('ignore', message='.*Perfect separation.*')
warnings.filterwarnings('ignore', message='.*does not respect the domain.*')
model = statsmodels_GLM(y_values, x_with_intercept, family=family)
results = model.fit()
except Exception as exc:
return f"Error fitting GLM: {exc}"
# Extract results
try:
params = results.params
std_errors = results.bse
z_stats = results.tvalues
p_values = results.pvalues
conf_int = results.conf_int(alpha=alpha_num)
# Build output table
output = [['parameter', 'coefficient', 'std_error', 'z_statistic', 'p_value', 'ci_lower', 'ci_upper']]
# Add parameter rows
for i in range(len(params)):
if fit_intercept and i == 0:
param_name = 'intercept'
else:
predictor_idx = i if not fit_intercept else i - 1
param_name = f'x{predictor_idx + 1}'
output.append([
param_name,
float(params[i]),
float(std_errors[i]),
float(z_stats[i]),
float(p_values[i]),
float(conf_int[i, 0]),
float(conf_int[i, 1])
])
# Add model statistics
output.append(['deviance', float(results.deviance), '', '', '', '', ''])
output.append(['pearson_chi2', float(results.pearson_chi2), '', '', '', '', ''])
output.append(['aic', float(results.aic), '', '', '', '', ''])
output.append(['bic', float(results.bic_llf), '', '', '', '', ''])
output.append(['log_likelihood', float(results.llf), '', '', '', '', ''])
output.append(['scale', float(results.scale), '', '', '', '', ''])
return output
except Exception as exc:
return f"Error extracting results: {exc}"