GLM_TWEEDIE
Overview
The GLM_TWEEDIE function fits a Generalized Linear Model (GLM) using the Tweedie family of distributions, providing a flexible framework for modeling data with non-normal response distributions. The Tweedie distribution is a special case of exponential dispersion models that unifies several common distributions including Poisson, Gamma, and compound Poisson-Gamma (useful for data with exact zeros and positive continuous values).
This implementation uses the statsmodels library’s GLM module. For detailed documentation, see the statsmodels GLM documentation and the Tweedie family reference.
The key parameter is the variance power p, which controls the relationship between the variance and mean of the response variable:
\text{Var}(Y) = \phi \cdot \mu^p
where \mu is the mean, \phi is the dispersion parameter, and p is the variance power. Different values of p correspond to different distributions:
- p = 1: Poisson (count data)
- 1 < p < 2: Compound Poisson-Gamma (insurance claims, rainfall with zeros)
- p = 2: Gamma (positive continuous data)
The function supports Log and Power link functions. The log link (default) is commonly used and ensures positive predictions:
\log(\mu) = X\beta \quad \Rightarrow \quad \mu = e^{X\beta}
Model output includes coefficient estimates, standard errors, z-statistics, p-values, and confidence intervals for each predictor. Goodness-of-fit statistics such as deviance, Pearson chi-squared, AIC, BIC, and log-likelihood are also provided. Tweedie GLMs are widely applied in actuarial science for insurance claim modeling, ecology for species abundance data, and any domain where data exhibits a mean-variance power relationship.
This example function is provided as-is without any representation of accuracy.
Excel Usage
=GLM_TWEEDIE(y, x, var_power, glm_tweedie_link, fit_intercept, alpha)
y(list[list], required): Dependent variable as a column vector (2D list with one column).x(list[list], required): Independent variables (predictors) as a matrix. Each column represents a predictor variable.var_power(float, optional, default: 1.5): Variance power parameter controlling the Tweedie distribution (1 <= var_power <= 2). 1.0 = Poisson, 1.5 = Compound Poisson-Gamma, 2.0 = Gamma.glm_tweedie_link(str, optional, default: “log”): Link function to use for the Tweedie GLM model.fit_intercept(bool, optional, default: true): If True, adds an intercept term to the model.alpha(float, optional, default: 0.05): Significance level for confidence intervals (0 < alpha < 1).
Returns (list[list]): 2D list with GLM results and statistics, or error string.
Examples
Example 1: Demo case 1
Inputs:
| y | x |
|---|---|
| 1.2 | 1 |
| 2.3 | 2 |
| 3.8 | 3 |
| 4.5 | 4 |
| 6.1 | 5 |
| 7.2 | 6 |
Excel formula:
=GLM_TWEEDIE({1.2;2.3;3.8;4.5;6.1;7.2}, {1;2;3;4;5;6})
Expected output:
| parameter | coefficient | std_error | z_statistic | p_value | ci_lower | ci_upper |
|---|---|---|---|---|---|---|
| intercept | 0.16737052107241146 | 0.1777598466210029 | 0.9415541487794922 | 0.3464209539226616 | -0.1810323762021182 | 0.5157734183469411 |
| x1 | 0.3205527363026178 | 0.04136592541451977 | 7.74919775371691 | 9.247490982061944e-15 | 0.23947701230298896 | 0.4016284603022467 |
| deviance | 0.23257537880695267 | |||||
| pearson_chi2 | 0.22600578760047332 | |||||
| aic | 13.393097469193265 | |||||
| bic | 13.39 | |||||
| log_likelihood | -4.696548734596632 | |||||
| scale | 0.05650144690011844 |
Example 2: Demo case 2
Inputs:
| y | x | var_power |
|---|---|---|
| 1 | 1 | 1 |
| 3 | 2 | |
| 2 | 2.5 | |
| 5 | 4 | |
| 4 | 3.5 | |
| 8 | 5 |
Excel formula:
=GLM_TWEEDIE({1;3;2;5;4;8}, {1;2;2.5;4;3.5;5}, 1)
Expected output:
| parameter | coefficient | std_error | z_statistic | p_value | ci_lower | ci_upper |
|---|---|---|---|---|---|---|
| intercept | -0.17824085881745044 | 0.27172165173113344 | -0.6559685534144274 | 0.5118443577545689 | -0.7108055100302076 | 0.35432379239530676 |
| x1 | 0.45011040836640037 | 0.06918247257596571 | 6.5061335856731235 | 7.710966198964227e-11 | 0.3145152537560776 | 0.5857055629767232 |
| deviance | 0.6001504010077837 | |||||
| pearson_chi2 | 0.6362839140174792 | |||||
| aic | 122.75327413912515 | |||||
| bic | 122.33679307758129 | |||||
| log_likelihood | -59.37663706956258 | |||||
| scale | 0.15907097850436994 |
Example 3: Demo case 3
Inputs:
| y | x | var_power | glm_tweedie_link |
|---|---|---|---|
| 2.1 | 1 | 2 | power |
| 3.9 | 2 | ||
| 6.2 | 3 | ||
| 7.8 | 4 | ||
| 10.5 | 5 |
Excel formula:
=GLM_TWEEDIE({2.1;3.9;6.2;7.8;10.5}, {1;2;3;4;5}, 2, "power")
Expected output:
| parameter | coefficient | std_error | z_statistic | p_value | ci_lower | ci_upper |
|---|---|---|---|---|---|---|
| intercept | 0.06614066513969515 | 0.13362379156684015 | 0.4949767130848912 | 0.6206165584728542 | -0.19575715380899855 | 0.3280384840883888 |
| x1 | 2.0031265943221577 | 0.07147363948224718 | 28.026089182427878 | 7.81611292159438e-173 | 1.8630408350929533 | 2.143212353551362 |
| deviance | 0.00536930480034218 | |||||
| pearson_chi2 | 0.005342950892998718 | |||||
| aic | 1.218643907218393 | |||||
| bic | 0.43751973209023154 | |||||
| log_likelihood | 1.3906780463908035 | |||||
| scale | 0.0017809836309995734 |
Example 4: Demo case 4
Inputs:
| y | x | var_power | glm_tweedie_link | fit_intercept | alpha |
|---|---|---|---|---|---|
| 3.5 | 1.5 | 1.5 | log | true | 0.1 |
| 5.2 | 2.5 | ||||
| 7.1 | 3.5 | ||||
| 9.5 | 4.5 | ||||
| 11.8 | 5.5 | ||||
| 13.2 | 6.5 |
Excel formula:
=GLM_TWEEDIE({3.5;5.2;7.1;9.5;11.8;13.2}, {1.5;2.5;3.5;4.5;5.5;6.5}, 1.5, "log", TRUE, 0.1)
Expected output:
| parameter | coefficient | std_error | z_statistic | p_value | ci_lower | ci_upper |
|---|---|---|---|---|---|---|
| intercept | 0.9954692897509494 | 0.11096471007931333 | 8.971043938558719 | 2.937172544485591e-19 | 0.8129485839133723 | 1.1779899955885265 |
| x1 | 0.2590961324508474 | 0.023679261440779684 | 10.941900916075031 | 7.266022592467453e-28 | 0.2201472133864488 | 0.298045051515246 |
| deviance | 0.1098369952938425 | |||||
| pearson_chi2 | 0.1077084267312591 | |||||
| aic | 15.671587705582624 | |||||
| bic | 15.255106644038733 | |||||
| log_likelihood | -5.835793852791312 | |||||
| scale | 0.026927106682814846 |
Python Code
import math
import warnings
import numpy as np
import statsmodels.api as sm
from statsmodels.genmod.families import Tweedie
from statsmodels.genmod.families.links import Log, Power
from statsmodels.genmod.generalized_linear_model import SET_USE_BIC_LLF
def glm_tweedie(y, x, var_power=1.5, glm_tweedie_link='log', fit_intercept=True, alpha=0.05):
"""
Fits a Generalized Linear Model with Tweedie family for flexible distribution modeling.
See: https://www.statsmodels.org/stable/generated/statsmodels.genmod.generalized_linear_model.GLM.html
This example function is provided as-is without any representation of accuracy.
Args:
y (list[list]): Dependent variable as a column vector (2D list with one column).
x (list[list]): Independent variables (predictors) as a matrix. Each column represents a predictor variable.
var_power (float, optional): Variance power parameter controlling the Tweedie distribution (1 <= var_power <= 2). 1.0 = Poisson, 1.5 = Compound Poisson-Gamma, 2.0 = Gamma. Default is 1.5.
glm_tweedie_link (str, optional): Link function to use for the Tweedie GLM model. Valid options: Log, Power. Default is 'log'.
fit_intercept (bool, optional): If True, adds an intercept term to the model. Default is True.
alpha (float, optional): Significance level for confidence intervals (0 < alpha < 1). Default is 0.05.
Returns:
list[list]: 2D list with GLM results and statistics, or error string.
"""
def to2d(val):
return [[val]] if not isinstance(val, list) else val
def validate_2d_array(arr, name):
if not isinstance(arr, list):
return f"Invalid input: {name} must be a 2D list."
if len(arr) == 0:
return f"Invalid input: {name} cannot be empty."
if not all(isinstance(row, list) for row in arr):
return f"Invalid input: {name} must be a 2D list."
return None
def validate_numeric(value, name, min_val=None, max_val=None):
try:
num = float(value)
except Exception:
return f"Invalid input: {name} must be a number."
if math.isnan(num) or math.isinf(num):
return f"Invalid input: {name} must be finite."
if min_val is not None and num < min_val:
return f"Invalid input: {name} must be >= {min_val}."
if max_val is not None and num > max_val:
return f"Invalid input: {name} must be <= {max_val}."
return num
# Normalize 2D list inputs
y = to2d(y)
x = to2d(x)
# Validate 2D list inputs
err = validate_2d_array(y, 'y')
if err:
return err
err = validate_2d_array(x, 'x')
if err:
return err
# Convert to numpy arrays and validate
try:
y_arr = np.array(y, dtype=float)
x_arr = np.array(x, dtype=float)
except Exception:
return "Invalid input: y and x must contain numeric values."
if y_arr.ndim != 2 or x_arr.ndim != 2:
return "Invalid input: y and x must be 2D arrays."
# Validate y is a column vector
if y_arr.shape[1] != 1:
return "Invalid input: y must be a column vector (single column)."
# Validate matching dimensions
if y_arr.shape[0] != x_arr.shape[0]:
return "Invalid input: y and x must have the same number of rows."
# Flatten y to 1D
y_arr = y_arr.flatten()
# Validate var_power
var_power_val = validate_numeric(var_power, 'var_power', min_val=1.0, max_val=2.0)
if isinstance(var_power_val, str):
return var_power_val
# Validate alpha
alpha_val = validate_numeric(alpha, 'alpha', min_val=0.0, max_val=1.0)
if isinstance(alpha_val, str):
return alpha_val
# Validate glm_tweedie_link
if not isinstance(glm_tweedie_link, str):
return "Invalid input: glm_tweedie_link must be a string."
link_lower = glm_tweedie_link.lower()
if link_lower == 'log':
link_func = Log()
elif link_lower == 'power':
link_func = Power()
else:
return f"Invalid input: glm_tweedie_link must be 'log' or 'power', got '{glm_tweedie_link}'."
# Validate fit_intercept
if not isinstance(fit_intercept, bool):
return "Invalid input: fit_intercept must be a boolean."
# Add intercept if requested
if fit_intercept:
x_arr = sm.add_constant(x_arr, has_constant='add')
# Check for valid data
if np.any(np.isnan(y_arr)) or np.any(np.isinf(y_arr)):
return "Invalid input: y contains non-finite values."
if np.any(np.isnan(x_arr)) or np.any(np.isinf(x_arr)):
return "Invalid input: x contains non-finite values."
# Fit the model
try:
SET_USE_BIC_LLF(True)
family = Tweedie(var_power=var_power_val, link=link_func)
model = sm.GLM(y_arr, x_arr, family=family)
with warnings.catch_warnings():
warnings.simplefilter("ignore")
results = model.fit()
except Exception as e:
return f"GLM fitting error: {e}"
# Extract results
try:
params = results.params
std_errors = results.bse
z_stats = results.tvalues
p_values = results.pvalues
conf_int = results.conf_int(alpha=alpha_val)
# Build parameter names
n_predictors = x_arr.shape[1]
if fit_intercept:
param_names = ['intercept'] + [f'x{i}' for i in range(1, n_predictors)]
else:
param_names = [f'x{i}' for i in range(1, n_predictors + 1)]
# Build output table
output = [['parameter', 'coefficient', 'std_error', 'z_statistic', 'p_value', 'ci_lower', 'ci_upper']]
for i, name in enumerate(param_names):
row = [
name,
float(params[i]),
float(std_errors[i]),
float(z_stats[i]),
float(p_values[i]),
float(conf_int[i, 0]),
float(conf_int[i, 1])
]
output.append(row)
# Add model statistics - check for non-finite values
deviance = float(results.deviance)
pearson_chi2 = float(results.pearson_chi2)
aic = float(results.aic)
bic = float(results.bic_llf)
llf = float(results.llf)
scale = float(results.scale)
# Check if any critical values are non-finite
if math.isinf(aic) or math.isnan(aic):
return "GLM error: AIC is non-finite, model may be degenerate."
if math.isinf(llf) or math.isnan(llf):
return "GLM error: log-likelihood is non-finite, model may be degenerate."
output.append(['deviance', deviance, '', '', '', '', ''])
output.append(['pearson_chi2', pearson_chi2, '', '', '', '', ''])
output.append(['aic', aic, '', '', '', '', ''])
output.append(['bic', bic, '', '', '', '', ''])
output.append(['log_likelihood', llf, '', '', '', '', ''])
output.append(['scale', scale, '', '', '', '', ''])
return output
except Exception as e:
return f"Error extracting results: {e}"