GLM_TWEEDIE

Overview

The GLM_TWEEDIE function fits a Generalized Linear Model (GLM) using the Tweedie family of distributions, providing a flexible framework for modeling data with non-normal response distributions. The Tweedie distribution is a special case of exponential dispersion models that unifies several common distributions including Poisson, Gamma, and compound Poisson-Gamma (useful for data with exact zeros and positive continuous values).

This implementation uses the statsmodels library’s GLM module. For detailed documentation, see the statsmodels GLM documentation and the Tweedie family reference.

The key parameter is the variance power p, which controls the relationship between the variance and mean of the response variable:

\text{Var}(Y) = \phi \cdot \mu^p

where \mu is the mean, \phi is the dispersion parameter, and p is the variance power. Different values of p correspond to different distributions:

  • p = 1: Poisson (count data)
  • 1 < p < 2: Compound Poisson-Gamma (insurance claims, rainfall with zeros)
  • p = 2: Gamma (positive continuous data)

The function supports Log and Power link functions. The log link (default) is commonly used and ensures positive predictions:

\log(\mu) = X\beta \quad \Rightarrow \quad \mu = e^{X\beta}

Model output includes coefficient estimates, standard errors, z-statistics, p-values, and confidence intervals for each predictor. Goodness-of-fit statistics such as deviance, Pearson chi-squared, AIC, BIC, and log-likelihood are also provided. Tweedie GLMs are widely applied in actuarial science for insurance claim modeling, ecology for species abundance data, and any domain where data exhibits a mean-variance power relationship.

This example function is provided as-is without any representation of accuracy.

Excel Usage

=GLM_TWEEDIE(y, x, var_power, glm_tweedie_link, fit_intercept, alpha)
  • y (list[list], required): Dependent variable as a column vector (2D list with one column).
  • x (list[list], required): Independent variables (predictors) as a matrix. Each column represents a predictor variable.
  • var_power (float, optional, default: 1.5): Variance power parameter controlling the Tweedie distribution (1 <= var_power <= 2). 1.0 = Poisson, 1.5 = Compound Poisson-Gamma, 2.0 = Gamma.
  • glm_tweedie_link (str, optional, default: “log”): Link function to use for the Tweedie GLM model.
  • fit_intercept (bool, optional, default: true): If True, adds an intercept term to the model.
  • alpha (float, optional, default: 0.05): Significance level for confidence intervals (0 < alpha < 1).

Returns (list[list]): 2D list with GLM results and statistics, or error string.

Example 1: Tweedie default variance power

Inputs:

y x
1.2 1
2.3 2
3.8 3
4.5 4
6.1 5
7.2 6

Excel formula:

=GLM_TWEEDIE({1.2;2.3;3.8;4.5;6.1;7.2}, {1;2;3;4;5;6})

Expected output:

parameter coefficient std_error z_statistic p_value ci_lower ci_upper
intercept 0.167371 0.17776 0.941554 0.346421 -0.181032 0.515773
x1 0.320553 0.0413659 7.7492 9.24749e-15 0.239477 0.401628
deviance 0.232575
pearson_chi2 0.226006
aic 13.3931
bic 12.9766
log_likelihood -4.69655
scale 0.0565014
Example 2: Tweedie Poisson variance power

Inputs:

y x var_power
1 1 1
3 2
2 2.5
5 4
4 3.5
8 5

Excel formula:

=GLM_TWEEDIE({1;3;2;5;4;8}, {1;2;2.5;4;3.5;5}, 1)

Expected output:

parameter coefficient std_error z_statistic p_value ci_lower ci_upper
intercept -0.178241 0.271722 -0.655969 0.511844 -0.710806 0.354324
x1 0.45011 0.0691825 6.50613 7.71097e-11 0.314515 0.585706
deviance 0.60015
pearson_chi2 0.636284
aic 122.753
bic 122.337
log_likelihood -59.3766
scale 0.159071
Example 3: Tweedie gamma variance power with power link

Inputs:

y x var_power glm_tweedie_link
2.1 1 2 power
3.9 2
6.2 3
7.8 4
10.5 5

Excel formula:

=GLM_TWEEDIE({2.1;3.9;6.2;7.8;10.5}, {1;2;3;4;5}, 2, "power")

Expected output:

parameter coefficient std_error z_statistic p_value ci_lower ci_upper
intercept 0.0661407 0.133624 0.494977 0.620617 -0.195757 0.328038
x1 2.00313 0.0714736 28.0261 7.81611e-173 1.86304 2.14321
deviance 0.0053693
pearson_chi2 0.00534295
aic 1.21864
bic 0.43752
log_likelihood 1.39068
scale 0.00178098
Example 4: Tweedie with all parameters

Inputs:

y x var_power glm_tweedie_link fit_intercept alpha
3.5 1.5 1.5 log true 0.1
5.2 2.5
7.1 3.5
9.5 4.5
11.8 5.5
13.2 6.5

Excel formula:

=GLM_TWEEDIE({3.5;5.2;7.1;9.5;11.8;13.2}, {1.5;2.5;3.5;4.5;5.5;6.5}, 1.5, "log", TRUE, 0.1)

Expected output:

parameter coefficient std_error z_statistic p_value ci_lower ci_upper
intercept 0.995469 0.110965 8.97104 2.93717e-19 0.812949 1.17799
x1 0.259096 0.0236793 10.9419 7.26602e-28 0.220147 0.298045
deviance 0.109837
pearson_chi2 0.107708
aic 15.6716
bic 15.2551
log_likelihood -5.83579
scale 0.0269271

Python Code

Show Code
import math
import warnings
import numpy as np
import statsmodels.api as sm
from statsmodels.genmod.families import Tweedie
from statsmodels.genmod.families.links import Log, Power
from statsmodels.genmod.generalized_linear_model import SET_USE_BIC_LLF

def glm_tweedie(y, x, var_power=1.5, glm_tweedie_link='log', fit_intercept=True, alpha=0.05):
    """
    Fits a Generalized Linear Model with Tweedie family for flexible distribution modeling.

    See: https://www.statsmodels.org/stable/generated/statsmodels.genmod.generalized_linear_model.GLM.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        y (list[list]): Dependent variable as a column vector (2D list with one column).
        x (list[list]): Independent variables (predictors) as a matrix. Each column represents a predictor variable.
        var_power (float, optional): Variance power parameter controlling the Tweedie distribution (1 <= var_power <= 2). 1.0 = Poisson, 1.5 = Compound Poisson-Gamma, 2.0 = Gamma. Default is 1.5.
        glm_tweedie_link (str, optional): Link function to use for the Tweedie GLM model. Valid options: Log, Power. Default is 'log'.
        fit_intercept (bool, optional): If True, adds an intercept term to the model. Default is True.
        alpha (float, optional): Significance level for confidence intervals (0 < alpha < 1). Default is 0.05.

    Returns:
        list[list]: 2D list with GLM results and statistics, or error string.
    """
    def to2d(val):
        return [[val]] if not isinstance(val, list) else val

    def validate_2d_array(arr, name):
        if not isinstance(arr, list):
            return f"Error: Invalid input: {name} must be a 2D list."
        if len(arr) == 0:
            return f"Error: Invalid input: {name} cannot be empty."
        if not all(isinstance(row, list) for row in arr):
            return f"Error: Invalid input: {name} must be a 2D list."
        return None

    def validate_numeric(value, name, min_val=None, max_val=None):
        try:
            num = float(value)
        except Exception:
            return f"Error: Invalid input: {name} must be a number."
        if math.isnan(num) or math.isinf(num):
            return f"Error: Invalid input: {name} must be finite."
        if min_val is not None and num < min_val:
            return f"Error: Invalid input: {name} must be >= {min_val}."
        if max_val is not None and num > max_val:
            return f"Error: Invalid input: {name} must be <= {max_val}."
        return num

    try:
        # Normalize 2D list inputs
        y = to2d(y)
        x = to2d(x)

        # Validate 2D list inputs
        err = validate_2d_array(y, 'y')
        if err:
            return err
        err = validate_2d_array(x, 'x')
        if err:
            return err

        # Convert to numpy arrays and validate
        try:
            y_arr = np.array(y, dtype=float)
            x_arr = np.array(x, dtype=float)
        except Exception:
            return "Error: Invalid input: y and x must contain numeric values."

        if y_arr.ndim != 2 or x_arr.ndim != 2:
            return "Error: Invalid input: y and x must be 2D arrays."

        # Validate y is a column vector
        if y_arr.shape[1] != 1:
            return "Error: Invalid input: y must be a column vector (single column)."

        # Validate matching dimensions
        if y_arr.shape[0] != x_arr.shape[0]:
            return "Error: Invalid input: y and x must have the same number of rows."

        # Flatten y to 1D
        y_arr = y_arr.flatten()

        # Validate var_power
        var_power_val = validate_numeric(var_power, 'var_power', min_val=1.0, max_val=2.0)
        if isinstance(var_power_val, str):
            return var_power_val

        # Validate alpha
        alpha_val = validate_numeric(alpha, 'alpha')
        if isinstance(alpha_val, str):
            return alpha_val
        if alpha_val <= 0 or alpha_val >= 1:
            return "Error: Invalid input: alpha must be between 0 and 1."

        # Validate glm_tweedie_link
        if not isinstance(glm_tweedie_link, str):
            return "Error: Invalid input: glm_tweedie_link must be a string."

        link_lower = glm_tweedie_link.lower()
        if link_lower == 'log':
            link_func = Log()
        elif link_lower == 'power':
            link_func = Power()
        else:
            return f"Error: Invalid input: glm_tweedie_link must be 'log' or 'power', got '{glm_tweedie_link}'."

        # Validate fit_intercept
        if not isinstance(fit_intercept, bool):
            return "Error: Invalid input: fit_intercept must be a boolean."

        # Add intercept if requested
        if fit_intercept:
            x_arr = sm.add_constant(x_arr, has_constant='add')

        # Check for valid data
        if np.any(np.isnan(y_arr)) or np.any(np.isinf(y_arr)):
            return "Error: Invalid input: y contains non-finite values."
        if np.any(np.isnan(x_arr)) or np.any(np.isinf(x_arr)):
            return "Error: Invalid input: x contains non-finite values."

        # Fit the model
        try:
            SET_USE_BIC_LLF(True)
            family = Tweedie(var_power=var_power_val, link=link_func)
            model = sm.GLM(y_arr, x_arr, family=family)
            with warnings.catch_warnings():
                warnings.simplefilter("ignore")
                results = model.fit()
        except Exception as e:
            return f"Error: GLM fitting error: {e}"

        # Extract results
        try:
            params = results.params
            std_errors = results.bse
            z_stats = results.tvalues
            p_values = results.pvalues
            conf_int = results.conf_int(alpha=alpha_val)

            # Build parameter names
            n_predictors = x_arr.shape[1]
            if fit_intercept:
                param_names = ['intercept'] + [f'x{i}' for i in range(1, n_predictors)]
            else:
                param_names = [f'x{i}' for i in range(1, n_predictors + 1)]

            # Build output table
            output = [['parameter', 'coefficient', 'std_error', 'z_statistic', 'p_value', 'ci_lower', 'ci_upper']]

            for i, name in enumerate(param_names):
                row = [
                    name,
                    float(params[i]),
                    float(std_errors[i]),
                    float(z_stats[i]),
                    float(p_values[i]),
                    float(conf_int[i, 0]),
                    float(conf_int[i, 1])
                ]
                output.append(row)

            # Add model statistics - check for non-finite values
            deviance = float(results.deviance)
            pearson_chi2 = float(results.pearson_chi2)
            aic = float(results.aic)
            bic = float(results.bic_llf)
            llf = float(results.llf)
            scale = float(results.scale)

            # Check if any critical values are non-finite
            if math.isinf(aic) or math.isnan(aic):
                return "Error: GLM error: AIC is non-finite, model may be degenerate."
            if math.isinf(llf) or math.isnan(llf):
                return "Error: GLM error: log-likelihood is non-finite, model may be degenerate."

            output.append(['deviance', deviance, '', '', '', '', ''])
            output.append(['pearson_chi2', pearson_chi2, '', '', '', '', ''])
            output.append(['aic', aic, '', '', '', '', ''])
            output.append(['bic', bic, '', '', '', '', ''])
            output.append(['log_likelihood', llf, '', '', '', '', ''])
            output.append(['scale', scale, '', '', '', '', ''])

            return output
        except Exception as e:
            return f"Error: Error extracting results: {e}"
    except Exception as e:
        return f"Error: {str(e)}"

Online Calculator

Dependent variable as a column vector (2D list with one column).
Independent variables (predictors) as a matrix. Each column represents a predictor variable.
Variance power parameter controlling the Tweedie distribution (1 <= var_power <= 2). 1.0 = Poisson, 1.5 = Compound Poisson-Gamma, 2.0 = Gamma.
Link function to use for the Tweedie GLM model.
If True, adds an intercept term to the model.
Significance level for confidence intervals (0 < alpha < 1).