GLM_GAMMA

Overview

The GLM_GAMMA function fits a Generalized Linear Model (GLM) with the Gamma distribution family, designed for modeling positive continuous response variables. This model is particularly well-suited for data where the variance increases with the mean, such as insurance claims, survival times, income, and other right-skewed positive quantities.

Generalized Linear Models extend ordinary linear regression by allowing the response variable to follow a distribution from the exponential family and relating the mean response to the linear predictor through a link function. The Gamma GLM assumes responses follow a Gamma distribution, where the variance is proportional to the square of the mean:

\text{Var}(Y) = \phi \mu^2

where \mu is the mean and \phi is the dispersion (scale) parameter.

This implementation uses the statsmodels library’s GLM class with the Gamma family. The function supports three link functions:

  • Inverse (canonical link): g(\mu) = 1/\mu, the default and theoretically optimal for Gamma
  • Log: g(\mu) = \log(\mu), ensures positive fitted values and is commonly used
  • Identity: g(\mu) = \mu, provides direct interpretation but may produce negative predictions

The model is fitted using Iteratively Reweighted Least Squares (IRLS), an efficient algorithm for maximum likelihood estimation in GLMs. The output includes coefficient estimates, standard errors, z-statistics, p-values, and confidence intervals for each predictor, along with model fit statistics including deviance, Pearson chi-squared, AIC, BIC, and log-likelihood.

For additional background, see McCullagh & Nelder’s foundational text Generalized Linear Models (1989) and the statsmodels GLM documentation. The source code is available on the statsmodels GitHub repository.

This example function is provided as-is without any representation of accuracy.

Excel Usage

=GLM_GAMMA(y, x, glm_gamma_link, fit_intercept, alpha)
  • y (list[list], required): Dependent variable as a column vector. Must contain positive values.
  • x (list[list], required): Independent variables (predictors). Each column represents a predictor.
  • glm_gamma_link (str, optional, default: “inverse”): Link function to use for the Gamma GLM.
  • fit_intercept (bool, optional, default: true): If True, includes an intercept term in the model.
  • alpha (float, optional, default: 0.05): Significance level for confidence intervals (between 0 and 1).

Returns (list[list]): 2D list with GLM results and statistics, or error string.

Examples

Example 1: Demo case 1

Inputs:

y x
2.5 1
3.2 2
4.1 3
5.3 4
6.7 5

Excel formula:

=GLM_GAMMA({2.5;3.2;4.1;5.3;6.7}, {1;2;3;4;5})

Expected output:

"non-error"

Example 2: Demo case 2

Inputs:

y x glm_gamma_link
1.2 1 2 log
2.3 1.5 2.5
3.4 2 3
4.5 2.5 3.5
5.6 3 4
6.7 3.5 4.5

Excel formula:

=GLM_GAMMA({1.2;2.3;3.4;4.5;5.6;6.7}, {1,2;1.5,2.5;2,3;2.5,3.5;3,4;3.5,4.5}, "log")

Expected output:

"non-error"

Example 3: Demo case 3

Inputs:

y x glm_gamma_link fit_intercept
1.5 1 identity false
3 2
4.5 3
6 4

Excel formula:

=GLM_GAMMA({1.5;3;4.5;6}, {1;2;3;4}, "identity", FALSE)

Expected output:

"non-error"

Example 4: Demo case 4

Inputs:

y x alpha
2.1 1 0.5 0.1
3.2 1.5 1
4.3 2 1.5
5.4 2.5 2
6.5 3 2.5
7.6 3.5 3
8.7 4 3.5

Excel formula:

=GLM_GAMMA({2.1;3.2;4.3;5.4;6.5;7.6;8.7}, {1,0.5;1.5,1;2,1.5;2.5,2;3,2.5;3.5,3;4,3.5}, 0.1)

Expected output:

"non-error"

Python Code

import math
import warnings
from statsmodels.genmod.generalized_linear_model import GLM as statsmodels_GLM, SET_USE_BIC_LLF
from statsmodels.genmod.families import Gamma as statsmodels_Gamma
from statsmodels.genmod.families.links import InversePower, Log, Identity

def glm_gamma(y, x, glm_gamma_link='inverse', fit_intercept=True, alpha=0.05):
    """
    Fit a Generalized Linear Model with Gamma family for positive continuous data.

    See: https://www.statsmodels.org/stable/generated/statsmodels.genmod.generalized_linear_model.GLM.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        y (list[list]): Dependent variable as a column vector. Must contain positive values.
        x (list[list]): Independent variables (predictors). Each column represents a predictor.
        glm_gamma_link (str, optional): Link function to use for the Gamma GLM. Valid options: Inverse, Log, Identity. Default is 'inverse'.
        fit_intercept (bool, optional): If True, includes an intercept term in the model. Default is True.
        alpha (float, optional): Significance level for confidence intervals (between 0 and 1). Default is 0.05.

    Returns:
        list[list]: 2D list with GLM results and statistics, or error string.
    """
    def to2d(arr):
        return [[arr]] if not isinstance(arr, list) else arr

    def validate_numeric(val, name):
        try:
            num = float(val)
        except Exception:
            return f"Invalid input: {name} must be numeric."
        if math.isnan(num) or math.isinf(num):
            return f"Invalid input: {name} must be finite."
        return num

    # Normalize inputs
    y_2d = to2d(y)
    x_2d = to2d(x)

    # Validate y is a column vector
    if not isinstance(y_2d, list) or len(y_2d) == 0:
        return "Invalid input: y must be a non-empty 2D list."

    # Extract y values
    y_values = []
    for row in y_2d:
        if not isinstance(row, list) or len(row) != 1:
            return "Invalid input: y must be a column vector (single column)."
        val = validate_numeric(row[0], "y")
        if isinstance(val, str):
            return val
        if val <= 0:
            return "Invalid input: y values must be positive for Gamma family."
        y_values.append(val)

    # Validate x is a matrix
    if not isinstance(x_2d, list) or len(x_2d) == 0:
        return "Invalid input: x must be a non-empty 2D list."

    n_rows = len(x_2d)
    if n_rows != len(y_values):
        return "Invalid input: y and x must have the same number of rows."

    # Extract x values
    n_cols = None
    x_values = []
    for i, row in enumerate(x_2d):
        if not isinstance(row, list):
            return "Invalid input: x must be a 2D list."
        if n_cols is None:
            n_cols = len(row)
        elif len(row) != n_cols:
            return "Invalid input: all rows in x must have the same number of columns."

        row_vals = []
        for val in row:
            num = validate_numeric(val, "x")
            if isinstance(num, str):
                return num
            row_vals.append(num)
        x_values.append(row_vals)

    # Validate link function
    valid_links = ['inverse', 'log', 'identity']
    if not isinstance(glm_gamma_link, str) or glm_gamma_link not in valid_links:
        return f"Invalid input: glm_gamma_link must be one of {valid_links}."

    # Validate fit_intercept
    if not isinstance(fit_intercept, bool):
        return "Invalid input: fit_intercept must be a boolean."

    # Validate alpha
    alpha_num = validate_numeric(alpha, "alpha")
    if isinstance(alpha_num, str):
        return alpha_num
    if alpha_num <= 0 or alpha_num >= 1:
        return "Invalid input: alpha must be between 0 and 1."

    # Add intercept if needed
    if fit_intercept:
        x_with_intercept = [[1.0] + row for row in x_values]
    else:
        x_with_intercept = x_values

    # Create link function
    try:
        if glm_gamma_link == 'inverse':
            link = InversePower()
        elif glm_gamma_link == 'log':
            link = Log()
        else:  # identity
            link = Identity()

        family = statsmodels_Gamma(link=link)
    except Exception as exc:
        return f"Error creating Gamma family: {exc}"

    # Fit the model
    try:
        SET_USE_BIC_LLF(True)
        with warnings.catch_warnings():
            warnings.filterwarnings('ignore', category=FutureWarning)
            warnings.filterwarnings('ignore', message='.*Perfect separation.*')
            warnings.filterwarnings('ignore', message='.*does not respect the domain.*')
            model = statsmodels_GLM(y_values, x_with_intercept, family=family)
            results = model.fit()
    except Exception as exc:
        return f"Error fitting GLM: {exc}"

    # Extract results
    try:
        params = results.params
        std_errors = results.bse
        z_stats = results.tvalues
        p_values = results.pvalues
        conf_int = results.conf_int(alpha=alpha_num)

        # Build output table
        output = [['parameter', 'coefficient', 'std_error', 'z_statistic', 'p_value', 'ci_lower', 'ci_upper']]

        # Add parameter rows
        for i in range(len(params)):
            if fit_intercept and i == 0:
                param_name = 'intercept'
            else:
                predictor_idx = i if not fit_intercept else i - 1
                param_name = f'x{predictor_idx + 1}'

            output.append([
                param_name,
                float(params[i]),
                float(std_errors[i]),
                float(z_stats[i]),
                float(p_values[i]),
                float(conf_int[i, 0]),
                float(conf_int[i, 1])
            ])

        # Add model statistics
        output.append(['deviance', float(results.deviance), '', '', '', '', ''])
        output.append(['pearson_chi2', float(results.pearson_chi2), '', '', '', '', ''])
        output.append(['aic', float(results.aic), '', '', '', '', ''])
        output.append(['bic', float(results.bic_llf), '', '', '', '', ''])
        output.append(['log_likelihood', float(results.llf), '', '', '', '', ''])
        output.append(['scale', float(results.scale), '', '', '', '', ''])

        return output

    except Exception as exc:
        return f"Error extracting results: {exc}"

Online Calculator