GLM_INV_GAUSS

Overview

The GLM_INV_GAUSS function fits a Generalized Linear Model (GLM) using the Inverse Gaussian (also known as Wald) distribution family. This distribution is designed for modeling strictly positive, right-skewed continuous data where variance increases rapidly with the mean—a characteristic common in reliability analysis, survival times, and financial applications such as modeling claim amounts or asset durations.

The Inverse Gaussian distribution is a two-parameter family of continuous probability distributions with support on (0, \infty). Its name derives from its relationship to Brownian motion: while the Gaussian describes a Brownian motion’s level at a fixed time, the Inverse Gaussian describes the distribution of time for a Brownian motion with positive drift to reach a fixed positive level. The distribution is a member of the exponential dispersion model family, making it suitable for use within the GLM framework.

This implementation uses the statsmodels library’s GLM class with the InverseGaussian family. The Inverse Gaussian family has a variance function V(\mu) = \mu^3, meaning variance scales with the cube of the mean—appropriate for data exhibiting strong heteroscedasticity.

The function supports four link functions to relate the linear predictor \eta = X\beta to the conditional mean \mu:

  • Inverse Squared (canonical): \eta = \mu^{-2}
  • Inverse: \eta = \mu^{-1}
  • Log: \eta = \log(\mu)
  • Identity: \eta = \mu

The inverse squared link is the canonical link for the Inverse Gaussian family, derived from the natural parameter of the exponential family representation. However, the log link is often preferred in practice as it ensures positive fitted values and provides multiplicative interpretation of coefficients.

Model parameters are estimated using Iteratively Reweighted Least Squares (IRLS). The function returns coefficient estimates, standard errors, z-statistics, p-values, confidence intervals, and model diagnostics including deviance, Pearson chi-squared, AIC, BIC, log-likelihood, and the estimated scale parameter.

This example function is provided as-is without any representation of accuracy.

Excel Usage

=GLM_INV_GAUSS(y, x, glm_inv_gauss_link, fit_intercept, alpha)
  • y (list[list], required): Column vector of positive response values (dependent variable). All values must be greater than zero.
  • x (list[list], required): Matrix of predictor values (independent variables). Each column is a predictor. Must have the same number of rows as y.
  • glm_inv_gauss_link (str, optional, default: “inverse_squared”): Link function relating the linear predictor to the response mean.
  • fit_intercept (bool, optional, default: true): If true, adds an intercept term to the model.
  • alpha (float, optional, default: 0.05): Significance level for confidence interval calculation. Must be between 0 and 1.

Returns (list[list]): 2D list with GLM results and statistics, or error string.

Examples

Example 1: Demo case 1

Inputs:

y x
2.5 1
3.2 2
4.1 3
5.3 4
6.7 5

Excel formula:

=GLM_INV_GAUSS({2.5;3.2;4.1;5.3;6.7}, {1;2;3;4;5})

Expected output:

"non-error"

Example 2: Demo case 2

Inputs:

y x glm_inv_gauss_link
1.2 1 2 log
2.3 1.5 2.5
3.4 2 3
4.5 2.5 3.5
5.6 3 4
6.7 3.5 4.5

Excel formula:

=GLM_INV_GAUSS({1.2;2.3;3.4;4.5;5.6;6.7}, {1,2;1.5,2.5;2,3;2.5,3.5;3,4;3.5,4.5}, "log")

Expected output:

"non-error"

Example 3: Demo case 3

Inputs:

y x glm_inv_gauss_link fit_intercept
1.5 1 inverse false
3 2
4.5 3
6 4

Excel formula:

=GLM_INV_GAUSS({1.5;3;4.5;6}, {1;2;3;4}, "inverse", FALSE)

Expected output:

"non-error"

Example 4: Demo case 4

Inputs:

y x glm_inv_gauss_link alpha
2.5 1 0.5 identity 0.1
3.1 1.5 1.2
4.2 2 1.8
5.8 2.5 2.1
6.1 3 2.9
7.5 3.5 3.2
8.2 4 3.8

Excel formula:

=GLM_INV_GAUSS({2.5;3.1;4.2;5.8;6.1;7.5;8.2}, {1,0.5;1.5,1.2;2,1.8;2.5,2.1;3,2.9;3.5,3.2;4,3.8}, "identity", 0.1)

Expected output:

"non-error"

Python Code

import math
import warnings
from statsmodels.genmod.generalized_linear_model import GLM as statsmodels_GLM, SET_USE_BIC_LLF
from statsmodels.genmod.families import InverseGaussian as statsmodels_InverseGaussian
from statsmodels.genmod.families.links import InverseSquared, InversePower, Log, Identity

def glm_inv_gauss(y, x, glm_inv_gauss_link='inverse_squared', fit_intercept=True, alpha=0.05):
    """
    Fits a Generalized Linear Model with Inverse Gaussian family for right-skewed positive data.

    See: https://www.statsmodels.org/stable/generated/statsmodels.genmod.generalized_linear_model.GLM.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        y (list[list]): Column vector of positive response values (dependent variable). All values must be greater than zero.
        x (list[list]): Matrix of predictor values (independent variables). Each column is a predictor. Must have the same number of rows as y.
        glm_inv_gauss_link (str, optional): Link function relating the linear predictor to the response mean. Valid options: Inverse Squared, Inverse, Log, Identity. Default is 'inverse_squared'.
        fit_intercept (bool, optional): If true, adds an intercept term to the model. Default is True.
        alpha (float, optional): Significance level for confidence interval calculation. Must be between 0 and 1. Default is 0.05.

    Returns:
        list[list]: 2D list with GLM results and statistics, or error string.
    """
    def to2d(arr):
        return [[arr]] if not isinstance(arr, list) else arr

    def validate_numeric(val, name):
        try:
            num = float(val)
        except Exception:
            return f"Invalid input: {name} must be numeric."
        if math.isnan(num) or math.isinf(num):
            return f"Invalid input: {name} must be finite."
        return num

    # Normalize inputs
    y_2d = to2d(y)
    x_2d = to2d(x)

    # Validate y is a column vector
    if not isinstance(y_2d, list) or len(y_2d) == 0:
        return "Invalid input: y must be a non-empty 2D list."

    # Extract y values
    y_values = []
    for row in y_2d:
        if not isinstance(row, list) or len(row) != 1:
            return "Invalid input: y must be a column vector (single column)."
        val = validate_numeric(row[0], "y")
        if isinstance(val, str):
            return val
        if val <= 0:
            return "Invalid input: y values must be positive for Inverse Gaussian family."
        y_values.append(val)

    # Validate x is a matrix
    if not isinstance(x_2d, list) or len(x_2d) == 0:
        return "Invalid input: x must be a non-empty 2D list."

    n_rows = len(x_2d)
    if n_rows != len(y_values):
        return "Invalid input: y and x must have the same number of rows."

    # Extract x values
    n_cols = None
    x_values = []
    for i, row in enumerate(x_2d):
        if not isinstance(row, list):
            return "Invalid input: x must be a 2D list."
        if n_cols is None:
            n_cols = len(row)
        elif len(row) != n_cols:
            return "Invalid input: all rows in x must have the same number of columns."

        row_vals = []
        for val in row:
            num = validate_numeric(val, "x")
            if isinstance(num, str):
                return num
            row_vals.append(num)
        x_values.append(row_vals)

    # Validate link function
    valid_links = ['inverse_squared', 'inverse', 'log', 'identity']
    if not isinstance(glm_inv_gauss_link, str) or glm_inv_gauss_link not in valid_links:
        return f"Invalid input: glm_inv_gauss_link must be one of {valid_links}."

    # Validate fit_intercept
    if not isinstance(fit_intercept, bool):
        return "Invalid input: fit_intercept must be a boolean."

    # Validate alpha
    alpha_num = validate_numeric(alpha, "alpha")
    if isinstance(alpha_num, str):
        return alpha_num
    if alpha_num <= 0 or alpha_num >= 1:
        return "Invalid input: alpha must be between 0 and 1."

    # Add intercept if needed
    if fit_intercept:
        x_with_intercept = [[1.0] + row for row in x_values]
    else:
        x_with_intercept = x_values

    # Create link function using non-deprecated class names
    try:
        if glm_inv_gauss_link == 'inverse_squared':
            link = InverseSquared()
        elif glm_inv_gauss_link == 'inverse':
            link = InversePower()
        elif glm_inv_gauss_link == 'log':
            link = Log()
        else:  # identity
            link = Identity()

        family = statsmodels_InverseGaussian(link=link)
    except Exception as exc:
        return f"Error creating Inverse Gaussian family: {exc}"

    # Fit the model (suppress deprecation warnings)
    try:
        SET_USE_BIC_LLF(True)
        with warnings.catch_warnings():
            warnings.filterwarnings('ignore', category=FutureWarning)
            warnings.filterwarnings('ignore', category=UserWarning)
            model = statsmodels_GLM(y_values, x_with_intercept, family=family)
            results = model.fit()
    except Exception as exc:
        return f"Error fitting GLM: {exc}"

    # Extract results
    try:
        params = results.params
        std_errors = results.bse
        z_stats = results.tvalues
        p_values = results.pvalues
        conf_int = results.conf_int(alpha=alpha_num)

        # Build output table
        output = [['parameter', 'coefficient', 'std_error', 'z_statistic', 'p_value', 'ci_lower', 'ci_upper']]

        # Add parameter rows
        for i in range(len(params)):
            if fit_intercept and i == 0:
                param_name = 'intercept'
            else:
                predictor_idx = i if not fit_intercept else i - 1
                param_name = f'x{predictor_idx + 1}'

            output.append([
                param_name,
                float(params[i]),
                float(std_errors[i]),
                float(z_stats[i]),
                float(p_values[i]),
                float(conf_int[i, 0]),
                float(conf_int[i, 1])
            ])

        # Add model statistics
        output.append(['deviance', float(results.deviance), '', '', '', '', ''])
        output.append(['pearson_chi2', float(results.pearson_chi2), '', '', '', '', ''])
        output.append(['aic', float(results.aic), '', '', '', '', ''])
        output.append(['bic', float(results.bic_llf), '', '', '', '', ''])
        output.append(['log_likelihood', float(results.llf), '', '', '', '', ''])
        output.append(['scale', float(results.scale), '', '', '', '', ''])

        return output

    except Exception as exc:
        return f"Error extracting results: {exc}"

Online Calculator