GLM_INV_GAUSS

Overview

The GLM_INV_GAUSS function fits a Generalized Linear Model (GLM) using the Inverse Gaussian (also known as Wald) distribution family. This distribution is designed for modeling strictly positive, right-skewed continuous data where variance increases rapidly with the mean—a characteristic common in reliability analysis, survival times, and financial applications such as modeling claim amounts or asset durations.

The Inverse Gaussian distribution is a two-parameter family of continuous probability distributions with support on (0, \infty). Its name derives from its relationship to Brownian motion: while the Gaussian describes a Brownian motion’s level at a fixed time, the Inverse Gaussian describes the distribution of time for a Brownian motion with positive drift to reach a fixed positive level. The distribution is a member of the exponential dispersion model family, making it suitable for use within the GLM framework.

This implementation uses the statsmodels library’s GLM class with the InverseGaussian family. The Inverse Gaussian family has a variance function V(\mu) = \mu^3, meaning variance scales with the cube of the mean—appropriate for data exhibiting strong heteroscedasticity.

The function supports four link functions to relate the linear predictor \eta = X\beta to the conditional mean \mu:

Inverse Squared (canonical): \eta = \mu^{-2}
Inverse: \eta = \mu^{-1}
Log: \eta = \log(\mu)
Identity: \eta = \mu

The inverse squared link is the canonical link for the Inverse Gaussian family, derived from the natural parameter of the exponential family representation. However, the log link is often preferred in practice as it ensures positive fitted values and provides multiplicative interpretation of coefficients.

Model parameters are estimated using Iteratively Reweighted Least Squares (IRLS). The function returns coefficient estimates, standard errors, z-statistics, p-values, confidence intervals, and model diagnostics including deviance, Pearson chi-squared, AIC, BIC, log-likelihood, and the estimated scale parameter.

This example function is provided as-is without any representation of accuracy.

Excel Usage

=GLM_INV_GAUSS(y, x, glm_inv_gauss_link, fit_intercept, alpha)

y (list[list], required): Column vector of positive response values (dependent variable). All values must be greater than zero.
x (list[list], required): Matrix of predictor values (independent variables). Each column is a predictor. Must have the same number of rows as y.
glm_inv_gauss_link (str, optional, default: “inverse_squared”): Link function relating the linear predictor to the response mean.
fit_intercept (bool, optional, default: true): If true, adds an intercept term to the model.
alpha (float, optional, default: 0.05): Significance level for confidence interval calculation. Must be between 0 and 1.

Returns (list[list]): 2D list with GLM results and statistics, or error string.

Example 1: Inverse Gaussian inverse-squared link with single predictor

Inputs:

y	x
2.5	1
3.2	2
4.1	3
5.3	4
6.7	5

Excel formula:

=GLM_INV_GAUSS({2.5;3.2;4.1;5.3;6.7}, {1;2;3;4;5})

Expected output:

parameter	coefficient	std_error	z_statistic	p_value	ci_lower	ci_upper
intercept	0.150998	0.0210432	7.17563	7.19764e-13	0.109754	0.192243
x1	-0.0264735	0.0047014	-5.63099	1.79173e-8	-0.0356881	-0.017259
deviance	0.01191
pearson_chi2	0.0117588
aic	9.69986
bic	8.91874
log_likelihood	-2.84993
scale	0.0039196

Example 2: Inverse Gaussian log link with two predictors

Inputs:

y	x		glm_inv_gauss_link
1.2	1	2	log
2.3	1.5	2.5
3.4	2	3
4.5	2.5	3.5
5.6	3	4
6.7	3.5	4.5

Excel formula:

=GLM_INV_GAUSS({1.2;2.3;3.4;4.5;5.6;6.7}, {1,2;1.5,2.5;2,3;2.5,3.5;3,4;3.5,4.5}, "log")

Expected output:

parameter	coefficient	std_error	z_statistic	p_value	ci_lower	ci_upper
intercept	-0.548212	0.147159	-3.7253	0.000195083	-0.836639	-0.259786
x1	0.653867	0.118774	5.50511	3.68934e-8	0.421073	0.886661
x2	0.105654	0.0328897	3.21238	0.00131638	0.0411917	0.170117
deviance	0.0432841
pearson_chi2	0.041038
aic	13.8728
bic	13.4563
log_likelihood	-4.93638
scale	0.0102595

Example 3: Inverse Gaussian inverse link without intercept

Inputs:

y	x	glm_inv_gauss_link	fit_intercept
1.5	1	inverse	false
3	2
4.5	3
6	4

Excel formula:

=GLM_INV_GAUSS({1.5;3;4.5;6}, {1;2;3;4}, "inverse", FALSE)

Expected output:

parameter	coefficient	std_error	z_statistic	p_value	ci_lower	ci_upper
x1	0.0666667	0.0210819	3.16228	0.0015654	0.025347	0.107986
deviance	0.722222
pearson_chi2	0.2
aic	23.7524
bic	23.1387
log_likelihood	-10.8762
scale	0.0666667

Example 4: Inverse Gaussian identity link with custom alpha

Inputs:

y	x		glm_inv_gauss_link	alpha
2.5	1	0.5	identity	0.1
3.1	1.5	1.2
4.2	2	1.8
5.8	2.5	2.1
6.1	3	2.9
7.5	3.5	3.2
8.2	4	3.8

Excel formula:

=GLM_INV_GAUSS({2.5;3.1;4.2;5.8;6.1;7.5;8.2}, {1,0.5;1.5,1.2;2,1.8;2.5,2.1;3,2.9;3.5,3.2;4,3.8}, "identity", 0.1)

Expected output:

parameter	coefficient	std_error	z_statistic	p_value	ci_lower	ci_upper
intercept	-0.81896	0.313672	-2.61088	0.00903099	-1.33491	-0.303015
x1	4.40174	0.569054	7.73518	1.03254e-14	3.46573	5.33775
x2	-2.17315	0.493164	-4.40653	0.0000105038	-2.98433	-1.36196
deviance	0.000814544
pearson_chi2	0.000815117
aic	-3.1231
bic	-3.28537
log_likelihood	4.56155
scale	0.000203779

Python Code

Show Code

import math
import warnings
from statsmodels.genmod.generalized_linear_model import GLM as statsmodels_GLM, SET_USE_BIC_LLF
from statsmodels.genmod.families import InverseGaussian as statsmodels_InverseGaussian
from statsmodels.genmod.families.links import InverseSquared, InversePower, Log, Identity

def glm_inv_gauss(y, x, glm_inv_gauss_link='inverse_squared', fit_intercept=True, alpha=0.05):
    """
    Fits a Generalized Linear Model with Inverse Gaussian family for right-skewed positive data.

    See: https://www.statsmodels.org/stable/generated/statsmodels.genmod.generalized_linear_model.GLM.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        y (list[list]): Column vector of positive response values (dependent variable). All values must be greater than zero.
        x (list[list]): Matrix of predictor values (independent variables). Each column is a predictor. Must have the same number of rows as y.
        glm_inv_gauss_link (str, optional): Link function relating the linear predictor to the response mean. Valid options: Inverse Squared, Inverse, Log, Identity. Default is 'inverse_squared'.
        fit_intercept (bool, optional): If true, adds an intercept term to the model. Default is True.
        alpha (float, optional): Significance level for confidence interval calculation. Must be between 0 and 1. Default is 0.05.

    Returns:
        list[list]: 2D list with GLM results and statistics, or error string.
    """
    def to2d(arr):
        return [[arr]] if not isinstance(arr, list) else arr

    def validate_numeric(val, name):
        try:
            num = float(val)
        except Exception:
            return f"Error: Invalid input: {name} must be numeric."
        if math.isnan(num) or math.isinf(num):
            return f"Error: Invalid input: {name} must be finite."
        return num

    try:
        # Normalize inputs
        y_2d = to2d(y)
        x_2d = to2d(x)

        # Validate y is a column vector
        if not isinstance(y_2d, list) or len(y_2d) == 0:
            return "Error: Invalid input: y must be a non-empty 2D list."

        # Extract y values
        y_values = []
        for row in y_2d:
            if not isinstance(row, list) or len(row) != 1:
                return "Error: Invalid input: y must be a column vector (single column)."
            val = validate_numeric(row[0], "y")
            if isinstance(val, str):
                return val
            if val <= 0:
                return "Error: Invalid input: y values must be positive for Inverse Gaussian family."
            y_values.append(val)

        # Validate x is a matrix
        if not isinstance(x_2d, list) or len(x_2d) == 0:
            return "Error: Invalid input: x must be a non-empty 2D list."

        n_rows = len(x_2d)
        if n_rows != len(y_values):
            return "Error: Invalid input: y and x must have the same number of rows."

        # Extract x values
        n_cols = None
        x_values = []
        for i, row in enumerate(x_2d):
            if not isinstance(row, list):
                return "Error: Invalid input: x must be a 2D list."
            if n_cols is None:
                n_cols = len(row)
            elif len(row) != n_cols:
                return "Error: Invalid input: all rows in x must have the same number of columns."

            row_vals = []
            for val in row:
                num = validate_numeric(val, "x")
                if isinstance(num, str):
                    return num
                row_vals.append(num)
            x_values.append(row_vals)

        # Validate link function
        valid_links = ['inverse_squared', 'inverse', 'log', 'identity']
        if not isinstance(glm_inv_gauss_link, str) or glm_inv_gauss_link not in valid_links:
            return f"Error: Invalid input: glm_inv_gauss_link must be one of {valid_links}."

        # Validate fit_intercept
        if not isinstance(fit_intercept, bool):
            return "Error: Invalid input: fit_intercept must be a boolean."

        # Validate alpha
        alpha_num = validate_numeric(alpha, "alpha")
        if isinstance(alpha_num, str):
            return alpha_num
        if alpha_num <= 0 or alpha_num >= 1:
            return "Error: Invalid input: alpha must be between 0 and 1."

        # Add intercept if needed
        if fit_intercept:
            x_with_intercept = [[1.0] + row for row in x_values]
        else:
            x_with_intercept = x_values

        # Create link function using non-deprecated class names
        try:
            if glm_inv_gauss_link == 'inverse_squared':
                link = InverseSquared()
            elif glm_inv_gauss_link == 'inverse':
                link = InversePower()
            elif glm_inv_gauss_link == 'log':
                link = Log()
            else:  # identity
                link = Identity()

            family = statsmodels_InverseGaussian(link=link)
        except Exception as exc:
            return f"Error: Error creating Inverse Gaussian family: {exc}"

        # Fit the model (suppress deprecation warnings)
        try:
            SET_USE_BIC_LLF(True)
            with warnings.catch_warnings():
                warnings.filterwarnings('ignore', category=FutureWarning)
                warnings.filterwarnings('ignore', category=UserWarning)
                model = statsmodels_GLM(y_values, x_with_intercept, family=family)
                results = model.fit()
        except Exception as exc:
            return f"Error: Error fitting GLM: {exc}"

        # Extract results
        try:
            params = results.params
            std_errors = results.bse
            z_stats = results.tvalues
            p_values = results.pvalues
            conf_int = results.conf_int(alpha=alpha_num)

            # Build output table
            output = [['parameter', 'coefficient', 'std_error', 'z_statistic', 'p_value', 'ci_lower', 'ci_upper']]

            # Add parameter rows
            for i in range(len(params)):
                if fit_intercept and i == 0:
                    param_name = 'intercept'
                else:
                    predictor_idx = i if not fit_intercept else i - 1
                    param_name = f'x{predictor_idx + 1}'

                output.append([
                    param_name,
                    float(params[i]),
                    float(std_errors[i]),
                    float(z_stats[i]),
                    float(p_values[i]),
                    float(conf_int[i, 0]),
                    float(conf_int[i, 1])
                ])

            # Add model statistics
            output.append(['deviance', float(results.deviance), '', '', '', '', ''])
            output.append(['pearson_chi2', float(results.pearson_chi2), '', '', '', '', ''])
            output.append(['aic', float(results.aic), '', '', '', '', ''])
            output.append(['bic', float(results.bic_llf), '', '', '', '', ''])
            output.append(['log_likelihood', float(results.llf), '', '', '', '', ''])
            output.append(['scale', float(results.scale), '', '', '', '', ''])

            return output

        except Exception as exc:
            return f"Error: Error extracting results: {exc}"
    except Exception as exc:
        return f"Error: {str(exc)}"

Online Calculator

y *

Column vector of positive response values (dependent variable). All values must be greater than zero.

x *

Matrix of predictor values (independent variables). Each column is a predictor. Must have the same number of rows as y.

glm_inv_gauss_link

Link function relating the linear predictor to the response mean.

fit_intercept

If true, adds an intercept term to the model.

alpha

Significance level for confidence interval calculation. Must be between 0 and 1.