NHYPERGEOM

Overview

The NHYPERGEOM function computes values for the negative hypergeometric distribution, a discrete probability distribution that models sampling without replacement from a finite population until a specified number of failures is observed. Unlike the standard hypergeometric distribution, which counts successes in a fixed sample size, the negative hypergeometric distribution describes the number of successes encountered before accumulating a target number of failures.

Consider a population containing M objects, of which n are classified as “successes” (e.g., red balls) and M - n are “failures” (e.g., blue balls). Objects are drawn one at a time without replacement until r failures have been observed. The negative hypergeometric distribution gives the probability of observing exactly k successes during this sampling process. This distribution is particularly useful in quality control, ecological sampling, and clinical trial design where sampling stops after encountering a certain number of defective items or negative outcomes.
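The sampling process described above can be imitated with a short Monte Carlo simulation. This is a minimal sketch using only the Python standard library; the helper name successes_before_rth_failure and the chosen seed and trial count are illustrative assumptions, not part of the NHYPERGEOM function or of SciPy.

```python
import random

def successes_before_rth_failure(M, n, r, rng):
    """Simulate one run and return the successes seen before the r-th failure."""
    # True marks a success, False marks a failure (hypothetical encoding).
    population = [True] * n + [False] * (M - n)
    rng.shuffle(population)  # a random draw order, i.e. sampling without replacement
    successes = 0
    failures = 0
    for is_success in population:
        if failures == r:  # stop once r failures have been observed
            break
        if is_success:
            successes += 1
        else:
            failures += 1
    return successes

rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 20_000
counts = [successes_before_rth_failure(20, 7, 12, rng) for _ in range(trials)]

mean_successes = sum(counts) / trials  # estimate of the expected number of successes
freq_k3 = counts.count(3) / trials     # estimate of P(exactly 3 successes)
print(mean_successes, freq_k3)
```

With M = 20, n = 7, and r = 12, the empirical frequency of exactly 3 successes should land near the PMF value reported in Example 1, up to simulation noise.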

The probability mass function (PMF) is defined as:

f(k; M, n, r) = \frac{\binom{k+r-1}{k} \binom{M-r-k}{n-k}}{\binom{M}{n}}

where k \in [0, n] represents the number of successes, n \in [0, M] is the total number of successes in the population, and r \in [0, M-n] is the number of failures required to stop sampling. The distribution has expected value E[X] = \frac{rn}{M - n + 1} and variance \text{Var}[X] = \frac{rn(M+1)(M-n-r+1)}{(M-n+1)^2(M-n+2)}.
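As a check on the formulas above, the PMF can be evaluated exactly with integer binomial coefficients and compared against the closed-form mean and variance. This is a sketch under stated assumptions: the helper nhypergeom_pmf is a hypothetical name, it is independent of SciPy, and it assumes r >= 1 because math.comb rejects the negative argument that arises when r = 0.

```python
from fractions import Fraction
from math import comb

def nhypergeom_pmf(k, M, n, r):
    """Exact PMF from the formula above; assumes r >= 1 and 0 <= r <= M - n."""
    if not 0 <= k <= n:
        return Fraction(0)  # outside the support
    return Fraction(comb(k + r - 1, k) * comb(M - r - k, n - k), comb(M, n))

M, n, r = 20, 7, 12
pmf = [nhypergeom_pmf(k, M, n, r) for k in range(n + 1)]

total = sum(pmf)                                              # 1
mean = sum(k * p for k, p in enumerate(pmf))                  # 6
var = sum(k * k * p for k, p in enumerate(pmf)) - mean ** 2   # 6/5

# Closed forms quoted in the text
mean_closed = Fraction(r * n, M - n + 1)
var_closed = Fraction(r * n * (M + 1) * (M - n - r + 1),
                      (M - n + 1) ** 2 * (M - n + 2))

print(total, mean, var, float(pmf[3]))  # pmf[3] is about 0.02348
```

Using Fraction keeps every quantity exact, so the moments computed from the PMF can be compared with the closed forms by equality rather than within a tolerance.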

The negative hypergeometric distribution is closely related to several other distributions. It is a special case of the beta-binomial distribution, and it relates to the hypergeometric distribution in the same way that the negative binomial relates to the binomial. Both “negative” variants sample until a fixed number of failures is reached rather than drawing a fixed sample size; they differ in whether sampling is with replacement (negative binomial) or without replacement (negative hypergeometric).
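The beta-binomial connection can be checked numerically. The sketch below assumes the usual parameter mapping (number of trials n, alpha = r, beta = M - n - r + 1, all integers), which is not spelled out in the text above; the helper names are hypothetical and only the standard library is used.

```python
from fractions import Fraction
from math import comb, factorial

def beta_int(a, b):
    """Beta function B(a, b) for positive integers, as an exact fraction."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))

def betabinom_pmf(k, trials, alpha, beta):
    """Beta-binomial PMF restricted to positive integer alpha and beta."""
    return comb(trials, k) * beta_int(k + alpha, trials - k + beta) / beta_int(alpha, beta)

def nhypergeom_pmf(k, M, n, r):
    """Negative hypergeometric PMF from the formula in this document (r >= 1)."""
    return Fraction(comb(k + r - 1, k) * comb(M - r - k, n - k), comb(M, n))

M, n, r = 20, 7, 12
# Assumed mapping: trials = n, alpha = r, beta = M - n - r + 1
matches = [
    nhypergeom_pmf(k, M, n, r) == betabinom_pmf(k, n, r, M - n - r + 1)
    for k in range(n + 1)
]
print(all(matches))
```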

This implementation uses SciPy’s scipy.stats.nhypergeom distribution object, which provides methods for computing the PMF, CDF, survival function, inverse CDF (percent point function), inverse survival function, and descriptive statistics. For more background on the distribution, see the negative hypergeometric distribution on Wikipedia.

This example function is provided as-is without any representation of accuracy.

Excel Usage

=NHYPERGEOM(k, m, n, r, nhypergeom_mode, loc)
  • k (float or list[list], required): Value(s) at which to evaluate the distribution, given as a single value or a 2D range: the number of Type I objects (successes) drawn for pmf/cdf/sf, or a probability for icdf/isf. Ignored for the statistics modes (mean, var, std, median).
  • m (int, required): Total number of objects in the population (must be >= 1).
  • n (int, required): Number of Type I objects (successes) in the population (0 <= n <= m).
  • r (int, required): Number of Type II objects (failures) to draw before stopping (0 <= r <= m-n).
  • nhypergeom_mode (str, optional, default: “pmf”): Output type to compute. Valid options, in lowercase: pmf, cdf, sf, icdf, isf, mean, var, std, median.
  • loc (float, optional, default: 0): Location parameter that shifts the distribution.

Returns (float or list[list]): A float when k is a single value, a 2D list of floats when k is a 2D range, or an error message string if an input is invalid.

Examples

Example 1: PMF at k=3

Inputs:

k    m    n    r
3    20   7    12

Excel formula:

=NHYPERGEOM(3, 20, 7, 12)

Expected output:

0.02348

Example 2: CDF at k=3

Inputs:

k    m    n    r    nhypergeom_mode
3    20   7    12   cdf

Excel formula:

=NHYPERGEOM(3, 20, 7, 12, "cdf")

Expected output:

0.0307

Example 3: Survival function at k=3

Inputs:

k    m    n    r    nhypergeom_mode
3    20   7    12   sf

Excel formula:

=NHYPERGEOM(3, 20, 7, 12, "sf")

Expected output:

0.9693

Example 4: Inverse CDF for probability 0.5

Inputs:

k    m    n    r    nhypergeom_mode
0.5  20   7    12   icdf

Excel formula:

=NHYPERGEOM(0.5, 20, 7, 12, "icdf")

Expected output:

6
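The four expected outputs above can be cross-checked without SciPy by building the CDF, survival function, and inverse CDF directly from the PMF formula. This is a minimal sketch: the function names are illustrative, k is assumed to be an integer, and r is assumed to be at least 1.

```python
from math import comb

def pmf(k, M, n, r):
    """PMF from the formula in the Overview; zero outside 0 <= k <= n."""
    if not 0 <= k <= n:
        return 0.0
    return comb(k + r - 1, k) * comb(M - r - k, n - k) / comb(M, n)

def cdf(k, M, n, r):
    """P(X <= k), summing the PMF over the support up to k."""
    return sum(pmf(j, M, n, r) for j in range(0, min(k, n) + 1))

def sf(k, M, n, r):
    """Survival function P(X > k)."""
    return 1.0 - cdf(k, M, n, r)

def icdf(q, M, n, r):
    """Smallest k whose CDF is at least q (the percent point function)."""
    cumulative = 0.0
    for k in range(n + 1):
        cumulative += pmf(k, M, n, r)
        if cumulative >= q:
            return k
    return n

M, n, r = 20, 7, 12
results = (
    round(pmf(3, M, n, r), 5),  # Example 1: 0.02348
    round(cdf(3, M, n, r), 4),  # Example 2: 0.0307
    round(sf(3, M, n, r), 4),   # Example 3: 0.9693
    icdf(0.5, M, n, r),         # Example 4: 6
)
print(results)
```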

Python Code

from scipy.stats import nhypergeom as scipy_nhypergeom

def nhypergeom(k, m, n, r, nhypergeom_mode='pmf', loc=0):
    """
    Compute Negative Hypergeometric distribution values using scipy.stats.nhypergeom.

    See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.nhypergeom.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        k (list[list]): Value(s) at which to evaluate the distribution (number of Type I objects drawn for pmf/cdf/sf, probability for icdf/isf, ignored for statistics modes).
        m (int): Total number of objects in the population (must be >= 1).
        n (int): Number of Type I objects in the population (0 <= n <= m).
        r (int): Number of Type II objects to draw before stopping (0 <= r <= m-n).
        nhypergeom_mode (str, optional): Output type to compute. Valid options (case-sensitive, lowercase): pmf, cdf, sf, icdf, isf, mean, var, std, median. Default is 'pmf'.
        loc (float, optional): Location parameter that shifts the distribution. Default is 0.

    Returns:
        float: Distribution result (float, or a 2D list of floats when k is a 2D list), or error message string.
    """
    def to2d(x):
        return [[x]] if not isinstance(x, list) else x

    # Validate m
    try:
        m_val = int(m)
        if m_val < 1:
            return "Invalid input: m must be >= 1."
    except (ValueError, TypeError):
        return "Invalid input: m must be an integer."

    # Validate n
    try:
        n_val = int(n)
        if not (0 <= n_val <= m_val):
            return "Invalid input: n must be between 0 and m."
    except (ValueError, TypeError):
        return "Invalid input: n must be an integer."

    # Validate r
    try:
        r_val = int(r)
        if not (0 <= r_val <= m_val - n_val):
            return "Invalid input: r must be between 0 and m-n."
    except (ValueError, TypeError):
        return "Invalid input: r must be an integer."

    # Validate loc
    try:
        loc_val = float(loc)
    except (ValueError, TypeError):
        return "Invalid input: loc must be a number."

    # Validate nhypergeom_mode
    valid_modes = {"pmf", "cdf", "sf", "icdf", "isf", "mean", "var", "std", "median"}
    if not isinstance(nhypergeom_mode, str) or nhypergeom_mode not in valid_modes:
        return f"Invalid input: nhypergeom_mode must be one of {sorted(valid_modes)}."

    # Handle statistics modes (return scalar)
    if nhypergeom_mode in ["mean", "var", "std", "median"]:
        if nhypergeom_mode == "mean":
            return float(scipy_nhypergeom.mean(m_val, n_val, r_val, loc=loc_val))
        elif nhypergeom_mode == "var":
            return float(scipy_nhypergeom.var(m_val, n_val, r_val, loc=loc_val))
        elif nhypergeom_mode == "std":
            return float(scipy_nhypergeom.std(m_val, n_val, r_val, loc=loc_val))
        elif nhypergeom_mode == "median":
            return float(scipy_nhypergeom.median(m_val, n_val, r_val, loc=loc_val))

    # Helper to process k
    def compute(val):
        try:
            kval = float(val)
        except (ValueError, TypeError):
            return "Invalid input: k must be a number."

        if nhypergeom_mode == "pmf":
            return float(scipy_nhypergeom.pmf(kval, m_val, n_val, r_val, loc=loc_val))
        elif nhypergeom_mode == "cdf":
            return float(scipy_nhypergeom.cdf(kval, m_val, n_val, r_val, loc=loc_val))
        elif nhypergeom_mode == "sf":
            return float(scipy_nhypergeom.sf(kval, m_val, n_val, r_val, loc=loc_val))
        elif nhypergeom_mode == "icdf":
            return float(scipy_nhypergeom.ppf(kval, m_val, n_val, r_val, loc=loc_val))
        elif nhypergeom_mode == "isf":
            return float(scipy_nhypergeom.isf(kval, m_val, n_val, r_val, loc=loc_val))
        return "Unknown mode"

    # Normalize k to 2D list
    k_list = to2d(k)

    # Validate k is 2D list
    if not isinstance(k_list, list) or not all(isinstance(row, list) for row in k_list):
        return "Invalid input: k must be a scalar or 2D list."

    result = []
    for row in k_list:
        result_row = []
        for val in row:
            out = compute(val)
            if isinstance(out, str):
                return out
            result_row.append(out)
        result.append(result_row)

    # Return scalar if input was scalar (single element)
    if len(result) == 1 and len(result[0]) == 1:
        return result[0][0]
    return result
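The scalar-or-2D handling of k in the function above follows a general pattern that can be isolated for clarity. The sketch below is a simplified, standalone illustration: map_scalar_or_2d is a hypothetical helper, and unlike the function above it does not short-circuit when a cell produces an error string.

```python
def map_scalar_or_2d(value, fn):
    """Apply fn to a scalar or to every cell of a 2D list, preserving the shape."""
    # Wrap a scalar as a 1x1 grid so both cases share one code path.
    grid = value if isinstance(value, list) else [[value]]
    if not all(isinstance(row, list) for row in grid):
        return "Invalid input: expected a scalar or 2D list."
    out = [[fn(cell) for cell in row] for row in grid]
    # A 1x1 result is returned as a plain scalar, as in the function above.
    if len(out) == 1 and len(out[0]) == 1:
        return out[0][0]
    return out

double = lambda x: x * 2
scalar_result = map_scalar_or_2d(3, double)                 # 6
grid_result = map_scalar_or_2d([[1, 2], [3, 4]], double)    # [[2, 4], [6, 8]]
single_cell = map_scalar_or_2d([[5]], double)               # 10
flat_list = map_scalar_or_2d([1, 2], double)                # error message string
print(scalar_result, grid_result, single_cell, flat_list)
```

One consequence of this design is that a 1x1 range such as [[5]] comes back as a scalar rather than as a nested list, so callers that always expect a 2D result need to handle that case.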
