MV_HYPERGEOM

Overview

The MV_HYPERGEOM function computes statistical properties of the multivariate hypergeometric distribution, a generalization of the standard hypergeometric distribution to populations containing more than two types of objects. This distribution models the probability of drawing specific quantities of each object type when sampling without replacement from a finite population.

The multivariate hypergeometric distribution arises naturally in urn-model problems and quality control scenarios where a sample is drawn from a heterogeneous population. For example, consider an urn containing balls of k different colors with m_1, m_2, \ldots, m_k balls of each color respectively. If n balls are drawn without replacement, the multivariate hypergeometric distribution describes the joint probability of drawing exactly x_1, x_2, \ldots, x_k balls of each color.

The probability mass function (PMF) is defined as:

P(X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k) = \frac{\binom{m_1}{x_1}\binom{m_2}{x_2}\cdots\binom{m_k}{x_k}}{\binom{M}{n}}

where m_i is the number of objects of type i in the population, M = \sum_{i=1}^{k} m_i is the total population size, and n is the sample size. The counts must satisfy 0 \le x_i \le m_i and \sum_{i=1}^{k} x_i = n; any other outcome has probability zero.
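As a quick check of the formula, the PMF can be evaluated directly from binomial coefficients. The following is a minimal sketch using only the Python standard library; the helper name mvh_pmf is hypothetical and is not part of the implementation documented here.

```python
from math import comb, prod

def mvh_pmf(x, m):
    """Evaluate the multivariate hypergeometric PMF from its definition.

    x: counts drawn of each type; m: counts of each type in the population.
    The sample size n is taken to be sum(x).
    """
    if len(x) != len(m):
        raise ValueError("x and m must have the same length")
    # Drawing more objects of a type than exist has probability zero.
    if any(xi < 0 or xi > mi for xi, mi in zip(x, m)):
        return 0.0
    numerator = prod(comb(mi, xi) for mi, xi in zip(m, x))
    denominator = comb(sum(m), sum(x))
    return numerator / denominator

p = mvh_pmf([3, 2, 1], [10, 8, 6])
print(round(p, 4))  # 0.1498
```

Here the numerator is 120 * 28 * 6 = 20160 and the denominator is \binom{24}{6} = 134596, which gives roughly 0.1498.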

This implementation uses the multivariate_hypergeom distribution from SciPy’s scipy.stats module. The mvh_method argument selects the computation: pmf for the probability mass function, logpmf for the natural logarithm of the PMF (useful for numerical stability with small probabilities), mean and var for the expected value and variance of each component, cov for the full covariance matrix, and rvs for generating random samples. For additional technical details, see the SciPy multivariate_hypergeom documentation.
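For orientation, the underlying SciPy calls can be sketched as follows. This is an illustrative snippet rather than the wrapper itself; it assumes SciPy 1.6 or later (where multivariate_hypergeom is available), and the random seed is an arbitrary choice made only for reproducibility.

```python
import numpy as np
from scipy.stats import multivariate_hypergeom

m = [10, 8, 6]   # population counts per type
n = 6            # sample size
x = [3, 2, 1]    # counts drawn per type; must sum to n

dist = multivariate_hypergeom(m, n)  # frozen distribution

pmf = dist.pmf(x)
logpmf = dist.logpmf(x)
mean = dist.mean()
var = dist.var()
cov = dist.cov()
samples = dist.rvs(size=3, random_state=0)  # seed is illustrative only

# The log-PMF is the natural log of the PMF.
assert np.isclose(np.exp(logpmf), pmf)
# The diagonal of the covariance matrix holds the variances.
assert np.allclose(np.diag(cov), var)
# Every random draw contains exactly n objects.
assert (samples.sum(axis=1) == n).all()
```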

When the population contains only two object types, the multivariate hypergeometric distribution reduces to the univariate hypergeometric distribution. For theoretical background on urn models and this distribution, see Random Services: Multivariate Hypergeometric Distribution.
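The two-type reduction can be checked numerically. The sketch below uses a hypothetical urn (10 red and 14 blue balls, 6 drawn) and two illustrative helper functions that are not part of the implementation; both evaluate the same ratio of binomial coefficients.

```python
from math import comb, isclose

def mvh_pmf_two_types(x1, x2, m1, m2):
    """Multivariate hypergeometric PMF with k = 2 types."""
    return comb(m1, x1) * comb(m2, x2) / comb(m1 + m2, x1 + x2)

def hypergeom_pmf(k, M, K, n):
    """Univariate hypergeometric PMF: k successes in n draws from a
    population of size M that contains K success items."""
    return comb(K, k) * comb(M - K, n - k) / comb(M, n)

# Hypothetical urn: 10 red, 14 blue; draw 6 and observe 3 red, 3 blue.
two_type = mvh_pmf_two_types(3, 3, 10, 14)
univariate = hypergeom_pmf(3, 24, 10, 6)
assert isclose(two_type, univariate)
```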

This example function is provided as-is without any representation of accuracy.

Excel Usage

=MV_HYPERGEOM(x, m, n, mvh_method, size)
  • x (list[list], optional, default: null): Number of objects of each type drawn from the population. Single row with one column per type. Required for the ‘pmf’ and ‘logpmf’ methods; ignored otherwise.
  • m (list[list], optional, default: null): Number of objects of each type in the population. Single row with one column per type (at least two types). Needed by every method.
  • n (list[list], optional, default: null): Number of samples drawn from the population. A scalar or a single-element 2D list. Needed by every method.
  • mvh_method (str, optional, default: “pmf”): Computation method to use. One of ‘pmf’, ‘logpmf’, ‘mean’, ‘var’, ‘cov’, ‘rvs’.
  • size (int, optional, default: null): Number of random samples to draw when the method is ‘rvs’. If omitted, a single sample is drawn.

Returns (list[list]): 2D list of results, or error message string.

Examples

Example 1: Basic PMF calculation

Inputs:

x | m | n | mvh_method
3, 2, 1 | 10, 8, 6 | 6 | pmf

Excel formula:

=MV_HYPERGEOM({3,2,1}, {10,8,6}, 6, "pmf")

Expected output:

Result
0.1498

Example 2: Log-PMF calculation

Inputs:

x | m | n | mvh_method
3, 2, 1 | 10, 8, 6 | 6 | logpmf

Excel formula:

=MV_HYPERGEOM({3,2,1}, {10,8,6}, 6, "logpmf")

Expected output:

Result
-1.8986
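To illustrate why a log-PMF helps with small probabilities, here is a minimal standard-library sketch that sums log-binomial terms via math.lgamma. The helper names are hypothetical, and this is not how SciPy computes logpmf internally; it only reproduces the same quantity under the assumption that 0 <= x_i <= m_i.

```python
from math import exp, lgamma

def log_comb(a, b):
    """Natural log of the binomial coefficient C(a, b), for 0 <= b <= a."""
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)

def mvh_logpmf(x, m):
    """Log-PMF of the multivariate hypergeometric distribution.

    Assumes 0 <= x_i <= m_i for every type; n is taken to be sum(x).
    """
    log_numerator = sum(log_comb(mi, xi) for mi, xi in zip(m, x))
    return log_numerator - log_comb(sum(m), sum(x))

value = mvh_logpmf([3, 2, 1], [10, 8, 6])
print(round(value, 4))  # -1.8986

# A very unlikely outcome: all 1000 draws come from one of two equal groups.
tiny = mvh_logpmf([1000, 0], [2000, 2000])
# The log-probability stays finite even though exponentiating it
# underflows to 0.0 in double precision.
assert exp(tiny) == 0.0
```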

Example 3: Mean calculation

Inputs:

m | n | mvh_method
10, 8, 6 | 6 | mean

Excel formula:

=MV_HYPERGEOM(, {10,8,6}, 6, "mean")

Expected output:

Result
2.5 2 1.5

Example 4: Variance calculation

Inputs:

m | n | mvh_method
10, 8, 6 | 6 | var

Excel formula:

=MV_HYPERGEOM(, {10,8,6}, 6, "var")

Expected output:

Result
1.141 1.043 0.8804

Example 5: Covariance matrix calculation

Inputs:

m | n | mvh_method
10, 8, 6 | 6 | cov

Excel formula:

=MV_HYPERGEOM(, {10,8,6}, 6, "cov")

Expected output:

Result
1.141 -0.6522 -0.4891
-0.6522 1.043 -0.3913
-0.4891 -0.3913 0.8804
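The mean, variance, and covariance values in Examples 3 to 5 agree with the standard closed-form moments of the distribution: E[X_i] = n m_i / M, Var(X_i) = n \frac{M-n}{M-1} \frac{m_i}{M}\left(1 - \frac{m_i}{M}\right), and Cov(X_i, X_j) = -n \frac{M-n}{M-1} \frac{m_i}{M} \frac{m_j}{M} for i \ne j. A small sketch, with a hypothetical helper name and assuming M > 1:

```python
def mvh_moments(m, n):
    """Return (mean, cov) of the multivariate hypergeometric distribution
    from closed-form expressions. Assumes sum(m) > 1."""
    M = sum(m)
    k = len(m)
    p = [mi / M for mi in m]
    fpc = (M - n) / (M - 1)  # finite-population correction
    mean = [n * mi / M for mi in m]
    cov = [
        [
            n * fpc * (p[i] * (1 - p[i]) if i == j else -p[i] * p[j])
            for j in range(k)
        ]
        for i in range(k)
    ]
    return mean, cov

mean, cov = mvh_moments([10, 8, 6], 6)
var = [cov[i][i] for i in range(len(cov))]

print(mean)                        # [2.5, 2.0, 1.5]
print([round(v, 4) for v in var])  # [1.1413, 1.0435, 0.8804]
print(round(cov[0][1], 4))         # -0.6522
```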

Example 6: Random variates generation

Inputs:

m | n | mvh_method | size
10, 8, 6 | 6 | rvs | 3

Excel formula:

=MV_HYPERGEOM(, {10,8,6}, 6, "rvs", 3)

Expected output:

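A 3 x 3 array of non-negative integer counts, one row per sample. Values vary between runs because the output is random; each row sums to 6.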

Python Code

from scipy.stats import multivariate_hypergeom as scipy_multivariate_hypergeom

def mv_hypergeom(x=None, m=None, n=None, mvh_method='pmf', size=None):
    """
    Computes probability mass function, log-PMF, mean, variance, covariance, or draws random samples from a multivariate hypergeometric distribution.

    See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.multivariate_hypergeom.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        x (list[list], optional): Number of objects of each type drawn from the population. Required for 'pmf' and 'logpmf' methods. Default is None.
        m (list[list], optional): Number of objects of each type in the population. Single row with columns for each type. Default is None.
        n (list[list], optional): Number of samples drawn from the population. Single-element 2D list. Default is None.
        mvh_method (str, optional): Computation method to use. Valid options: 'pmf', 'logpmf', 'mean', 'var', 'cov', 'rvs'. Default is 'pmf'.
        size (int, optional): Number of random samples to draw when method is 'rvs'. Default is None.

    Returns:
        list[list]: 2D list of results, or error message string.
    """
    def to2d(val):
        return [[val]] if not isinstance(val, list) else val

    def is_int_like(val):
        if isinstance(val, bool):
            return False
        if isinstance(val, int):
            return True
        if isinstance(val, float) and val.is_integer():
            return True
        return False

    def to_int(val):
        return int(val)

    # Validate mvh_method
    valid_methods = {'pmf', 'logpmf', 'mean', 'var', 'cov', 'rvs'}
    if mvh_method not in valid_methods:
        return f"Invalid method: {mvh_method}. Must be one of {sorted(valid_methods)}."

    # Normalize and validate m
    m = to2d(m)
    if not isinstance(m, list) or not all(isinstance(row, list) for row in m):
        return "Invalid input: m must be a 2D list of integers."
    if len(m) != 1:
        return "Invalid input: m must be a 2D list with exactly one row."
    if not all(is_int_like(val) for val in m[0]):
        return "Invalid input: m must be a 2D list of integers."
    m_int = [to_int(val) for val in m[0]]
    if any(val < 0 for val in m_int):
        return "Invalid input: m values must be non-negative."
    if len(m_int) < 2:
        return "Invalid input: m must have at least two types in the population."

    # Normalize and validate n
    n = to2d(n)
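    # Check that n[0] is a list before calling len() on it, so a flat list such as [6] yields an error message instead of a TypeError
    if not isinstance(n, list) or len(n) != 1 or not isinstance(n[0], list) or len(n[0]) != 1: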
        return "Invalid input: n must be a single integer value."
    if not is_int_like(n[0][0]):
        return "Invalid input: n must be an integer."
    n_val = to_int(n[0][0])
    if n_val < 0:
        return "Invalid input: n must be non-negative."
    if n_val > sum(m_int):
        return "Invalid input: n cannot exceed total population size."

    # Validate x for pmf/logpmf
    if mvh_method in {'pmf', 'logpmf'}:
        x = to2d(x)
        if not isinstance(x, list) or not all(isinstance(row, list) for row in x):
            return "Invalid input: x must be a 2D list of integers for pmf/logpmf."
        if len(x) != 1:
            return "Invalid input: x must be a 2D list with exactly one row."
        if not all(is_int_like(val) for val in x[0]):
            return "Invalid input: x must be a 2D list of integers for pmf/logpmf."
        x_int = [to_int(val) for val in x[0]]
        if any(val < 0 for val in x_int):
            return "Invalid input: x values must be non-negative."
        if len(x_int) != len(m_int):
            return "Invalid input: x must have the same number of columns as m."

    # Validate size for rvs
    if mvh_method == 'rvs' and size is not None:
        if not is_int_like(size) or to_int(size) < 1:
            return "Invalid input: size must be a positive integer."
        size = to_int(size)

    # Try to create the distribution
    try:
        dist = scipy_multivariate_hypergeom(m_int, n_val)
    except Exception as e:
        return f"scipy.multivariate_hypergeom error: {e}"

    # Compute result
    try:
        if mvh_method == 'pmf':
            result = dist.pmf(x_int)
            return [[float(result)]]
        elif mvh_method == 'logpmf':
            result = dist.logpmf(x_int)
            return [[float(result)]]
        elif mvh_method == 'mean':
            result = dist.mean()
            return [list(map(float, result))]
        elif mvh_method == 'var':
            result = dist.var()
            return [list(map(float, result))]
        elif mvh_method == 'cov':
            cov = dist.cov()
            return [list(map(float, row)) for row in cov.tolist()]
        elif mvh_method == 'rvs':
            if size is not None:
                samples = dist.rvs(size=size)
            else:
                samples = dist.rvs()
            if hasattr(samples, 'tolist'):
                samples = samples.tolist()
            # Ensure 2D output format
            if isinstance(samples[0], (int, float)):
                return [list(map(int, samples))]
            return [list(map(int, row)) for row in samples]
    except Exception as e:
        return f"scipy.multivariate_hypergeom error: {e}"
    return "Unknown error."
