
MULTIVARIATE_HYPERGEOM

Overview

The MULTIVARIATE_HYPERGEOM function computes probabilities, log-probabilities, mean, variance, covariance, or random samples for the multivariate hypergeometric distribution. This distribution generalizes the classic hypergeometric distribution to more than two categories, allowing you to model draws from a population with multiple types. The probability mass function (PMF) is given by:

$$P(X_1 = x_1, \ldots, X_k = x_k) = \frac{\prod_{i=1}^k \binom{m_i}{x_i}}{\binom{M}{n}}$$

where $m_i$ is the number of objects of type $i$ in the population, $x_i$ is the number drawn of type $i$, $M = \sum_{i=1}^k m_i$ is the total population size, and $n = \sum_{i=1}^k x_i$ is the total number of draws. For more details, see the scipy.stats.multivariate_hypergeom documentation.
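The formula can be checked by hand with the standard library alone. The following is a minimal sketch, not the wrapper's implementation; the helper name `mvhg_pmf` is hypothetical, and it takes the number of draws to be the sum of `x`.

```python
from math import comb, prod


def mvhg_pmf(x, m):
    """Evaluate the PMF formula directly (hypothetical helper, not part of the wrapper)."""
    if len(x) != len(m):
        raise ValueError("x and m must have the same length")
    # Drawing a negative count, or more than is available of a type, has probability 0.
    if any(xi < 0 or xi > mi for xi, mi in zip(x, m)):
        return 0.0
    n = sum(x)  # total number of draws
    M = sum(m)  # total population size
    numerator = prod(comb(mi, xi) for mi, xi in zip(m, x))
    return numerator / comb(M, n)


p = mvhg_pmf([3, 2, 1], [10, 8, 6])
print(f"{p:.3f}")  # 0.150
```

With x = (3, 2, 1) and m = (10, 8, 6), the numerator is 120 × 28 × 6 = 20160 and the denominator is C(24, 6) = 134596.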

This wrapper exposes only the most commonly used parameters and methods and omits random state options, so rvs output cannot be seeded for reproducibility. The x and m arguments must be passed as 2D lists with a single row, and n as a single-element list, compatible with Excel array syntax. This example function is provided as-is without any representation of accuracy.

Usage

To use the function in Excel:

=MULTIVARIATE_HYPERGEOM(x, m, n, [method], [size])
  • x (2D list, required for 'pmf' and 'logpmf'): Number of objects of each type drawn from the population. One row, with one column per type. Not validated for the other methods.
  • m (2D list, required): Number of objects of each type in the population. One row, with at least two columns (one per type).
  • n (list, required): Total number of objects drawn from the population. Single-element list.
  • method (str, optional, default='pmf'): Which method to compute: pmf, logpmf, mean, var, cov, or rvs.
  • size (int, optional): Number of samples to draw if method is rvs. Must be a positive integer; if omitted, a single sample is drawn.

The function returns a 2D list, or an error message (string) if the input is invalid. For pmf and logpmf, the result is a single value. For mean and var, the result is a row vector with one entry per type. For cov, the result is a square matrix with one row and one column per type. For rvs, the result is a table with one row per sample and one column per type.
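To make the shape of the rvs table concrete, the sketch below simulates draws from an explicit urn using only the standard library. It is an illustration of what one sample means, not how SciPy generates samples; the function `draw_samples` and its `seed` argument are hypothetical and are not part of the wrapper.

```python
import random


def draw_samples(m, n, size=1, seed=None):
    """Simulate `size` draws of n objects without replacement (illustrative only)."""
    rng = random.Random(seed)
    # Build the urn: one type label per object, e.g. ten 0s, eight 1s, six 2s.
    urn = [t for t, count in enumerate(m) for _ in range(count)]
    rows = []
    for _ in range(size):
        drawn = rng.sample(urn, n)  # sampling without replacement
        # Count how many objects of each type were drawn.
        rows.append([drawn.count(t) for t in range(len(m))])
    return rows


rows = draw_samples([10, 8, 6], 6, size=4, seed=0)
for row in rows:
    print(row)  # each row has 3 counts that sum to 6
```

Each row is one sample: its entries sum to n, and no entry exceeds the corresponding population count in m.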

Examples

Example 1: Basic PMF Calculation

Inputs:

x          m           n
3, 2, 1    10, 8, 6    6

Excel formula:

=MULTIVARIATE_HYPERGEOM({3,2,1}, {10,8,6}, {6})

Expected output:

Result
0.150

Example 2: Log-PMF Calculation

Inputs:

x          m           n
3, 2, 1    10, 8, 6    6

Excel formula:

=MULTIVARIATE_HYPERGEOM({3,2,1}, {10,8,6}, {6}, "logpmf")

Expected output:

Result
-1.899
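The log-PMF is the natural logarithm of the PMF, so this value can be reproduced from log-binomial coefficients. The sketch below uses `math.lgamma` so that large populations do not overflow; the helper names `log_comb` and `mvhg_logpmf` are hypothetical, and this is not a claim about how SciPy computes the value internally.

```python
from math import lgamma


def log_comb(a, b):
    """Natural log of the binomial coefficient C(a, b), for 0 <= b <= a."""
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)


def mvhg_logpmf(x, m):
    """Log of the PMF formula (hypothetical helper, not part of the wrapper)."""
    if any(xi < 0 or xi > mi for xi, mi in zip(x, m)):
        return float("-inf")  # impossible outcome: log(0)
    n, M = sum(x), sum(m)
    return sum(log_comb(mi, xi) for mi, xi in zip(m, x)) - log_comb(M, n)


print(f"{mvhg_logpmf([3, 2, 1], [10, 8, 6]):.3f}")  # -1.899
```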

Example 3: Mean Calculation

Inputs:

m           n
10, 8, 6    6

Excel formula:

=MULTIVARIATE_HYPERGEOM(, {10,8,6}, {6}, "mean")

Expected output:

Mean
2.500    2.000    1.500
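These values follow from the standard closed form for the mean of this distribution, E[X_i] = n · m_i / M, which is not stated in the text above and is included here as a cross-check. A minimal sketch with exact fractions; the helper name `mvhg_mean` is hypothetical.

```python
from fractions import Fraction


def mvhg_mean(m, n):
    """Expected count of each type: n * m_i / M (hypothetical helper)."""
    M = sum(m)  # total population size
    return [Fraction(n * mi, M) for mi in m]


mean = mvhg_mean([10, 8, 6], 6)
print([f"{float(v):.3f}" for v in mean])  # ['2.500', '2.000', '1.500']
```

The expected counts always sum to n, since every draw yields exactly one object.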

Example 4: Covariance Matrix Calculation

Inputs:

m           n
10, 8, 6    6

Excel formula:

=MULTIVARIATE_HYPERGEOM(, {10,8,6}, {6}, "cov")

Expected output:

Covariance
 1.141    -0.652    -0.489
-0.652     1.043    -0.391
-0.489    -0.391     0.880
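The matrix can be cross-checked against the standard closed form, which is not stated in the text above: with p_i = m_i / M, Var(X_i) = n · (M − n)/(M − 1) · p_i (1 − p_i) and Cov(X_i, X_j) = −n · (M − n)/(M − 1) · p_i p_j for i ≠ j. The sketch below is an illustration with a hypothetical helper name, `mvhg_cov`, and assumes M > 1.

```python
from fractions import Fraction


def mvhg_cov(m, n):
    """Covariance matrix from the closed form (hypothetical helper; assumes sum(m) > 1)."""
    M = sum(m)
    scale = Fraction(n * (M - n), M - 1)  # n times the finite-population correction
    p = [Fraction(mi, M) for mi in m]     # population share of each type
    k = len(m)
    return [
        [scale * (p[i] * (1 - p[i]) if i == j else -p[i] * p[j]) for j in range(k)]
        for i in range(k)
    ]


for row in mvhg_cov([10, 8, 6], 6):
    print([f"{float(v):.3f}" for v in row])
# ['1.141', '-0.652', '-0.489']
# ['-0.652', '1.043', '-0.391']
# ['-0.489', '-0.391', '0.880']
```

Each row sums to zero because the counts are constrained to add up to n, which is also why all off-diagonal entries are negative.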

Python Code

from scipy.stats import multivariate_hypergeom as scipy_multivariate_hypergeom
from typing import List, Optional, Union


def multivariate_hypergeom(
    x: Optional[List[List[int]]] = None,
    m: Optional[List[List[int]]] = None,
    n: Optional[List[int]] = None,
    method: str = 'pmf',
    size: Optional[int] = None
) -> Union[List[List[Optional[float]]], str]:
    """
    Computes probability mass function, log-PMF, mean, variance, covariance,
    or draws random samples from a multivariate hypergeometric distribution.

    Args:
        x: 2D list of int values. Number of objects of each type drawn from the
            population. Required for 'pmf' and 'logpmf'.
        m: 2D list of int values. Number of objects of each type in the population.
        n: List of int values. Number of samples drawn from the population.
        method: Which method to compute (str): 'pmf', 'logpmf', 'mean', 'var',
            'cov', 'rvs'. Default is 'pmf'.
        size: Number of samples to draw if method is 'rvs'. Optional.

    Returns:
        2D list of results for each input, or an error message (str) if input is invalid.

    This example function is provided as-is without any representation of accuracy.
    """
    # Validate method
    valid_methods = {'pmf', 'logpmf', 'mean', 'var', 'cov', 'rvs'}
    if method not in valid_methods:
        return f"Invalid method: {method}. Must be one of {sorted(valid_methods)}."

    # Validate m
    if not isinstance(m, list) or not all(
        isinstance(row, list) and all(isinstance(val, int) for val in row) for row in m
    ):
        return "Invalid input: m must be a 2D list of integers."
    if len(m) != 1:
        return "Invalid input: m must be a 2D list with exactly one row (Excel passes population as [[...]])."
    if len(m[0]) < 2:
        return "Invalid input: m must be a 2D list with at least two columns."

    # Validate n
    if not isinstance(n, list) or not all(isinstance(val, int) for val in n):
        return "Invalid input: n must be a list of integers."
    if len(n) != 1:
        return "Invalid input: n must be a list with exactly one element (Excel passes sample size as [n])."

    # Validate x for pmf/logpmf
    if method in {'pmf', 'logpmf'}:
        if not isinstance(x, list) or not all(
            isinstance(row, list) and all(isinstance(val, int) for val in row) for row in x
        ):
            return "Invalid input: x must be a 2D list of integers for pmf/logpmf."
        if len(x) != 1:
            return "Invalid input: x must be a 2D list with exactly one row."
        if len(x[0]) != len(m[0]):
            return "Invalid input: x must have the same number of columns as m."

    # Validate size for rvs
    if method == 'rvs' and size is not None:
        if not isinstance(size, int) or size < 1:
            return "Invalid input: size must be a positive integer."

    # Try to create the distribution
    try:
        m_arr = m[0]
        n_val = n[0]
        dist = scipy_multivariate_hypergeom(m_arr, n_val)
    except Exception as e:
        return f"scipy.multivariate_hypergeom error: {e}"

    # Compute result
    try:
        if method == 'pmf':
            result = dist.pmf(x[0])
            return [[float(result)]]
        elif method == 'logpmf':
            result = dist.logpmf(x[0])
            return [[float(result)]]
        elif method == 'mean':
            result = dist.mean()
            return [list(map(float, result))]
        elif method == 'var':
            result = dist.var()
            return [list(map(float, result))]
        elif method == 'cov':
            cov = dist.cov()
            # cov is a 2D numpy array
            return [list(map(float, row)) for row in cov.tolist()]
        elif method == 'rvs':
            if size is not None:
                samples = dist.rvs(size=size)
                # samples: shape (size, k)
                if hasattr(samples, 'tolist'):
                    samples = samples.tolist()
                return [list(map(int, row)) for row in samples]
            else:
                sample = dist.rvs()
                if hasattr(sample, 'tolist'):
                    sample = sample.tolist()
                return [list(map(int, sample))]
    except Exception as e:
        return f"scipy.multivariate_hypergeom error: {e}"

    return "Unknown error."

Example Workbook

Link to Workbook
