MULTIVARIATE_HYPERGEOM
Overview
The MULTIVARIATE_HYPERGEOM function computes probabilities, log-probabilities, mean, variance, covariance, or random samples for the multivariate hypergeometric distribution. This distribution generalizes the classic hypergeometric distribution to more than two categories, allowing you to model draws from a population with multiple types. The probability mass function (PMF) is given by:

$$P(X_1 = x_1, \ldots, X_k = x_k) = \frac{\prod_{i=1}^{k} \binom{m_i}{x_i}}{\binom{M}{n}}$$

where $m_i$ is the number of objects of type $i$ in the population, $x_i$ is the number drawn of type $i$, $M = \sum_{i=1}^{k} m_i$ is the total population size, and $n = \sum_{i=1}^{k} x_i$ is the total number of draws. For more details, see the scipy.stats.multivariate_hypergeom documentation.
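As a sanity check on the PMF, the ratio of the product of per-type binomial coefficients to the number of ways of drawing n objects from the whole population can be evaluated with the standard library alone. The sketch below is illustrative only; the helper name `mvhg_pmf` is hypothetical and is not part of this wrapper or of SciPy.

```python
from math import comb, prod

def mvhg_pmf(x, m):
    """Multivariate hypergeometric PMF (hypothetical helper, not the wrapper)."""
    n = sum(x)  # total number of draws
    M = sum(m)  # total population size
    # math.comb(m_i, x_i) is 0 when x_i > m_i, so impossible draws get probability 0.
    return prod(comb(mi, xi) for mi, xi in zip(m, x)) / comb(M, n)

# Draw 3, 2 and 1 objects from types with 10, 8 and 6 objects respectively.
print(round(mvhg_pmf([3, 2, 1], [10, 8, 6]), 3))  # 0.15
```

Exact integer arithmetic keeps the numerator and denominator free of rounding error; only the final division produces a float.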
This wrapper exposes only the most commonly used parameters and methods, omitting random state/reproducibility options. The drawn counts (x) and population counts (m) must be passed as single-row 2D lists, and the sample size (n) as a single-element list, compatible with Excel array syntax. This example function is provided as-is without any representation of accuracy.
Usage
To use the function in Excel:
=MULTIVARIATE_HYPERGEOM(x, m, n, [method], [size])
x (2D list, required for 'pmf' and 'logpmf'): Number of objects of each type drawn from the population. One row, columns equal to number of types.
m (2D list, required): Number of objects of each type in the population. One row, columns equal to number of types.
n (list, required): Number of samples drawn from the population. Single-element list.
method (str, optional, default='pmf'): Which method to compute: pmf, logpmf, mean, var, cov, or rvs.
size (int, optional): Number of samples to draw if method is rvs.
The function returns a 2D list of results for each input, or an error message (string) if the input is invalid. For pmf and logpmf, the result is a single value. For mean and var, the result is a row vector. For cov, the result is a matrix. For rvs, the result is a table of samples.
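To make the shape of the rvs result concrete, the sketch below draws samples by labelling every object with its type and sampling without replacement using the standard library. This is an assumption-laden illustration, not how SciPy generates its samples; the helper `mvhg_sample` and its `seed` parameter are hypothetical (the wrapper itself exposes no seeding option).

```python
import random

def mvhg_sample(m, n, size=1, seed=None):
    """Draw `size` samples, one row per sample (hypothetical helper)."""
    rng = random.Random(seed)  # local generator; seed is an illustrative extra
    # Label every object in the population with its type index.
    population = [t for t, count in enumerate(m) for _ in range(count)]
    rows = []
    for _ in range(size):
        drawn = rng.sample(population, n)  # n objects without replacement
        rows.append([drawn.count(t) for t in range(len(m))])
    return rows

# Three samples of six draws each: a 3 x 3 table whose rows each sum to 6.
for row in mvhg_sample([10, 8, 6], 6, size=3, seed=0):
    print(row)
```

Each row has one column per type, and no entry can exceed the corresponding population count.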
Examples
Example 1: Basic PMF Calculation
Inputs:
x | | | m | | | n |
---|---|---|---|---|---|---|
3 | 2 | 1 | 10 | 8 | 6 | 6 |
Excel formula:
=MULTIVARIATE_HYPERGEOM({3,2,1}, {10,8,6}, {6})
Expected output:
Result |
---|
0.150 |
Example 2: Log-PMF Calculation
Inputs:
x | | | m | | | n |
---|---|---|---|---|---|---|
3 | 2 | 1 | 10 | 8 | 6 | 6 |
Excel formula:
=MULTIVARIATE_HYPERGEOM({3,2,1}, {10,8,6}, {6}, "logpmf")
Expected output:
Result |
---|
-1.899 |
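The log-PMF in Example 2 can be cross-checked without forming large binomial coefficients by working with log-gamma values, which stays numerically stable for large populations. This is a minimal sketch; `log_comb` and `mvhg_logpmf` are hypothetical helper names, and the approach is a common technique rather than a description of SciPy's internals.

```python
from math import exp, lgamma

def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k), for 0 <= k <= n."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def mvhg_logpmf(x, m):
    """Log-PMF of the multivariate hypergeometric distribution (hypothetical helper)."""
    log_numerator = sum(log_comb(mi, xi) for mi, xi in zip(m, x))
    return log_numerator - log_comb(sum(m), sum(x))

value = mvhg_logpmf([3, 2, 1], [10, 8, 6])
print(round(value, 3))       # -1.899
print(round(exp(value), 3))  # 0.15, matching the PMF in Example 1
```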
Example 3: Mean Calculation
Inputs:
m | | | n |
---|---|---|---|
10 | 8 | 6 | 6 |
Excel formula:
=MULTIVARIATE_HYPERGEOM(, {10,8,6}, {6}, "mean")
Expected output:
Mean | ||
---|---|---|
2.500 | 2.000 | 1.500 |
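The means in Example 3 follow from the closed form n * m_i / M, and the per-type variances from n * (M - n) / (M - 1) * p_i * (1 - p_i) with p_i = m_i / M. The sketch below evaluates both; the helper name `mvhg_mean_var` is hypothetical and exists only for illustration.

```python
def mvhg_mean_var(m, n):
    """Closed-form mean and variance per type (hypothetical helper)."""
    M = sum(m)               # total population size
    fpc = (M - n) / (M - 1)  # finite population correction
    mean = [n * mi / M for mi in m]
    var = [n * fpc * (mi / M) * (1 - mi / M) for mi in m]
    return mean, var

mean, var = mvhg_mean_var([10, 8, 6], 6)
print(mean)                        # [2.5, 2.0, 1.5]
print([round(v, 3) for v in var])  # [1.141, 1.043, 0.88]
```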
Example 4: Covariance Matrix Calculation
Inputs:
m | | | n |
---|---|---|---|
10 | 8 | 6 | 6 |
Excel formula:
=MULTIVARIATE_HYPERGEOM(, {10,8,6}, {6}, "cov")
Expected output:
Covariance | ||
---|---|---|
1.141 | -0.652 | -0.489 |
-0.652 | 1.043 | -0.391 |
-0.489 | -0.391 | 0.880 |
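The matrix in Example 4 can be reproduced from the closed form: diagonal entries are n * fpc * p_i * (1 - p_i) and off-diagonal entries are -n * fpc * p_i * p_j, where p_i = m_i / M and fpc = (M - n) / (M - 1). The following is a minimal sketch with a hypothetical helper name, `mvhg_cov`.

```python
def mvhg_cov(m, n):
    """Closed-form covariance matrix as a list of rows (hypothetical helper)."""
    M = sum(m)
    fpc = (M - n) / (M - 1)  # finite population correction
    p = [mi / M for mi in m]
    k = len(m)
    return [
        [n * fpc * (p[i] * (1 - p[i]) if i == j else -p[i] * p[j]) for j in range(k)]
        for i in range(k)
    ]

for row in mvhg_cov([10, 8, 6], 6):
    print([round(v, 3) for v in row])
```

Because the counts always sum to n, every row of the matrix sums to zero (up to floating-point error), which is a quick consistency check on any computed result.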
Python Code
from typing import List, Optional, Union

from scipy.stats import multivariate_hypergeom as scipy_multivariate_hypergeom


def multivariate_hypergeom(
    x: Optional[List[List[int]]] = None,
    m: Optional[List[List[int]]] = None,
    n: Optional[List[int]] = None,
    method: str = 'pmf',
    size: Optional[int] = None
) -> Union[List[List[Optional[float]]], str]:
    """
    Computes probability mass function, log-PMF, mean, variance, covariance, or draws random samples from a multivariate hypergeometric distribution.

    Args:
        x: 2D list of int values. Number of objects of each type drawn from the population. Required for 'pmf' and 'logpmf'.
        m: 2D list of int values. Number of objects of each type in the population.
        n: List of int values. Number of samples drawn from the population.
        method: Which method to compute (str): 'pmf', 'logpmf', 'mean', 'var', 'cov', 'rvs'. Default is 'pmf'.
        size: Number of samples to draw if method is 'rvs'. Optional.

    Returns:
        2D list of results for each input, or an error message (str) if input is invalid.

    This example function is provided as-is without any representation of accuracy.
    """
    # Validate method
    valid_methods = {'pmf', 'logpmf', 'mean', 'var', 'cov', 'rvs'}
    if method not in valid_methods:
        return f"Invalid method: {method}. Must be one of {sorted(valid_methods)}."

    # Validate m
    if not isinstance(m, list) or not all(isinstance(row, list) and all(isinstance(val, int) for val in row) for row in m):
        return "Invalid input: m must be a 2D list of integers."
    if len(m) != 1:
        return "Invalid input: m must be a 2D list with exactly one row (Excel passes population as [[...]])."
    if len(m[0]) < 2:
        return "Invalid input: m must be a 2D list with at least two columns."

    # Validate n
    if not isinstance(n, list) or not all(isinstance(val, int) for val in n):
        return "Invalid input: n must be a list of integers."
    if len(n) != 1:
        return "Invalid input: n must be a list with exactly one element (Excel passes sample size as [n])."

    # Validate x for pmf/logpmf
    if method in {'pmf', 'logpmf'}:
        if not isinstance(x, list) or not all(isinstance(row, list) and all(isinstance(val, int) for val in row) for row in x):
            return "Invalid input: x must be a 2D list of integers for pmf/logpmf."
        if len(x) != 1:
            return "Invalid input: x must be a 2D list with exactly one row."
        if len(x[0]) != len(m[0]):
            return "Invalid input: x must have the same number of columns as m."

    # Validate size for rvs
    if method == 'rvs' and size is not None:
        if not isinstance(size, int) or size < 1:
            return "Invalid input: size must be a positive integer."

    # Try to create the distribution
    try:
        m_arr = m[0]
        n_val = n[0]
        dist = scipy_multivariate_hypergeom(m_arr, n_val)
    except Exception as e:
        return f"scipy.multivariate_hypergeom error: {e}"

    # Compute result
    try:
        if method == 'pmf':
            result = dist.pmf(x[0])
            return [[float(result)]]
        elif method == 'logpmf':
            result = dist.logpmf(x[0])
            return [[float(result)]]
        elif method == 'mean':
            result = dist.mean()
            return [list(map(float, result))]
        elif method == 'var':
            result = dist.var()
            return [list(map(float, result))]
        elif method == 'cov':
            cov = dist.cov()
            # cov is a 2D numpy array
            return [list(map(float, row)) for row in cov.tolist()]
        elif method == 'rvs':
            if size is not None:
                samples = dist.rvs(size=size)
                # samples: shape (size, k)
                if hasattr(samples, 'tolist'):
                    samples = samples.tolist()
                return [list(map(int, row)) for row in samples]
            else:
                sample = dist.rvs()
                if hasattr(sample, 'tolist'):
                    sample = sample.tolist()
                return [list(map(int, sample))]
    except Exception as e:
        return f"scipy.multivariate_hypergeom error: {e}"
    return "Unknown error."