NHYPERGEOM
Overview
The NHYPERGEOM
function computes values related to the Negative Hypergeometric distribution, a discrete probability distribution describing the number of successes (e.g., red balls) observed before a fixed number of failures (e.g., blue balls) are drawn from a finite population without replacement. This function can return the probability mass function (PMF), cumulative distribution function (CDF), survival function (SF), inverse CDF (quantile/ICDF), inverse SF (ISF), mean, variance, standard deviation, or median for a given value.
Excel does not provide a native Negative Hypergeometric function. The Python function in Excel provided here supports PMF, CDF, SF, ICDF, ISF, and distribution statistics (mean, median, variance, standard deviation).
For more details, see the scipy.stats.nhypergeom documentation .
Usage
To use the function in Excel:
=NHYPERGEOM(k, M, n, r, [mode], [loc])
k
(float or 2D list, required): Value(s) at which to evaluate the distribution. For PMF, CDF, SF, ICDF, and ISF, this is the integer value (0 ≤ k ≤ n). For statistics modes, this is ignored and can be set to 0.M
(int, required): Total number of balls (M ≥ 1).n
(int, required): Number of red balls (0 ≤ n ≤ M).r
(int, required): Number of blue balls to draw (0 ≤ r ≤ M-n).mode
(str, optional, default=“pmf”): Output type. One of"pmf"
,"cdf"
,"sf"
,"icdf"
,"isf"
,"mean"
,"var"
,"std"
, or"median"
.loc
(float, optional, default=0): Location parameter (shifts the distribution).
The function returns a scalar or 2D list of floats (for array input), or an error message (string) if the input is invalid. The output depends on the selected mode:
pmf
: Probability mass function at k.cdf
: Cumulative distribution function at k.sf
: Survival function (1 - CDF) at k.icdf
: Inverse CDF (quantile) for probability k.isf
: Inverse survival function for probability k.mean
: Mean of the distribution.var
: Variance of the distribution.std
: Standard deviation of the distribution.median
: Median of the distribution.
Examples
Example 1: PMF at k=3, M=20, n=7, r=12
Inputs:
k | M | n | r | mode | loc |
---|---|---|---|---|---|
3 | 20 | 7 | 12 | pmf | 0 |
Excel formula:
=NHYPERGEOM(3, 20, 7, 12, "pmf", 0)
Expected output:
Result |
---|
0.0235 |
Example 2: CDF at k=3, M=20, n=7, r=12
Inputs:
k | M | n | r | mode | loc |
---|---|---|---|---|---|
3 | 20 | 7 | 12 | cdf | 0 |
Excel formula:
=NHYPERGEOM(3, 20, 7, 12, "cdf", 0)
Expected output:
Result |
---|
0.0307 |
Example 3: Survival Function at k=3, M=20, n=7, r=12
Inputs:
k | M | n | r | mode | loc |
---|---|---|---|---|---|
3 | 20 | 7 | 12 | sf | 0 |
Excel formula:
=NHYPERGEOM(3, 20, 7, 12, "sf", 0)
Expected output:
Result |
---|
0.9693 |
Example 4: Inverse CDF (ICDF) for probability k=0.5, M=20, n=7, r=12
Inputs:
k | M | n | r | mode | loc |
---|---|---|---|---|---|
0.5 | 20 | 7 | 12 | icdf | 0 |
Excel formula:
=NHYPERGEOM(0.5, 20, 7, 12, "icdf", 0)
Expected output:
Result |
---|
6 |
Example 5: Mean, Variance, Std, Median
Inputs:
k | M | n | r | mode | loc |
---|---|---|---|---|---|
0 | 20 | 7 | 12 | mean | 0 |
0 | 20 | 7 | 12 | var | 0 |
0 | 20 | 7 | 12 | std | 0 |
0 | 20 | 7 | 12 | median | 0 |
Excel formulas:
=NHYPERGEOM(0, 20, 7, 12, "mean", 0)
=NHYPERGEOM(0, 20, 7, 12, "var", 0)
=NHYPERGEOM(0, 20, 7, 12, "std", 0)
=NHYPERGEOM(0, 20, 7, 12, "median", 0)
Expected outputs:
Result |
---|
6.0 |
1.2 |
1.0954 |
6.0 |
Python Code
from scipy.stats import nhypergeom as scipy_nhypergeom
def nhypergeom(k, M, n, r, mode="pmf", loc=0):
"""
Compute Negative Hypergeometric distribution values: PMF, CDF, SF, ICDF, ISF, mean, variance, std, or median.
Args:
k: Value(s) at which to evaluate (float or 2D list).
M: Total number of balls (int, M >= 1).
n: Number of red balls (int, 0 <= n <= M).
r: Number of blue balls to draw (int, 0 <= r <= M-n).
mode: Output type: 'pmf', 'cdf', 'sf', 'icdf', 'isf', 'mean', 'var', 'std', or 'median'.
loc: Location parameter (float, default 0).
Returns:
Scalar or 2D list of floats, or error message (str) if invalid.
"""
# Validate M
try:
M_val = int(M)
if not (M_val >= 1):
return "Invalid input: M must be >= 1."
except Exception:
return "Invalid input: M must be an integer."
# Validate n
try:
n_val = int(n)
if not (0 <= n_val <= M_val):
return "Invalid input: n must be between 0 and M."
except Exception:
return "Invalid input: n must be an integer."
# Validate r
try:
r_val = int(r)
if not (0 <= r_val <= M_val - n_val):
return "Invalid input: r must be between 0 and M-n."
except Exception:
return "Invalid input: r must be an integer."
# Validate loc
try:
loc_val = float(loc)
except Exception:
return "Invalid input: loc must be a number."
# Validate mode
valid_modes = ["pmf", "cdf", "sf", "icdf", "isf", "mean", "var", "std", "median"]
if not isinstance(mode, str) or mode not in valid_modes:
return f"Invalid input: mode must be one of {valid_modes}."
# Helper to process k (scalar or 2D list)
def process_k(val):
try:
return float(val)
except Exception:
return None
# Handle statistics
if mode == "mean":
return scipy_nhypergeom.mean(M_val, n_val, r_val, loc=loc_val)
if mode == "var":
return scipy_nhypergeom.var(M_val, n_val, r_val, loc=loc_val)
if mode == "std":
return scipy_nhypergeom.std(M_val, n_val, r_val, loc=loc_val)
if mode == "median":
return scipy_nhypergeom.median(M_val, n_val, r_val, loc=loc_val)
# PMF, CDF, SF, ICDF, ISF
def compute(val):
kval = process_k(val)
if kval is None:
return "Invalid input: k must be a number."
if mode == "pmf":
return float(scipy_nhypergeom.pmf(kval, M_val, n_val, r_val, loc=loc_val))
elif mode == "cdf":
return float(scipy_nhypergeom.cdf(kval, M_val, n_val, r_val, loc=loc_val))
elif mode == "sf":
return float(scipy_nhypergeom.sf(kval, M_val, n_val, r_val, loc=loc_val))
elif mode == "icdf":
return float(scipy_nhypergeom.ppf(kval, M_val, n_val, r_val, loc=loc_val))
elif mode == "isf":
return float(scipy_nhypergeom.isf(kval, M_val, n_val, r_val, loc=loc_val))
# 2D list or scalar
if isinstance(k, list):
# 2D list
if not all(isinstance(row, list) for row in k):
return "Invalid input: k must be a scalar or 2D list."
result = []
for row in k:
result_row = []
for val in row:
out = compute(val)
if isinstance(out, str):
return out
result_row.append(out)
result.append(result_row)
return result
else:
return compute(k)