BETABINOM
Overview
The BETABINOM function computes values for the beta-binomial distribution, a discrete probability distribution that models the number of successes in a sequence of independent Bernoulli trials where the success probability itself varies according to a beta distribution. This makes it a natural choice for modeling overdispersed count data where the probability of success is uncertain or varies across observations.
The beta-binomial distribution can be understood as a binomial distribution where the success probability p is not fixed but follows a beta distribution with shape parameters \alpha (a) and \beta (b). This hierarchical structure captures extra variability beyond what the standard binomial model allows. Common applications include modeling the number of defective items in quality control, insurance claim counts, and biological data such as litter sizes.
The probability mass function (PMF) is defined as:
f(k) = \binom{n}{k} \frac{B(k + a, n - k + b)}{B(a, b)}
where k \in \{0, 1, \ldots, n\}, n \geq 0 is the number of trials, a > 0 and b > 0 are the beta distribution shape parameters, and B(a, b) is the beta function. The mean and variance of the distribution are:
\mu = \frac{na}{a + b}, \quad \sigma^2 = \frac{nab(a + b + n)}{(a + b)^2(a + b + 1)}
This implementation uses SciPy’s betabinom module from scipy.stats. The function supports multiple computation modes: PMF, CDF, survival function (SF), inverse CDF (ICDF/quantile function), inverse survival function (ISF), and summary statistics (mean, variance, standard deviation, median). For more background on the distribution, see the Wikipedia article on the beta-binomial distribution.
This example function is provided as-is without any representation of accuracy.
Excel Usage
=BETABINOM(k, n, a, b, betabinom_mode, loc)
k(float, required): Value at which to evaluate the distribution. For PMF/CDF/SF this is the number of successes (0 <= k <= n). For ICDF/ISF this is a probability (0 <= k <= 1).n(int, required): Number of trials (n >= 0).a(float, required): Alpha parameter of the beta distribution (a > 0).b(float, required): Beta parameter of the beta distribution (b > 0).betabinom_mode(str, optional, default: “pmf”): Output type to compute.loc(float, optional, default: 0): Location parameter to shift the distribution.
Returns (float): Distribution result (float), or error message string.
Examples
Example 1: PMF at k=2 with n=5 trials
Inputs:
| k | n | a | b | betabinom_mode | loc |
|---|---|---|---|---|---|
| 2 | 5 | 2.3 | 0.63 | pmf | 0 |
Excel formula:
=BETABINOM(2, 5, 2.3, 0.63, "pmf", 0)
Expected output:
0.08787
Example 2: CDF at k=2 with n=5 trials
Inputs:
| k | n | a | b | betabinom_mode | loc |
|---|---|---|---|---|---|
| 2 | 5 | 2.3 | 0.63 | cdf | 0 |
Excel formula:
=BETABINOM(2, 5, 2.3, 0.63, "cdf", 0)
Expected output:
0.1557
Example 3: Survival function at k=2 with n=5 trials
Inputs:
| k | n | a | b | betabinom_mode | loc |
|---|---|---|---|---|---|
| 2 | 5 | 2.3 | 0.63 | sf | 0 |
Excel formula:
=BETABINOM(2, 5, 2.3, 0.63, "sf", 0)
Expected output:
0.8443
Example 4: Inverse CDF for probability 0.5
Inputs:
| k | n | a | b | betabinom_mode | loc |
|---|---|---|---|---|---|
| 0.5 | 5 | 2.3 | 0.63 | icdf | 0 |
Excel formula:
=BETABINOM(0.5, 5, 2.3, 0.63, "icdf", 0)
Expected output:
4
Python Code
from scipy.stats import betabinom as scipy_betabinom
def betabinom(k, n, a, b, betabinom_mode='pmf', loc=0):
"""
Compute Beta-binomial distribution values from scipy.stats.betabinom.
See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.betabinom.html
This example function is provided as-is without any representation of accuracy.
Args:
k (float): Value at which to evaluate the distribution. For PMF/CDF/SF this is the number of successes (0 <= k <= n). For ICDF/ISF this is a probability (0 <= k <= 1).
n (int): Number of trials (n >= 0).
a (float): Alpha parameter of the beta distribution (a > 0).
b (float): Beta parameter of the beta distribution (b > 0).
betabinom_mode (str, optional): Output type to compute. Valid options: pmf, cdf, sf, icdf, isf, mean, var, std, median. Default is 'pmf'.
loc (float, optional): Location parameter to shift the distribution. Default is 0.
Returns:
float: Distribution result (float), or error message string.
"""
# Validate inputs
try:
n_val = int(n)
a_val = float(a)
b_val = float(b)
loc_val = float(loc)
except (ValueError, TypeError):
return "Invalid input: n must be integer, a, b, and loc must be numbers."
if n_val < 0:
return "Invalid input: n must be >= 0."
if a_val <= 0 or b_val <= 0:
return "Invalid input: a and b must be > 0."
valid_modes = {"pmf", "cdf", "sf", "icdf", "isf", "mean", "var", "std", "median"}
if betabinom_mode not in valid_modes:
return f"Invalid input: betabinom_mode must be one of {sorted(list(valid_modes))}."
# Handle statistics (independent of k)
if betabinom_mode in {"mean", "var", "std", "median"}:
if betabinom_mode == "mean":
return float(scipy_betabinom.mean(n_val, a_val, b_val, loc=loc_val))
elif betabinom_mode == "var":
return float(scipy_betabinom.var(n_val, a_val, b_val, loc=loc_val))
elif betabinom_mode == "std":
return float(scipy_betabinom.std(n_val, a_val, b_val, loc=loc_val))
elif betabinom_mode == "median":
return float(scipy_betabinom.median(n_val, a_val, b_val, loc=loc_val))
# Helper for scalar computation
def compute_scalar(val):
try:
kval = float(val)
except (ValueError, TypeError):
return "Invalid input: k must be a number."
if betabinom_mode == "pmf":
return float(scipy_betabinom.pmf(kval, n_val, a_val, b_val, loc=loc_val))
elif betabinom_mode == "cdf":
return float(scipy_betabinom.cdf(kval, n_val, a_val, b_val, loc=loc_val))
elif betabinom_mode == "sf":
return float(scipy_betabinom.sf(kval, n_val, a_val, b_val, loc=loc_val))
elif betabinom_mode == "icdf":
return float(scipy_betabinom.ppf(kval, n_val, a_val, b_val, loc=loc_val))
elif betabinom_mode == "isf":
return float(scipy_betabinom.isf(kval, n_val, a_val, b_val, loc=loc_val))
return "Unknown error"
# Handle k (scalar or 2D list)
if isinstance(k, list):
result = []
for row in k:
if not isinstance(row, list):
return "Invalid input: k must be a scalar or 2D list."
result_row = []
for val in row:
out = compute_scalar(val)
if isinstance(out, str):
return out
result_row.append(out)
result.append(result_row)
return result
else:
return compute_scalar(k)