YULESIMON
Overview
The YULESIMON function computes values for the Yule-Simon distribution, a discrete probability distribution introduced by Udny Yule in 1925 to model the number of species per biological genus and later extended by Nobel laureate Herbert A. Simon in 1955 to describe word frequencies in text corpora. Its tail follows a power law, which makes it useful for modeling phenomena associated with Zipf’s law and preferential attachment, such as word frequencies and the degree distributions of growing random graphs like the Barabási–Albert model.
The probability mass function (PMF) for the Yule-Simon distribution is defined as:
$$f(k; \alpha) = \alpha \, B(k, \alpha + 1)$$
where $k \geq 1$ is an integer, $\alpha > 0$ is the shape parameter, and $B$ is the beta function. For large values of $k$, the PMF behaves as $f(k; \alpha) \propto k^{-(\alpha+1)}$, the power-law tail that connects the distribution to Zipf’s law.
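As a rough numerical illustration of the formula above, the following sketch evaluates the PMF with the Python standard library only, writing the beta function through log-gamma. The helper names `yule_simon_pmf` and `tail_approximation` are illustrative and are not part of the YULESIMON function. The tail constant alpha * Gamma(alpha + 1) is an assumption taken from the standard large-k expansion of the beta function, B(k, a) ≈ Gamma(a) * k^(-a).

```python
import math


def yule_simon_pmf(k, alpha):
    """PMF f(k; alpha) = alpha * B(k, alpha + 1), with B computed via log-gamma."""
    log_beta = math.lgamma(k) + math.lgamma(alpha + 1) - math.lgamma(k + alpha + 1)
    return alpha * math.exp(log_beta)


def tail_approximation(k, alpha):
    """Large-k approximation: alpha * Gamma(alpha + 1) * k ** -(alpha + 1)."""
    return alpha * math.gamma(alpha + 1) * k ** -(alpha + 1)


alpha = 2.0
for k in (1, 3, 10, 100, 1000):
    exact = yule_simon_pmf(k, alpha)
    approx = tail_approximation(k, alpha)
    # The ratio should approach 1 as k grows if the power-law tail holds.
    print(f"k={k:5d}  pmf={exact:.6e}  tail={approx:.6e}  ratio={exact / approx:.4f}")
```

For small k the ratio is far from 1, which is expected: the power law describes only the tail, not the head of the distribution.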
This implementation uses the yulesimon distribution from SciPy’s scipy.stats module. The function can compute the PMF, cumulative distribution function (CDF), survival function (SF), and the inverses of the CDF and SF, as well as summary statistics: mean, variance, standard deviation, and median. The distribution can also be interpreted as a compound distribution: a geometric distribution on {1, 2, ...} whose success probability is exp(-W), where W follows an exponential distribution with rate alpha. For more details, see the SciPy documentation and the Wikipedia article on the Yule-Simon distribution.
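The compound-distribution interpretation can be checked with a small simulation. The sketch below is not how SciPy or the YULESIMON function generates values; it is a standard-library illustration that assumes W is exponential with rate alpha and that, given W, the variate is geometric on {1, 2, ...} with success probability exp(-W). The function name `sample_yule_simon` is hypothetical, and the sketch is meant for moderate alpha only (it does not guard against exp(-W) underflowing to zero).

```python
import math
import random


def sample_yule_simon(alpha, rng):
    """Draw one variate via the exponential-geometric mixture (illustrative only)."""
    w = rng.expovariate(alpha)      # W ~ Exponential(rate=alpha)
    p = math.exp(-w)                # success probability of the geometric
    if p >= 1.0:
        return 1                    # degenerate case: success on the first trial
    u = 1.0 - rng.random()          # uniform on (0, 1], avoids log(0)
    # Inverse-CDF sampling of a geometric distribution on {1, 2, ...}
    return 1 + int(math.log(u) / math.log1p(-p))


rng = random.Random(0)
alpha = 2.0
n = 100_000
draws = [sample_yule_simon(alpha, rng) for _ in range(n)]

# Under the mixture, P(K = 1) = E[exp(-W)] = alpha / (alpha + 1),
# which matches the PMF alpha * B(1, alpha + 1).
freq_one = sum(d == 1 for d in draws) / n
print(f"empirical P(K=1) = {freq_one:.4f}, exact = {alpha / (alpha + 1):.4f}")
```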
This example function is provided as-is without any representation of accuracy.
Excel Usage
=YULESIMON(k, alpha, yulesimon_mode, loc)
k (list[list], required): Value(s) at which to evaluate the distribution (k >= 1 for PMF/CDF/SF; probability for ICDF/ISF).
alpha (float, required): Shape parameter of the distribution (alpha > 0).
yulesimon_mode (str, optional, default: “pmf”): Output type to compute (pmf, cdf, sf, icdf, isf, mean, var, std, or median).
loc (float, optional, default: 0): Location parameter that shifts the distribution.
Returns (float): Distribution result, a 2D list of results when k is a 2D array, or an error message string.
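To make the loc and inverse-mode semantics more concrete, here is a minimal sketch that calls scipy.stats.yulesimon directly rather than going through the YULESIMON wrapper. It assumes SciPy's usual conventions for discrete distributions: loc shifts the support from {1, 2, ...} to {1 + loc, 2 + loc, ...}, and ppf and isf return the smallest support point that meets the requested probability.

```python
from scipy.stats import yulesimon

alpha = 2.0

# loc shifts the support: the PMF at k with loc=1 equals the unshifted PMF at k - 1.
shifted = yulesimon.pmf(4, alpha, loc=1)
unshifted = yulesimon.pmf(3, alpha)
print(shifted, unshifted)

# Inverse modes take a probability and return a support point.
q = 0.85
k_icdf = yulesimon.ppf(q, alpha)       # smallest k with CDF(k) >= q ("icdf" mode)
k_isf = yulesimon.isf(1 - q, alpha)    # smallest k with SF(k) <= 1 - q ("isf" mode)
print(k_icdf, k_isf)

# Summary statistics shift with loc too; the mean is finite only for alpha > 1.
mean_unshifted = yulesimon.mean(alpha)
mean_shifted = yulesimon.mean(alpha, loc=1)
print(mean_unshifted, mean_shifted)
```

The probability 0.85 is chosen to sit strictly between CDF(2) and CDF(3) for alpha = 2, which avoids floating-point ambiguity at exact CDF values.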
Examples
Example 1: PMF at k=3 with alpha=2.0
Inputs:
| k | alpha | yulesimon_mode | loc |
|---|---|---|---|
| 3 | 2 | pmf | 0 |
Excel formula:
=YULESIMON(3, 2, "pmf", 0)
Expected output:
0.06667
Example 2: CDF at k=3 with alpha=2.0
Inputs:
| k | alpha | yulesimon_mode | loc |
|---|---|---|---|
| 3 | 2 | cdf | 0 |
Excel formula:
=YULESIMON(3, 2, "cdf", 0)
Expected output:
0.9
Example 3: Survival function at k=3 with alpha=2.0
Inputs:
| k | alpha | yulesimon_mode | loc |
|---|---|---|---|
| 3 | 2 | sf | 0 |
Excel formula:
=YULESIMON(3, 2, "sf", 0)
Expected output:
0.1
Example 4: PMF with 2D array input
Inputs:
| k | | alpha | yulesimon_mode | loc |
|---|---|---|---|---|
| 1 | 2 | 2 | pmf | 0 |
| 3 | 4 | | | |
Excel formula:
=YULESIMON({1,2;3,4}, 2, "pmf", 0)
Expected output:
| Result | |
|---|---|
| 0.6667 | 0.1667 |
| 0.06667 | 0.03333 |
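The expected outputs in these examples can be cross-checked against scipy.stats.yulesimon directly. This is a sketch of that check, not the wrapper itself: it relies on SciPy broadcasting over a NumPy array, whereas the YULESIMON function loops over a 2D list element by element.

```python
import numpy as np
from scipy.stats import yulesimon

alpha = 2.0

# Examples 1-3: scalar evaluations at k = 3.
pmf_3 = yulesimon.pmf(3, alpha)
cdf_3 = yulesimon.cdf(3, alpha)
sf_3 = yulesimon.sf(3, alpha)
print(round(float(pmf_3), 5), round(float(cdf_3), 5), round(float(sf_3), 5))

# Example 4: the 2D input is evaluated in a single broadcast call.
k_grid = np.array([[1, 2], [3, 4]])
pmf_grid = yulesimon.pmf(k_grid, alpha)
print(np.round(pmf_grid, 5))
```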
Python Code
from scipy.stats import yulesimon as scipy_yulesimon


def yulesimon(k, alpha, yulesimon_mode='pmf', loc=0):
    """
    Compute Yule-Simon distribution values using scipy.stats.yulesimon.

    See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.yulesimon.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        k (list[list]): Value(s) at which to evaluate the distribution (k >= 1 for PMF/CDF/SF; probability for ICDF/ISF).
        alpha (float): Shape parameter of the distribution (alpha > 0).
        yulesimon_mode (str, optional): Output type to compute (pmf, cdf, sf, icdf, isf, mean, var, std, or median). Valid options: PMF, CDF, SF, ICDF, ISF, Mean, Variance, Std Dev, Median. Default is 'pmf'.
        loc (float, optional): Location parameter that shifts the distribution. Default is 0.

    Returns:
        float: Distribution result (float), or error message string.
    """
    def to2d(x):
        return [[x]] if not isinstance(x, list) else x

    def validate_matrix(matrix):
        if not isinstance(matrix, list):
            return None
        for row in matrix:
            if not isinstance(row, list):
                return None
        return matrix

    def is_integer_value(val):
        return abs(val - round(val)) <= 1e-9

    # Validate alpha
    try:
        alpha_val = float(alpha)
        if alpha_val <= 0:
            return "Invalid input: alpha must be greater than 0."
    except (TypeError, ValueError):
        return "Invalid input: alpha must be a numeric value."

    # Validate loc
    try:
        loc_val = float(loc)
    except (TypeError, ValueError):
        return "Invalid input: loc must be a numeric value."

    # Validate yulesimon_mode
    valid_modes = {"pmf", "cdf", "sf", "icdf", "isf", "mean", "var", "std", "median"}
    if not isinstance(yulesimon_mode, str) or yulesimon_mode not in valid_modes:
        return f"Invalid input: yulesimon_mode must be one of {sorted(valid_modes)}."

    stats_modes = {
        "mean": scipy_yulesimon.mean,
        "var": scipy_yulesimon.var,
        "std": scipy_yulesimon.std,
        "median": scipy_yulesimon.median,
    }
    mode_functions = {
        "pmf": scipy_yulesimon.pmf,
        "cdf": scipy_yulesimon.cdf,
        "sf": scipy_yulesimon.sf,
        "icdf": scipy_yulesimon.ppf,
        "isf": scipy_yulesimon.isf,
    }
    discrete_modes = ["pmf", "cdf", "sf"]
    probability_modes = ["icdf", "isf"]

    if yulesimon_mode in stats_modes:
        try:
            result = stats_modes[yulesimon_mode](alpha_val, loc=loc_val)
            return float(result)
        except Exception as exc:
            return f"Error computing {yulesimon_mode}: {str(exc)}"

    # Helper to process individual k values
    def process_k(val):
        try:
            return float(val)
        except (TypeError, ValueError):
            return None

    # Compute distribution function for a single k value
    def compute(val):
        kval = process_k(val)
        if kval is None:
            return "Invalid input: k must be a numeric value."
        if yulesimon_mode in probability_modes and (kval < 0 or kval > 1):
            return f"Invalid input: k must be between 0 and 1 for {yulesimon_mode} mode."
        support_min = loc_val + 1
        if yulesimon_mode in discrete_modes:
            if kval < support_min:
                return f"Invalid input: k must be at least {support_min} for {yulesimon_mode} mode."
            if not is_integer_value(kval):
                return "Invalid input: k must be an integer for pmf, cdf, or sf modes."
            kval = int(round(kval))
        try:
            result = mode_functions[yulesimon_mode](kval, alpha_val, loc=loc_val)
            return float(result)
        except Exception as exc:
            return f"Error computing {yulesimon_mode}: {str(exc)}"

    # Normalize k input (Excel may pass single-element array as scalar)
    k = to2d(k)
    k = validate_matrix(k)
    if k is None:
        return "Invalid input: k must be a scalar or 2D list."

    # Process 2D list
    result = []
    for row in k:
        result_row = []
        for val in row:
            out = compute(val)
            if isinstance(out, str) and (out.startswith("Invalid") or out.startswith("Error")):
                return out
            result_row.append(out)
        result.append(result_row)

    # Return scalar if input was single element
    if len(result) == 1 and len(result[0]) == 1:
        return result[0][0]
    return result