YULESIMON
Overview
The YULESIMON function computes values related to the Yule-Simon distribution, a discrete probability distribution used to model phenomena with power-law behavior, such as word frequencies and city sizes. Depending on the selected mode, it returns the probability mass function (PMF), cumulative distribution function (CDF), survival function (SF), inverse CDF (quantile, ICDF), or inverse SF (ISF) at a given value, or the mean, variance, standard deviation, or median of the distribution.
Excel does not provide a native Yule-Simon function. The Python in Excel function provided here wraps scipy.stats.yulesimon and supports PMF, CDF, SF, ICDF, ISF, and distribution statistics (mean, median, variance, standard deviation).
For more details, see the scipy.stats.yulesimon documentation.
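As a rough illustration of what the PMF mode evaluates, the sketch below computes the standard Yule-Simon PMF, f(k; alpha) = alpha * B(k, alpha + 1), using only the Python standard library. The helper names `log_beta` and `yule_simon_pmf` are hypothetical and exist only for this example; the function documented on this page delegates to SciPy instead.

```python
import math

def log_beta(a: float, b: float) -> float:
    """Natural log of the beta function B(a, b), computed via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def yule_simon_pmf(k: float, alpha: float) -> float:
    """PMF alpha * B(k, alpha + 1) for integer k >= 1, and 0 elsewhere."""
    if k < 1 or k != math.floor(k):
        return 0.0
    return alpha * math.exp(log_beta(k, alpha + 1.0))

# PMF at k=3, alpha=2.0 is 2 * B(3, 3) = 1/15
print(round(yule_simon_pmf(3, 2.0), 4))  # approximately 0.0667
```

Working in log space is a design choice for this sketch: it avoids overflow in the gamma function when k is large.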
Usage
To use the function in Excel:
=YULESIMON(k, alpha, [mode], [loc])
k (float or 2D list, required): Value(s) at which to evaluate the distribution. For the "pmf", "cdf", and "sf" modes, this is the integer value at which to evaluate (k >= 1 when loc is 0). For the "icdf" and "isf" modes, this is a probability between 0 and 1. For the statistics modes, it is ignored and can be set to 1.
alpha (float, required): Distribution parameter (alpha > 0).
mode (str, optional, default="pmf"): Output type. One of "pmf", "cdf", "sf", "icdf", "isf", "mean", "var", "std", or "median".
loc (float, optional, default=0): Location parameter (shifts the distribution).
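To make the effect of loc concrete, here is a minimal sketch assuming the usual SciPy convention that a shifted PMF at k equals the unshifted PMF at k - loc. The names `pmf_standard` and `pmf_shifted` are hypothetical helpers for this example only.

```python
import math

def pmf_standard(k: float, alpha: float) -> float:
    """Unshifted PMF alpha * B(k, alpha + 1); zero outside k = 1, 2, 3, ..."""
    if k < 1 or k != math.floor(k):
        return 0.0
    log_b = math.lgamma(k) + math.lgamma(alpha + 1.0) - math.lgamma(k + alpha + 1.0)
    return alpha * math.exp(log_b)

def pmf_shifted(k: float, alpha: float, loc: float = 0.0) -> float:
    """PMF with a location shift: evaluate the standard PMF at k - loc."""
    return pmf_standard(k - loc, alpha)

# With loc=1 the support starts at k=2, so k=4 plays the role of k=3
print(pmf_shifted(4, 2.0, loc=1) == pmf_standard(3, 2.0))  # True
```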
The function returns a scalar or 2D list of floats (for array input), or an error message (string) if the input is invalid. Infinite results are returned as the strings "inf" or "-inf". The output depends on the selected mode:
pmf: Probability mass function at k.
cdf: Cumulative distribution function at k.
sf: Survival function (1 - CDF) at k.
icdf: Inverse CDF (quantile) for probability k.
isf: Inverse survival function for probability k.
mean: Mean of the distribution.
var: Variance of the distribution.
std: Standard deviation of the distribution.
median: Median of the distribution.
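The modes listed above correspond closely to methods on a frozen `scipy.stats.yulesimon` distribution. The following sketch shows one possible way to dispatch on the mode string; the function name `evaluate` is hypothetical, and this is not presented as the implementation used on this page. Note that "icdf" maps to SciPy's `ppf` method.

```python
from scipy.stats import yulesimon

def evaluate(mode: str, x: float, alpha: float, loc: float = 0.0) -> float:
    """Dispatch a mode string to the matching frozen-distribution method."""
    dist = yulesimon(alpha, loc=loc)  # frozen distribution with fixed parameters
    # Modes that take an argument (a value k or a probability)
    pointwise = {
        "pmf": dist.pmf,
        "cdf": dist.cdf,
        "sf": dist.sf,
        "icdf": dist.ppf,  # SciPy calls the inverse CDF "ppf"
        "isf": dist.isf,
    }
    # Modes that summarize the whole distribution and ignore x
    summary = {
        "mean": dist.mean,
        "var": dist.var,
        "std": dist.std,
        "median": dist.median,
    }
    if mode in pointwise:
        return float(pointwise[mode](x))
    if mode in summary:
        return float(summary[mode]())
    raise ValueError(f"unknown mode: {mode!r}")

print(evaluate("cdf", 3, 2.0))
print(evaluate("mean", 1, 2.0))
```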
Examples
Example 1: PMF at k=3, alpha=2.0
Inputs:
k | alpha | mode | loc |
---|---|---|---|
3 | 2.0 | pmf | 0 |
Excel formula:
=YULESIMON(3, 2.0, "pmf", 0)
Expected output:
Result |
---|
0.0667 |
Example 2: CDF at k=3, alpha=2.0
Inputs:
k | alpha | mode | loc |
---|---|---|---|
3 | 2.0 | cdf | 0 |
Excel formula:
=YULESIMON(3, 2.0, "cdf", 0)
Expected output:
Result |
---|
0.9 |
Example 3: Survival Function at k=3, alpha=2.0
Inputs:
k | alpha | mode | loc |
---|---|---|---|
3 | 2.0 | sf | 0 |
Excel formula:
=YULESIMON(3, 2.0, "sf", 0)
Expected output:
Result |
---|
0.1 |
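The CDF and SF results in Examples 2 and 3 can be cross-checked against the closed form SF(k) = k * B(k, alpha + 1), with CDF(k) = 1 - SF(k). The sketch below uses only the standard library; `beta_fn`, `yule_simon_sf`, and `yule_simon_cdf` are hypothetical names chosen for this example.

```python
import math

def beta_fn(a: float, b: float) -> float:
    """Beta function B(a, b) computed through log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def yule_simon_sf(k: float, alpha: float) -> float:
    """Survival function P(X > k) = k * B(k, alpha + 1) for k >= 1."""
    k = math.floor(k)  # the distribution only has mass on integers
    if k < 1:
        return 1.0  # everything lies above values below the support
    return k * beta_fn(k, alpha + 1.0)

def yule_simon_cdf(k: float, alpha: float) -> float:
    """Cumulative distribution function P(X <= k) = 1 - SF(k)."""
    return 1.0 - yule_simon_sf(k, alpha)

# For k=3, alpha=2.0: SF = 3 * B(3, 3) = 3/30, so the results are close to 0.9 and 0.1
print(yule_simon_cdf(3, 2.0), yule_simon_sf(3, 2.0))
```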
Example 4: Inverse CDF (ICDF) for probability k=0.5, alpha=2.0
Inputs:
k | alpha | mode | loc |
---|---|---|---|
0.5 | 2.0 | icdf | 0 |
Excel formula:
=YULESIMON(0.5, 2.0, "icdf", 0)
Expected output:
Result |
---|
1 |
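The result of 1 in Example 4 follows from the definition of the quantile for a discrete distribution: the smallest integer k whose CDF is at least the requested probability. Since CDF(1) = 2/3 for alpha = 2.0, a probability of 0.5 is already reached at k = 1. A minimal linear-search sketch of that definition follows, assuming loc = 0 and 0 < p < 1; the names `cdf` and `quantile` and the `max_k` safety cap are hypothetical.

```python
import math

def cdf(k: int, alpha: float) -> float:
    """Closed-form CDF 1 - k * B(k, alpha + 1) for integer k >= 1."""
    log_b = math.lgamma(k) + math.lgamma(alpha + 1.0) - math.lgamma(k + alpha + 1.0)
    return 1.0 - k * math.exp(log_b)

def quantile(p: float, alpha: float, max_k: int = 1_000_000) -> int:
    """Smallest integer k >= 1 with CDF(k) >= p, found by linear search."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must satisfy 0 < p < 1 in this sketch")
    k = 1
    while cdf(k, alpha) < p:
        k += 1
        if k > max_k:  # guard against very heavy tails
            raise OverflowError("quantile exceeds max_k")
    return k

print(quantile(0.5, 2.0))   # 1
print(quantile(0.95, 2.0))  # 5
```

Linear search is chosen here for clarity only; heavy-tailed cases with probabilities close to 1 would call for a bracketing search instead.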
Example 5: Mean, Variance, Std, Median
Inputs:
k | alpha | mode | loc |
---|---|---|---|
1 | 2.0 | mean | 0 |
1 | 2.0 | var | 0 |
1 | 2.0 | std | 0 |
1 | 2.0 | median | 0 |
Excel formulas:
=YULESIMON(1, 2.0, "mean", 0)
=YULESIMON(1, 2.0, "var", 0)
=YULESIMON(1, 2.0, "std", 0)
=YULESIMON(1, 2.0, "median", 0)
Expected outputs:
Result |
---|
2.0 |
inf |
inf |
1.0 |
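The infinite variance in Example 5 reflects the heavy tail of the distribution: the mean alpha / (alpha - 1) is finite only for alpha > 1, and the variance alpha^2 / ((alpha - 1)^2 * (alpha - 2)) is finite only for alpha > 2. The sketch below encodes these closed forms for loc = 0. The function names are hypothetical, and returning NaN for the variance when alpha <= 1 is an assumption of this sketch.

```python
import math

def yule_simon_mean(alpha: float) -> float:
    """Mean alpha / (alpha - 1); infinite when alpha <= 1."""
    return alpha / (alpha - 1.0) if alpha > 1.0 else math.inf

def yule_simon_var(alpha: float) -> float:
    """Variance; infinite for 1 < alpha <= 2, NaN (assumed) for alpha <= 1."""
    if alpha > 2.0:
        return alpha**2 / ((alpha - 1.0) ** 2 * (alpha - 2.0))
    if alpha > 1.0:
        return math.inf
    return math.nan  # assumption: undefined when the mean itself is infinite

def yule_simon_std(alpha: float) -> float:
    """Standard deviation as the square root of the variance."""
    var = yule_simon_var(alpha)
    return math.sqrt(var) if not math.isnan(var) else math.nan

print(yule_simon_mean(2.0), yule_simon_var(2.0), yule_simon_std(2.0))  # 2.0 inf inf
```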
Python Code
from scipy.stats import yulesimon as scipy_yulesimon

def yulesimon(k, alpha, mode="pmf", loc=0):
    """
    Compute Yule-Simon distribution values: PMF, CDF, SF, ICDF, ISF, mean, variance, std, or median.

    Args:
        k: Value(s) at which to evaluate (float or 2D list).
        alpha: Distribution parameter (float, alpha > 0).
        mode: Output type: 'pmf', 'cdf', 'sf', 'icdf', 'isf', 'mean', 'var', 'std', or 'median'.
        loc: Location parameter (float, default 0).

    Returns:
        Scalar or 2D list of floats, or error message (str) if invalid.
    """
    # Validate alpha
    try:
        alpha_val = float(alpha)
        if not (alpha_val > 0):
            return "Invalid input: alpha must be > 0."
    except Exception:
        return "Invalid input: alpha must be a number."
    # Validate loc
    try:
        loc_val = float(loc)
    except Exception:
        return "Invalid input: loc must be a number."
    # Validate mode
    valid_modes = ["pmf", "cdf", "sf", "icdf", "isf", "mean", "var", "std", "median"]
    if not isinstance(mode, str) or mode not in valid_modes:
        return f"Invalid input: mode must be one of {valid_modes}."

    # Helper to convert a single k value to float (None if not numeric)
    def process_k(val):
        try:
            return float(val)
        except Exception:
            return None

    # Handle statistics (k is ignored); infinities are returned as strings
    if mode == "mean":
        result = scipy_yulesimon.mean(alpha_val, loc=loc_val)
        if isinstance(result, float):
            if result == float('inf'):
                return "inf"
            if result == float('-inf'):
                return "-inf"
        return result
    if mode == "var":
        result = scipy_yulesimon.var(alpha_val, loc=loc_val)
        if isinstance(result, float):
            if result == float('inf'):
                return "inf"
            if result == float('-inf'):
                return "-inf"
        return result
    if mode == "std":
        result = scipy_yulesimon.std(alpha_val, loc=loc_val)
        if isinstance(result, float):
            if result == float('inf'):
                return "inf"
            if result == float('-inf'):
                return "-inf"
        return result
    if mode == "median":
        result = scipy_yulesimon.median(alpha_val, loc=loc_val)
        if isinstance(result, float):
            if result == float('inf'):
                return "inf"
            if result == float('-inf'):
                return "-inf"
        return result

    # PMF, CDF, SF, ICDF, ISF for a single value
    def compute(val):
        kval = process_k(val)
        if kval is None:
            return "Invalid input: k must be a number."
        if mode == "pmf":
            result = float(scipy_yulesimon.pmf(kval, alpha_val, loc=loc_val))
        elif mode == "cdf":
            result = float(scipy_yulesimon.cdf(kval, alpha_val, loc=loc_val))
        elif mode == "sf":
            result = float(scipy_yulesimon.sf(kval, alpha_val, loc=loc_val))
        elif mode == "icdf":
            # SciPy names the inverse CDF "ppf"
            result = float(scipy_yulesimon.ppf(kval, alpha_val, loc=loc_val))
        elif mode == "isf":
            result = float(scipy_yulesimon.isf(kval, alpha_val, loc=loc_val))
        else:
            return "Invalid mode."
        if isinstance(result, float):
            if result == float('inf'):
                return "inf"
            if result == float('-inf'):
                return "-inf"
        return result

    # 2D list or scalar
    if isinstance(k, list):
        # 2D list
        if not all(isinstance(row, list) for row in k):
            return "Invalid input: k must be a scalar or 2D list."
        result = []
        for row in k:
            result_row = []
            for val in row:
                out = compute(val)
                # Propagate error messages, but keep "inf"/"-inf" results in the grid
                if isinstance(out, str) and out.startswith("Invalid"):
                    return out
                result_row.append(out)
            result.append(result_row)
        return result
    else:
        return compute(k)
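For 2D input, an alternative to looping over cells is to let SciPy broadcast over a NumPy array. The sketch below is only an illustration of that approach for the PMF mode; `pmf_grid` is a hypothetical name, and unlike the function above it raises an exception on malformed input rather than returning an error string.

```python
import numpy as np
from scipy.stats import yulesimon

def pmf_grid(k_rows, alpha: float, loc: float = 0.0):
    """Evaluate the Yule-Simon PMF over a 2D list in one vectorized call."""
    k_arr = np.asarray(k_rows, dtype=float)  # raises if cells are not numeric
    if k_arr.ndim != 2:
        raise ValueError("k_rows must be a 2D list")
    # SciPy evaluates the PMF elementwise and preserves the 2D shape
    return yulesimon.pmf(k_arr, alpha, loc=loc).tolist()

grid = pmf_grid([[1, 2], [3, 4]], 2.0)
print([[round(v, 4) for v in row] for row in grid])
```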