ZIPF
Overview
The ZIPF
function computes values related to the Zipf (Zeta) discrete distribution, which describes the probability of integer values following a power law. This function can return the probability mass function (PMF), cumulative distribution function (CDF), survival function (SF), inverse CDF (quantile/ICDF), inverse SF (ISF), mean, variance, standard deviation, or median for a given value.
Excel does not provide a native Zipf distribution function. The Python function in Excel provided here supports all major distribution features, including PMF, CDF, survival function, inverse CDF (quantile), inverse survival function, and distribution statistics (mean, median, variance, standard deviation).
For more details, see the scipy.stats.zipf documentation .
Usage
To use the function in Excel:
=ZIPF(k, a, [mode], [loc])
k
(float or 2D list, required): Value(s) at which to evaluate the distribution. For PMF, CDF, SF, ICDF, and ISF, this is the integer value (k >= 1).a
(float, required): Distribution parameter (a > 1).mode
(str, optional, default=“pmf”): Output type. One of"pmf"
,"cdf"
,"sf"
,"icdf"
,"isf"
,"mean"
,"var"
,"std"
, or"median"
.loc
(float, optional, default=0): Location parameter (shifts the distribution).
The function returns a scalar or 2D list of floats (for array input), or an error message (string) if the input is invalid. The output depends on the selected mode:
pmf
: Probability mass function at k.cdf
: Cumulative distribution function at k.sf
: Survival function (1 - CDF) at k.icdf
: Inverse CDF (quantile) for probability k.isf
: Inverse survival function for probability k.mean
: Mean of the distribution.var
: Variance of the distribution.std
: Standard deviation of the distribution.median
: Median of the distribution.
Examples
Example 1: PMF at k=2, a=2.5
Inputs:
k | a | mode | loc |
---|---|---|---|
2 | 2.5 | pmf | 0 |
Excel formula:
=ZIPF(2, 2.5, "pmf", 0)
Expected output:
Result |
---|
0.1318 |
Example 2: CDF at k=2, a=2.5
Inputs:
k | a | mode | loc |
---|---|---|---|
2 | 2.5 | cdf | 0 |
Excel formula:
=ZIPF(2, 2.5, "cdf", 0)
Expected output:
Result |
---|
0.8772 |
Example 3: Survival Function at k=2, a=2.5
Inputs:
k | a | mode | loc |
---|---|---|---|
2 | 2.5 | sf | 0 |
Excel formula:
=ZIPF(2, 2.5, "sf", 0)
Expected output:
Result |
---|
0.1228 |
Example 4: Inverse CDF (ICDF) for probability k=0.5, a=2.5
Inputs:
k | a | mode | loc |
---|---|---|---|
0.5 | 2.5 | icdf | 0 |
Excel formula:
=ZIPF(0.5, 2.5, "icdf", 0)
Expected output:
Result |
---|
1 |
Example 5: Mean, Variance, Std, Median
Inputs:
k | a | mode | loc |
---|---|---|---|
1 | 2.5 | mean | 0 |
1 | 2.5 | var | 0 |
1 | 2.5 | std | 0 |
1 | 2.5 | median | 0 |
Excel formulas:
=ZIPF(1, 2.5, "mean", 0)
=ZIPF(1, 2.5, "var", 0)
=ZIPF(1, 2.5, "std", 0)
=ZIPF(1, 2.5, "median", 0)
Expected outputs:
Result |
---|
1.9474 |
inf |
inf |
1 |
Python Code
from scipy.stats import zipf as scipy_zipf
def zipf(k, a, mode="pmf", loc=0):
"""
Compute Zipf distribution values: PMF, CDF, SF, ICDF, ISF, mean, variance, std, or median.
Args:
k: Value(s) at which to evaluate (float or 2D list).
a: Distribution parameter (float, a > 1).
mode: Output type: 'pmf', 'cdf', 'sf', 'icdf', 'isf', 'mean', 'var', 'std', or 'median'.
loc: Location parameter (float, default 0).
Returns:
Scalar or 2D list of floats, or error message (str) if invalid. Never returns inf as a float; returns 'inf' or '-inf' as a string instead.
"""
import math
# Validate a
try:
a_val = float(a)
if not (a_val > 1):
return "Invalid input: a must be > 1."
except Exception:
return "Invalid input: a must be a number."
# Validate loc
try:
loc_val = float(loc)
except Exception:
return "Invalid input: loc must be a number."
# Validate mode
valid_modes = ["pmf", "cdf", "sf", "icdf", "isf", "mean", "var", "std", "median"]
if not isinstance(mode, str) or mode not in valid_modes:
return f"Invalid input: mode must be one of {valid_modes}."
# Helper to process k (scalar or 2D list)
def process_k(val):
try:
return float(val)
except Exception:
return None
# Helper to convert inf to string
def inf_to_str(val):
if isinstance(val, float) and math.isinf(val):
return "inf" if val > 0 else "-inf"
return val
# Handle statistics
if mode == "mean":
result = scipy_zipf.mean(a_val, loc=loc_val)
return inf_to_str(result)
if mode == "var":
result = scipy_zipf.var(a_val, loc=loc_val)
return inf_to_str(result)
if mode == "std":
result = scipy_zipf.std(a_val, loc=loc_val)
return inf_to_str(result)
if mode == "median":
result = scipy_zipf.median(a_val, loc=loc_val)
return inf_to_str(result)
# PMF, CDF, SF, ICDF, ISF
def compute(val):
kval = process_k(val)
if kval is None:
return "Invalid input: k must be a number."
if mode == "pmf":
out = float(scipy_zipf.pmf(kval, a_val, loc=loc_val))
elif mode == "cdf":
out = float(scipy_zipf.cdf(kval, a_val, loc=loc_val))
elif mode == "sf":
out = float(scipy_zipf.sf(kval, a_val, loc=loc_val))
elif mode == "icdf":
out = float(scipy_zipf.ppf(kval, a_val, loc=loc_val))
elif mode == "isf":
out = float(scipy_zipf.isf(kval, a_val, loc=loc_val))
else:
return "Invalid mode."
return inf_to_str(out)
# 2D list or scalar
if isinstance(k, list):
# 2D list
if not all(isinstance(row, list) for row in k):
return "Invalid input: k must be a scalar or 2D list."
result = []
for row in k:
result_row = []
for val in row:
out = compute(val)
if isinstance(out, str) and (out.startswith("Invalid") or out == "inf" or out == "-inf"):
result_row.append(out)
else:
result_row.append(out)
result.append(result_row)
return result
else:
return compute(k)