ZIPFIAN
Overview
The ZIPFIAN function evaluates the Zipfian distribution, a discrete probability distribution that models the rank-frequency relationships commonly observed in natural and social phenomena. The distribution is named after the linguist George Kingsley Zipf, who observed that in natural language the most common word appears roughly twice as often as the second most common, three times as often as the third, and so on.
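The "twice as often, three times as often" pattern corresponds to a shape parameter of a = 1, where probability is proportional to 1/k. A minimal check with SciPy, assuming a = 1 and an arbitrary cutoff n = 100:

```python
from scipy.stats import zipfian

# With a = 1 the PMF is proportional to 1/k, so P(rank 1) / P(rank k) = k.
a, n = 1.0, 100  # a = 1 is the classic Zipf's law; n = 100 is an arbitrary choice
p1 = float(zipfian.pmf(1, a, n))
ratios = [p1 / float(zipfian.pmf(k, a, n)) for k in (2, 3, 4)]
print([round(r, 6) for r in ratios])  # [2.0, 3.0, 4.0]
```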
The Zipfian distribution assigns probabilities according to an inverse power law. The probability mass function is defined as:
f(k, a, n) = \frac{1}{H_{n,a} \cdot k^a}
where k \in \{1, 2, \ldots, n\} represents the rank, a > 0 is the shape parameter controlling the power law exponent, n \geq 1 is the number of elements in the distribution, and H_{n,a} = \sum_{i=1}^{n} \frac{1}{i^a} is the generalized harmonic number that normalizes the distribution.
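The formula above can be checked directly. The sketch below computes H_{n,a} by summation, evaluates f(k, a, n) for every rank, and compares the result with SciPy. The helper names generalized_harmonic and zipfian_pmf are hypothetical illustrations, not part of the ZIPFIAN implementation.

```python
from scipy.stats import zipfian

def generalized_harmonic(n: int, a: float) -> float:
    """Hypothetical helper: H_{n,a} = sum_{i=1}^{n} 1 / i**a."""
    return sum(1.0 / i**a for i in range(1, n + 1))

def zipfian_pmf(k: int, a: float, n: int) -> float:
    """Hypothetical helper: f(k, a, n) = 1 / (H_{n,a} * k**a) on {1, ..., n}, else 0."""
    if not 1 <= k <= n:
        return 0.0
    return 1.0 / (generalized_harmonic(n, a) * k**a)

a, n = 1.25, 10
manual = [zipfian_pmf(k, a, n) for k in range(1, n + 1)]
reference = [float(zipfian.pmf(k, a, n)) for k in range(1, n + 1)]

print(round(manual[2], 4))              # 0.1067 (rank 3)
print(abs(sum(manual) - 1.0) < 1e-12)   # True: H_{n,a} normalizes the PMF
print(max(abs(m - r) for m, r in zip(manual, reference)) < 1e-12)  # True
```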
This implementation uses SciPy's scipy.stats.zipfian, one of the discrete distributions provided by scipy.stats. Unlike the Zipf (zeta) distribution, whose support is all positive integers, the Zipfian distribution has a finite cutoff n. For a > 1, the Zipfian distribution converges to the Zipf distribution as n \to \infty.
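To see the limit numerically, the sketch below compares the Zipfian PMF with SciPy's scipy.stats.zipf PMF over the first ten ranks for growing n. The choice a = 2 and the particular values of n are arbitrary assumptions for illustration; the largest gap should shrink as n grows.

```python
from scipy.stats import zipf, zipfian

a = 2.0  # the zeta limit requires a > 1
ranks = range(1, 11)
gaps = []
for n in (10, 100, 10_000):
    # Largest absolute PMF difference between the finite and infinite versions
    gap = max(abs(float(zipfian.pmf(k, a, n)) - float(zipf.pmf(k, a))) for k in ranks)
    gaps.append(gap)
    print(f"n={n}: max PMF gap = {gap:.2e}")
```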
The Zipfian distribution has applications across diverse fields, including linguistics (word frequencies), information science (internet traffic patterns), economics (income distribution), and urban studies (city populations). The shape parameter a controls how quickly probabilities decay with rank: larger values of a produce steeper declines that concentrate probability on the lowest ranks, while as a approaches 0 the distribution approaches the discrete uniform distribution on \{1, \ldots, n\}.
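The effect of a can be illustrated by the probability of rank 1 for a fixed n. In this sketch, n = 10 and the list of shapes are arbitrary choices: a tiny a gives roughly 1/n, the near-uniform case, and the probability grows steadily as a increases.

```python
from scipy.stats import zipfian

n = 10
shapes = [1e-6, 0.5, 1.0, 2.0, 4.0]
# Probability mass on the top-ranked element for each shape parameter
top_prob = [float(zipfian.pmf(1, a, n)) for a in shapes]
for a, p in zip(shapes, top_prob):
    print(f"a={a:g}: P(rank 1) = {p:.4f}")
```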
This example function is provided as-is without any representation of accuracy.
Excel Usage
=ZIPFIAN(k, a, n, zipfian_mode, loc)
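- k (list[list], required): Value(s) at which to evaluate the distribution. For icdf/isf, a probability in [0, 1]. Ignored by the mean, var, std, and median modes.
- a (float, required): Distribution shape parameter. Must be greater than 0.
- n (int, required): Number of elements in the distribution. Must be an integer greater than or equal to 1.
- zipfian_mode (str, optional, default: "pmf"): Calculation mode to use. One of pmf, cdf, sf, icdf, isf, mean, var, std, or median.
- loc (float, optional, default: 0): Location parameter that shifts the distribution, so the support becomes \{1 + loc, \ldots, n + loc\}.
Returns (float or list[list]): A float for a single k value or a statistics mode, a 2D list of results when k is a range, or an error message string. Infinite results are returned as the strings "inf" or "-inf".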
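Two parameters are easy to misread. The icdf mode calls SciPy's ppf, which for a discrete distribution returns the smallest rank whose CDF reaches the given probability, and loc shifts every rank by a constant. The sketch below checks both; icdf_by_search is a hypothetical helper written for illustration and assumes a probability in (0, 1].

```python
import math
from scipy.stats import zipfian

def icdf_by_search(q, a, n):
    """Hypothetical helper: smallest rank k with CDF(k) >= q, for 0 < q <= 1."""
    for k in range(1, n + 1):
        if zipfian.cdf(k, a, n) >= q:
            return k
    return n  # fallback in case CDF(n) rounds to slightly below 1

a, n = 1.25, 10
qs = (0.1, 0.5, 0.9)
searched = [icdf_by_search(q, a, n) for q in qs]
from_scipy = [int(zipfian.ppf(q, a, n)) for q in qs]
print(searched, from_scipy)  # [1, 2, 7] [1, 2, 7]

# loc shifts the support from {1, ..., n} to {1 + loc, ..., n + loc}
loc = 5
print(math.isclose(float(zipfian.pmf(3 + loc, a, n, loc=loc)), float(zipfian.pmf(3, a, n))))  # True
print(float(zipfian.ppf(0.5, a, n, loc=loc)))  # 7.0
```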
Examples
Example 1: PMF at rank 3
Inputs:
| k | a | n |
|---|---|---|
| 3 | 1.25 | 10 |
Excel formula:
=ZIPFIAN(3, 1.25, 10)
Expected output:
0.1067
Example 2: CDF at rank 3
Inputs:
| k | a | n | zipfian_mode |
|---|---|---|---|
| 3 | 1.25 | 10 | cdf |
Excel formula:
=ZIPFIAN(3, 1.25, 10, "cdf")
Expected output:
0.7052
Example 3: Survival function at rank 3
Inputs:
| k | a | n | zipfian_mode | loc |
|---|---|---|---|---|
| 3 | 1.25 | 10 | sf | 0 |
Excel formula:
=ZIPFIAN(3, 1.25, 10, "sf", 0)
Expected output:
0.2948
Example 4: Inverse CDF at probability 0.5
Inputs:
| k | a | n | zipfian_mode | loc |
|---|---|---|---|---|
| 0.5 | 1.25 | 10 | icdf | 0 |
Excel formula:
=ZIPFIAN(0.5, 1.25, 10, "icdf", 0)
Expected output:
2
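The four examples can be reproduced by calling SciPy directly, bypassing the input handling of ZIPFIAN. A minimal sketch, assuming a SciPy version that provides scipy.stats.zipfian:

```python
from scipy.stats import zipfian

a, n = 1.25, 10
results = {
    "pmf": float(zipfian.pmf(3, a, n)),     # Example 1
    "cdf": float(zipfian.cdf(3, a, n)),     # Example 2
    "sf": float(zipfian.sf(3, a, n)),       # Example 3
    "icdf": float(zipfian.ppf(0.5, a, n)),  # Example 4 (icdf maps to SciPy's ppf)
}
for mode, value in results.items():
    print(mode, round(value, 4))
```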
Python Code
import math
from scipy.stats import zipfian as scipy_zipfian

def zipfian(k, a, n, zipfian_mode='pmf', loc=0):
    """
    Compute Zipfian distribution values: PMF, CDF, SF, ICDF, ISF, mean, variance, std, or median.
    See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.zipfian.html
    This example function is provided as-is without any representation of accuracy.
    Args:
        k (list[list]): Value(s) at which to evaluate the distribution. For icdf/isf, probability in [0, 1].
        a (float): Distribution shape parameter. Must be greater than 0.
        n (int): Number of elements in the distribution. Must be an integer greater than or equal to 1.
        zipfian_mode (str, optional): Calculation mode to use (case-insensitive). Valid options: pmf, cdf, sf, icdf, isf, mean, var, std, median. Default is 'pmf'.
        loc (float, optional): Location parameter that shifts the distribution. Default is 0.
    Returns:
        float | list[list]: A float for a scalar k or a statistics mode, a 2D list for a range of k, or an error message string.
    """
    def to2d(x):
        # Wrap a scalar so scalars and Excel ranges share one code path
        return [[x]] if not isinstance(x, list) else x

    # Validate a
    try:
        a_val = float(a)
        if not (a_val > 0):
            return "Invalid input: a must be > 0."
    except Exception:
        return "Invalid input: a must be a number."

    # Validate n (reject non-integral values such as 2.5 instead of silently truncating)
    try:
        n_val = int(n)
        if n_val != float(n):
            return "Invalid input: n must be an integer."
        if not (n_val >= 1):
            return "Invalid input: n must be >= 1."
    except Exception:
        return "Invalid input: n must be an integer."

    # Validate loc
    try:
        loc_val = float(loc)
    except Exception:
        return "Invalid input: loc must be a number."

    # Validate zipfian_mode; matching is case-insensitive
    valid_modes = {"pmf", "cdf", "sf", "icdf", "isf", "mean", "var", "std", "median"}
    if not isinstance(zipfian_mode, str) or zipfian_mode.lower() not in valid_modes:
        return f"Invalid input: zipfian_mode must be one of {sorted(valid_modes)}."
    zipfian_mode = zipfian_mode.lower()

    # Helper to convert a single k value to float (None if it is not numeric)
    def process_k(val):
        try:
            return float(val)
        except Exception:
            return None

    # Helper to convert +/-inf to a string
    def inf_to_str(val):
        if isinstance(val, float) and math.isinf(val):
            return "inf" if val > 0 else "-inf"
        return val

    # Summary statistics ignore k and return a single value
    if zipfian_mode == "mean":
        result = scipy_zipfian.mean(a_val, n_val, loc=loc_val)
        return inf_to_str(float(result))
    if zipfian_mode == "var":
        result = scipy_zipfian.var(a_val, n_val, loc=loc_val)
        return inf_to_str(float(result))
    if zipfian_mode == "std":
        result = scipy_zipfian.std(a_val, n_val, loc=loc_val)
        return inf_to_str(float(result))
    if zipfian_mode == "median":
        result = scipy_zipfian.median(a_val, n_val, loc=loc_val)
        return inf_to_str(float(result))

    # PMF, CDF, SF, ICDF, ISF are evaluated element-wise over k
    def compute(val):
        kval = process_k(val)
        if kval is None:
            return "Invalid input: k must be a number."
        # icdf/isf take probabilities; SciPy would return NaN outside [0, 1]
        if zipfian_mode in ("icdf", "isf") and not (0 <= kval <= 1):
            return "Invalid input: k must be a probability in [0, 1] for icdf/isf."
        if zipfian_mode == "pmf":
            out = float(scipy_zipfian.pmf(kval, a_val, n_val, loc=loc_val))
        elif zipfian_mode == "cdf":
            out = float(scipy_zipfian.cdf(kval, a_val, n_val, loc=loc_val))
        elif zipfian_mode == "sf":
            out = float(scipy_zipfian.sf(kval, a_val, n_val, loc=loc_val))
        elif zipfian_mode == "icdf":
            out = float(scipy_zipfian.ppf(kval, a_val, n_val, loc=loc_val))
        elif zipfian_mode == "isf":
            out = float(scipy_zipfian.isf(kval, a_val, n_val, loc=loc_val))
        else:
            return "Invalid mode."
        return inf_to_str(out)

    # Normalize k to a 2D list
    k = to2d(k)
    if not all(isinstance(row, list) for row in k):
        return "Invalid input: k must be a 2D list."
    result = []
    for row in k:
        result_row = []
        for val in row:
            result_row.append(compute(val))
        result.append(result_row)

    # Return a scalar if there is a single element
    if len(result) == 1 and len(result[0]) == 1:
        return result[0][0]
    return result
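The mean, var, and std modes delegate to SciPy. As an independent cross-check, the moments follow from the PMF via generalized harmonic numbers: E[K] = H_{n,a-1}/H_{n,a} and E[K^2] = H_{n,a-2}/H_{n,a}, so Var(K) = E[K^2] - E[K]^2, with loc shifting the mean but not the variance. The helpers gen_harmonic and zipfian_moments below are hypothetical, written only for this sketch.

```python
import math
from scipy.stats import zipfian

def gen_harmonic(n, s):
    """Hypothetical helper: H_{n,s} = sum_{i=1}^{n} i**(-s); s may be zero or negative."""
    return sum(i ** (-s) for i in range(1, n + 1))

def zipfian_moments(a, n, loc=0.0):
    """Hypothetical helper: (mean, variance, std) of the Zipfian distribution shifted by loc."""
    h = gen_harmonic(n, a)
    mean = gen_harmonic(n, a - 1) / h    # E[K] = H_{n,a-1} / H_{n,a}
    second = gen_harmonic(n, a - 2) / h  # E[K^2] = H_{n,a-2} / H_{n,a}
    var = second - mean ** 2             # loc does not change the variance
    return mean + loc, var, math.sqrt(var)

a, n, loc = 1.25, 10, 0.0
mean, var, std = zipfian_moments(a, n, loc)
print(math.isclose(mean, float(zipfian.mean(a, n, loc=loc)), rel_tol=1e-9))  # True
print(math.isclose(var, float(zipfian.var(a, n, loc=loc)), rel_tol=1e-9))    # True
```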