ZIPFIAN

Overview

The ZIPFIAN function computes values from the Zipfian distribution, a discrete probability distribution that models rank-frequency relationships commonly observed in natural and social phenomena. The distribution is named after linguist George Kingsley Zipf, who studied word frequency patterns in natural language, where the most common word appears roughly twice as often as the second most common, three times as often as the third, and so on.

The Zipfian distribution assigns probabilities according to an inverse power law. The probability mass function is defined as:

f(k, a, n) = \frac{1}{H_{n,a} \cdot k^a}

where k \in \{1, 2, \ldots, n\} represents the rank, a > 0 is the shape parameter controlling the power law exponent, n \geq 1 is the number of elements in the distribution, and H_{n,a} = \sum_{i=1}^{n} \frac{1}{i^a} is the generalized harmonic number that normalizes the distribution.
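
The formula above can be evaluated directly. The following is a minimal pure-Python sketch, not the SciPy implementation; the helper names generalized_harmonic and zipfian_pmf are hypothetical and exist only for illustration.

```python
def generalized_harmonic(n, a):
    """Return H_{n,a}, the sum of 1 / i**a for i = 1..n."""
    return sum(1.0 / i**a for i in range(1, n + 1))


def zipfian_pmf(k, a, n):
    """Return f(k, a, n); ranks outside 1..n get probability 0."""
    if k != int(k) or k < 1 or k > n:
        return 0.0
    return 1.0 / (generalized_harmonic(n, a) * k**a)


# Probabilities over all ranks 1..n should sum to 1 (up to rounding).
probs = [zipfian_pmf(k, 1.25, 10) for k in range(1, 11)]
print(sum(probs))
print(zipfian_pmf(3, 1.25, 10))
```

Recomputing H_{n,a} on every call keeps the sketch short; for many evaluations with the same a and n it would be cheaper to compute it once and reuse it.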

This implementation wraps scipy.stats.zipfian, one of the discrete probability distributions provided by the SciPy library. Unlike the Zipf (zeta) distribution, which has infinite support, the Zipfian distribution has a finite cutoff parameter n. For a > 1, the Zipfian distribution converges to the Zipf distribution as n \to \infty.
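
The convergence can be illustrated numerically for a = 2, where the normalizing constant of the infinite Zipf distribution is known in closed form as \pi^2 / 6. This is a rough sketch in plain Python rather than a proof; the helper name generalized_harmonic is hypothetical.

```python
import math


def generalized_harmonic(n, a):
    """Return H_{n,a}, the sum of 1 / i**a for i = 1..n."""
    return sum(1.0 / i**a for i in range(1, n + 1))


a = 2.0
zeta_2 = math.pi**2 / 6  # value of the Riemann zeta function at 2

gaps = []
for n in (10, 100, 1000, 10000):
    h = generalized_harmonic(n, a)
    gaps.append(zeta_2 - h)
    # P(rank 1) under the finite Zipfian vs. under the infinite Zipf law
    print(n, 1.0 / h, 1.0 / zeta_2)
```

The gap between H_{n,2} and \pi^2 / 6 shrinks as n grows, so every finite-cutoff probability approaches its infinite-support counterpart.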

The Zipfian distribution has applications across diverse fields, including linguistics (word frequencies), information science (internet traffic patterns), economics (income distribution), and urban studies (city populations). The shape parameter a controls how quickly probabilities decay with rank: larger values of a produce steeper declines and concentrate probability on the first few ranks (small k), while as a approaches 0 the distribution approaches the uniform distribution over the n ranks.
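
The effect of the shape parameter can be seen by tabulating the probabilities for a small and a large value of a. This is an illustrative sketch; the function name zipfian_pmf_table and the chosen values of a and n are assumptions made for the example.

```python
def zipfian_pmf_table(a, n):
    """Return the probabilities for ranks 1..n as a list."""
    weights = [1.0 / k**a for k in range(1, n + 1)]
    total = sum(weights)  # generalized harmonic number H_{n,a}
    return [w / total for w in weights]


n = 5
near_uniform = zipfian_pmf_table(0.01, n)  # a close to 0
steep = zipfian_pmf_table(3.0, n)          # large a

print([round(p, 4) for p in near_uniform])
print([round(p, 4) for p in steep])
```

With a = 0.01 every rank receives close to 1/5 of the probability, whereas with a = 3 most of the mass sits on rank 1.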

This example function is provided as-is without any representation of accuracy.

Excel Usage

=ZIPFIAN(k, a, n, zipfian_mode, loc)

k (list[list], required): Value(s) at which to evaluate the distribution, given as a single value or a 2D range. For icdf/isf, a probability in [0, 1]. Not used by the mean, var, std, and median modes.
a (float, required): Distribution shape parameter. Must be greater than 0.
n (int, required): Number of elements in the distribution. Must be greater than or equal to 1.
zipfian_mode (str, optional, default: “pmf”): Calculation mode to use. Valid options: pmf, cdf, sf, icdf, isf, mean, var, std, median.
loc (float, optional, default: 0): Location parameter that shifts the distribution.

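Returns (float or list[list]): A single result when k is a single value, a 2D list of results when k is a larger range, or an error message string. Infinite results are returned as the strings "inf" or "-inf".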

Example 1: PMF at rank 3 (default mode)

Inputs:

k	a	n
3	1.25	10

Excel formula:

=ZIPFIAN(3, 1.25, 10)

Expected output:

0.106721

Example 2: CDF at rank 3

Inputs:

k	a	n	zipfian_mode
3	1.25	10	cdf

Excel formula:

=ZIPFIAN(3, 1.25, 10, "cdf")

Expected output:

0.705238

Example 3: Survival function at rank 3

Inputs:

k	a	n	zipfian_mode	loc
3	1.25	10	sf	0

Excel formula:

=ZIPFIAN(3, 1.25, 10, "sf", 0)

Expected output:

0.294762

Example 4: Inverse CDF at probability 0.5

Inputs:

k	a	n	zipfian_mode	loc
0.5	1.25	10	icdf	0

Excel formula:

=ZIPFIAN(0.5, 1.25, 10, "icdf", 0)

Expected output:

2
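
The four expected outputs above can be cross-checked by hand from the probability mass function. The sketch below is plain Python with hypothetical helper names (zipfian_table, cdf, icdf) and assumes loc = 0; it does not reproduce SciPy's handling of edge cases such as a probability of exactly 0.

```python
import math


def zipfian_table(a, n):
    """Return the probabilities for ranks 1..n as a list."""
    weights = [1.0 / k**a for k in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def cdf(k, a, n):
    """P(X <= k): sum of the PMF over ranks 1..floor(k)."""
    upper = max(0, min(math.floor(k), n))
    return sum(zipfian_table(a, n)[:upper])


def icdf(q, a, n):
    """Smallest rank whose cumulative probability reaches q."""
    running = 0.0
    for rank, p in enumerate(zipfian_table(a, n), start=1):
        running += p
        if running >= q:
            return rank
    return n


a, n = 1.25, 10
pmf_3 = zipfian_table(a, n)[2]  # rank 3 is at index 2
cdf_3 = cdf(3, a, n)
sf_3 = 1.0 - cdf_3              # survival function is 1 - CDF
median_rank = icdf(0.5, a, n)
print(pmf_3, cdf_3, sf_3, median_rank)
```

The survival function and inverse CDF are derived from the same table, which is why Examples 2 and 3 sum to 1.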

Python Code

import math
from scipy.stats import zipfian as scipy_zipfian

def zipfian(k, a, n, zipfian_mode='pmf', loc=0):
    """
    Compute Zipfian distribution values: PMF, CDF, SF, ICDF, ISF, mean, variance, std, or median.

    See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.zipfian.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        k (list[list]): Value(s) at which to evaluate the distribution, as a scalar or a 2D list. For icdf/isf, probability in [0, 1]. Not used by the mean, var, std, and median modes.
        a (float): Distribution shape parameter. Must be greater than 0.
        n (int): Number of elements in the distribution. Must be greater than or equal to 1.
        zipfian_mode (str, optional): Calculation mode to use. Valid options: pmf, cdf, sf, icdf, isf, mean, var, std, median. Default is 'pmf'.
        loc (float, optional): Location parameter that shifts the distribution. Default is 0.

    Returns:
        float or list[list]: A single result for scalar input, a 2D list of results for 2D input, or an error message string.
    """
    try:
      def to2d(x):
        return [[x]] if not isinstance(x, list) else x

      # Validate a
      try:
        a_val = float(a)
        if not (a_val > 0):
          return "Error: Invalid input: a must be > 0."
      except Exception:
        return "Error: Invalid input: a must be a number."
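      # Validate n (reject non-integer values such as 10.5 instead of truncating them)
      try:
        n_float = float(n)
        if not n_float.is_integer():
          return "Error: Invalid input: n must be an integer."
        n_val = int(n_float)
        if not (n_val >= 1):
          return "Error: Invalid input: n must be >= 1."
      except Exception:
        return "Error: Invalid input: n must be an integer."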
      # Validate loc
      try:
        loc_val = float(loc)
      except Exception:
        return "Error: Invalid input: loc must be a number."
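      # Validate zipfian_mode (case-insensitive; surrounding whitespace is ignored)
      valid_modes = {"pmf", "cdf", "sf", "icdf", "isf", "mean", "var", "std", "median"}
      if not isinstance(zipfian_mode, str):
        return f"Error: Invalid input: zipfian_mode must be one of {sorted(valid_modes)}."
      zipfian_mode = zipfian_mode.strip().lower()
      if zipfian_mode not in valid_modes:
        return f"Error: Invalid input: zipfian_mode must be one of {sorted(valid_modes)}."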
      # Helper to process k (scalar or 2D list)
      def process_k(val):
        try:
          return float(val)
        except Exception:
          return None
      # Helper to convert inf to string
      def inf_to_str(val):
        if isinstance(val, float) and math.isinf(val):
          return "inf" if val > 0 else "-inf"
        return val
      # Handle statistics
      if zipfian_mode == "mean":
        result = scipy_zipfian.mean(a_val, n_val, loc=loc_val)
        return inf_to_str(float(result))
      if zipfian_mode == "var":
        result = scipy_zipfian.var(a_val, n_val, loc=loc_val)
        return inf_to_str(float(result))
      if zipfian_mode == "std":
        result = scipy_zipfian.std(a_val, n_val, loc=loc_val)
        return inf_to_str(float(result))
      if zipfian_mode == "median":
        result = scipy_zipfian.median(a_val, n_val, loc=loc_val)
        return inf_to_str(float(result))
      # PMF, CDF, SF, ICDF, ISF
      def compute(val):
        kval = process_k(val)
        if kval is None:
          return "Error: Invalid input: k must be a number."
        if zipfian_mode == "pmf":
          out = float(scipy_zipfian.pmf(kval, a_val, n_val, loc=loc_val))
        elif zipfian_mode == "cdf":
          out = float(scipy_zipfian.cdf(kval, a_val, n_val, loc=loc_val))
        elif zipfian_mode == "sf":
          out = float(scipy_zipfian.sf(kval, a_val, n_val, loc=loc_val))
        elif zipfian_mode == "icdf":
          out = float(scipy_zipfian.ppf(kval, a_val, n_val, loc=loc_val))
        elif zipfian_mode == "isf":
          out = float(scipy_zipfian.isf(kval, a_val, n_val, loc=loc_val))
        else:
          return "Error: Invalid mode."
        return inf_to_str(out)

      # Normalize k to 2D list
      k = to2d(k)
      if not all(isinstance(row, list) for row in k):
        return "Error: Invalid input: k must be a 2D list."

      result = []
      for row in k:
        result_row = []
        for val in row:
          out = compute(val)
          result_row.append(out)
        result.append(result_row)

      # Return scalar if single element
      if len(result) == 1 and len(result[0]) == 1:
        return result[0][0]
      return result
    except Exception as e:
      return f"Error: {str(e)}"
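
The function above accepts either a scalar or a 2D list for k and collapses a 1×1 result back to a scalar. That pattern can be sketched on its own as follows; the helper name map_scalar_or_grid is hypothetical, and it raises an exception where the function above returns an error string.

```python
def map_scalar_or_grid(value, fn):
    """Apply fn to a scalar or to every cell of a 2D list."""
    # Wrap a scalar so that both input shapes share one code path.
    grid = value if isinstance(value, list) else [[value]]
    if not all(isinstance(row, list) for row in grid):
        raise ValueError("expected a scalar or a 2D list")
    out = [[fn(cell) for cell in row] for row in grid]
    # Collapse a 1x1 grid back to a scalar, mirroring the function above.
    if len(out) == 1 and len(out[0]) == 1:
        return out[0][0]
    return out


doubled_scalar = map_scalar_or_grid(3, lambda x: 2 * x)
doubled_grid = map_scalar_or_grid([[1, 2], [3, 4]], lambda x: 2 * x)
print(doubled_scalar, doubled_grid)
```

Normalizing to a 2D shape first means the per-cell logic is written once, which is convenient when a spreadsheet may pass either a single cell or a range.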
