EPPS_SINGLE_2SAMP

Overview

The EPPS_SINGLE_2SAMP function performs the Epps-Singleton (ES) test, a two-sample hypothesis test of the null hypothesis that two samples are drawn from the same underlying probability distribution. Unlike the more common Kolmogorov-Smirnov test, which uses the empirical distribution function, the ES test is based on the empirical characteristic function, so it applies to both continuous and discrete distributions.

The test was introduced by T. W. Epps and K. J. Singleton in their 1986 paper “An omnibus test for the two-sample problem using the empirical characteristic function” (Journal of Statistical Computation and Simulation, 26, pp. 177–203). The characteristic function of a random variable is the Fourier transform of its probability distribution, and evaluating the empirical characteristic function at specific points provides a way to compare distributions without assuming continuity.

The test statistic is computed by evaluating the empirical characteristic functions of both samples at a set of points t (defaulting to 0.4 and 0.8), then measuring the discrepancy between them as a quadratic form in the differences of their real and imaginary parts, weighted by the inverse of an estimated covariance matrix. The p-value is derived from the asymptotic chi-squared distribution of the test statistic. When both sample sizes are below 25, a small sample correction is automatically applied.
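
As a rough illustration of the first step, the sketch below evaluates the empirical characteristic function of each sample at the default points and takes the difference. The helper name empirical_cf is hypothetical and not part of SciPy. The sketch deliberately stops short of the full statistic: it does not rescale t, weight the difference by a covariance estimate, or apply the small sample correction.

```python
import numpy as np

def empirical_cf(sample, t):
    """Hypothetical helper: mean of exp(i * t * x) over the sample, per point in t."""
    x = np.asarray(sample, dtype=float)
    t = np.atleast_1d(np.asarray(t, dtype=float))
    # np.outer gives one row per evaluation point and one column per observation.
    return np.exp(1j * np.outer(t, x)).mean(axis=1)

x = [1, 2, 3, 4, 5]
y = [2, 3, 4, 5, 6]
t = [0.4, 0.8]

ecf_x = empirical_cf(x, t)
ecf_y = empirical_cf(y, t)

# Complex differences; their real and imaginary parts are the raw
# ingredients that a test statistic would combine.
diff = ecf_x - ecf_y
print(diff.real, diff.imag)
```

Identical samples give a difference of zero at every point, and the modulus of an empirical characteristic function never exceeds 1.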

Key properties of the ES test include:

  • No continuity assumption: Works for discrete, mixed, and continuous distributions
  • Higher statistical power: Epps and Singleton report higher power than the Kolmogorov-Smirnov test in many of their examples
  • Sample size guidance: Recommended for discrete samples and for continuous samples with at least 25 observations each; for smaller continuous samples, consider anderson_ksamp
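
To make the comparison with the Kolmogorov-Smirnov test concrete, the following sketch runs both tests on two discrete samples by calling SciPy directly. The Poisson parameters, sample sizes, and seed are arbitrary choices for illustration, and the printed values are not results taken from this page.

```python
import numpy as np
from scipy.stats import epps_singleton_2samp, ks_2samp

# Illustrative data: two discrete samples with different means (assumed values).
rng = np.random.default_rng(0)
x = rng.poisson(lam=3.0, size=50)
y = rng.poisson(lam=4.0, size=50)

es = epps_singleton_2samp(x, y)
# The KS test assumes continuous data, so its p-value is only
# approximate when the samples contain ties, as they do here.
ks = ks_2samp(x, y)

print(f"ES: statistic={es.statistic:.4f}, p-value={es.pvalue:.4f}")
print(f"KS: statistic={ks.statistic:.4f}, p-value={ks.pvalue:.4f}")
```

A single run like this says nothing about power in general; it only shows that both tests accept the same inputs and return a statistic and a p-value.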

This implementation uses scipy.stats.epps_singleton_2samp from the SciPy library. The source code is available on GitHub.

This example function is provided as-is without any representation of accuracy.

Excel Usage

=EPPS_SINGLE_2SAMP(x, y, t)
  • x (list[list], required): First sample as a 2D array. Must contain at least five numeric values.
  • y (list[list], required): Second sample as a 2D array. Must contain at least five numeric values.
  • t (list[list], optional, default: null): Points where the empirical characteristic function is evaluated. Values should be positive and distinct. If omitted, [0.4, 0.8] is used.

Returns (list[list] or str): A 2D list with one row containing [statistic, pvalue], or an error message string if the input is invalid or the computation fails.
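
Because the return value is either a 2D list or a string, Python callers need to check the type before unpacking it. The helper below is a hypothetical sketch of that check, not part of the function itself; the sample values are the expected output of Example 1 and one of the function's own error messages.

```python
def describe_result(result):
    """Hypothetical helper: format either return shape as a readable message."""
    if isinstance(result, str):
        # A string return value signals invalid input or a failed computation.
        return f"error: {result}"
    # Otherwise the result is a single row: [statistic, pvalue].
    statistic, pvalue = result[0]
    return f"statistic={statistic:.4f}, p-value={pvalue:.4f}"

print(describe_result([[0.9139, 0.9226]]))
print(describe_result("Invalid input: x must contain at least five values."))
```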

Examples

Example 1: Basic two-sample comparison

Inputs:

x y
1 2
2 3
3 4
4 5
5 6

Excel formula:

=EPPS_SINGLE_2SAMP({1;2;3;4;5}, {2;3;4;5;6})

Expected output:

Result
0.9139 0.9226

Example 2: Custom evaluation points

Inputs:

x y
1 2
2 3
3 4
4 5
5 6

t
0.5 1

Excel formula:

=EPPS_SINGLE_2SAMP({1;2;3;4;5}, {2;3;4;5;6}, {0.5,1})

Expected output:

Result
0.9095 0.9232

Example 3: Scaled sample values

Inputs:

x y
10 15
20 25
30 35
40 45
50 55

Excel formula:

=EPPS_SINGLE_2SAMP({10;20;30;40;50}, {15;25;35;45;55})

Expected output:

Result
0.4025 0.9823

Example 4: Larger sample sizes

Inputs:

x y
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
9 10
10 11

Excel formula:

=EPPS_SINGLE_2SAMP({1;2;3;4;5;6;7;8;9;10}, {2;3;4;5;6;7;8;9;10;11})

Expected output:

Result
0.9591 0.9159

Python Code

import math
from scipy.stats import epps_singleton_2samp as scipy_epps_singleton_2samp

def epps_single_2samp(x, y, t=None):
    """
    Compute the Epps-Singleton test statistic and p-value for two samples.

    See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.epps_singleton_2samp.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        x (list[list]): First sample as a 2D array. Must contain at least five numeric values.
        y (list[list]): Second sample as a 2D array. Must contain at least five numeric values.
        t (list[list], optional): Points where the empirical characteristic function is evaluated. Default is None, which uses [0.4, 0.8].

    Returns:
        list[list] or str: A 2D list with one row containing [statistic, pvalue], or an error message string if the input is invalid or the computation fails.
    """
    # A single cell may arrive as a scalar; wrap it so it looks like a 1x1 range.
    def to2d(val):
        return [[val]] if not isinstance(val, list) else val

    x = to2d(x)
    y = to2d(y)

    if not isinstance(x, list) or not all(isinstance(row, list) for row in x):
        return "Invalid input: x must be a 2D list."
    if not isinstance(y, list) or not all(isinstance(row, list) for row in y):
        return "Invalid input: y must be a 2D list."

    # Flatten each 2D range row by row into a 1D list of floats.
    try:
        x_flat = [float(item) for row in x for item in row]
        y_flat = [float(item) for row in y for item in row]
    except (TypeError, ValueError):
        return "Invalid input: x and y must contain only numeric values."

    if len(x_flat) < 5:
        return "Invalid input: x must contain at least five values."
    if len(y_flat) < 5:
        return "Invalid input: y must contain at least five values."

    if t is not None:
        t = to2d(t)
        if not isinstance(t, list) or not all(isinstance(row, list) for row in t):
            return "Invalid input: t must be a 2D list."
        try:
            t_flat = [float(item) for row in t for item in row]
        except (TypeError, ValueError):
            return "Invalid input: t must contain only numeric values."
        if len(t_flat) == 0:
            return "Invalid input: t must contain at least one value."
    else:
        t_flat = [0.4, 0.8]

    # Delegate to SciPy; any exception it raises is returned as a message.
    try:
        result = scipy_epps_singleton_2samp(x_flat, y_flat, t=t_flat)
        stat = float(result.statistic)
        pvalue = float(result.pvalue)
    except Exception as e:
        return f"scipy.stats.epps_singleton_2samp error: {e}"

    # Reject non-finite results rather than returning them to the caller.
    if math.isnan(stat) or math.isinf(stat) or math.isnan(pvalue) or math.isinf(pvalue):
        return "Invalid result: statistic or pvalue is nan or inf."

    return [[stat, pvalue]]
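
The function flattens each 2D range row by row before calling SciPy, so a column range and a row range holding the same values should give the same result. The sketch below mirrors that step with a hypothetical flatten helper and calls SciPy directly, using the data from Example 1.

```python
from scipy.stats import epps_singleton_2samp

def flatten(range_2d):
    """Hypothetical helper mirroring the row-by-row flattening used above."""
    return [float(value) for row in range_2d for value in row]

x_column = [[1], [2], [3], [4], [5]]   # a 5x1 range
x_row = [[1, 2, 3, 4, 5]]              # a 1x5 range with the same values
y_column = [[2], [3], [4], [5], [6]]

# Both layouts flatten to the same 1D sample.
print(flatten(x_column) == flatten(x_row))

res_column = epps_singleton_2samp(flatten(x_column), flatten(y_column))
res_row = epps_singleton_2samp(flatten(x_row), flatten(y_column))
print(res_column.statistic == res_row.statistic)
```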
