EPPS_SINGLETON_2SAMP

Overview

The EPPS_SINGLETON_2SAMP function performs the Epps-Singleton test, which uses empirical characteristic functions to test whether two independent samples come from the same distribution. The test is more general than the t-test, which compares only means, and than the Kolmogorov-Smirnov test, which assumes continuous data: it applies to both discrete and continuous distributions. It is recommended mainly when each sample has at least 25 observations. The test statistic is computed as:

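$$
W = N \, (g_x - g_y)^{\top} \, \hat{\Sigma}^{+} \, (g_x - g_y), \qquad N = n + m
$$

where $g_x$ and $g_y$ are vectors holding the real and imaginary parts of the empirical characteristic functions $\phi_x(t)$ and $\phi_y(t)$ of samples $x$ and $y$ evaluated at the points $t_j$, $\hat{\Sigma}^{+}$ is the pseudo-inverse of the pooled covariance estimate $\hat{\Sigma} = \frac{N}{n} S_x + \frac{N}{m} S_y$ (with $S_x$, $S_y$ the sample covariances of those real and imaginary terms), and $n$, $m$ are the sample sizes. Under the null hypothesis, $W$ is asymptotically chi-squared distributed. For more details, see the scipy.stats.epps_singleton_2samp documentation.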
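
The empirical characteristic function that the statistic is built from is simply the sample average of $e^{itx}$. The following is a minimal standard-library sketch of that building block only, not of SciPy's full statistic; the helper name `ecf` is an assumption made for illustration, and the points are used unscaled, whereas SciPy is understood to rescale them by the semi-interquartile range of the pooled sample.

```python
import cmath


def ecf(sample, t):
    """Empirical characteristic function: the mean of exp(i * t * x) over the sample."""
    return sum(cmath.exp(1j * t * x) for x in sample) / len(sample)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 3.0, 4.0, 5.0, 6.0]

# Compare the two samples at the default evaluation points (unscaled here).
for t in (0.4, 0.8):
    diff = ecf(x, t) - ecf(y, t)
    print(f"t={t}: |phi_x(t) - phi_y(t)| = {abs(diff):.4f}")
```

Because each term has modulus 1, the value always lies inside the unit disc, and shifting a sample by a constant $c$ multiplies it by $e^{itc}$.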

This wrapper exposes only the most commonly used parameters: x, y, and optionally t (points for evaluation). Parameters related to axis, NaN handling, and broadcasting are omitted for Excel compatibility. This example function is provided as-is without any representation of accuracy.

Usage

To use the function in Excel:


=EPPS_SINGLETON_2SAMP(x, y, [t])

x (2D list, required): First sample, as a column or matrix. Must have at least five rows.
y (2D list, required): Second sample, as a column or matrix. Must have at least five rows.
t (2D list, optional, default=[[0.4, 0.8]]): Points where the empirical characteristic function is evaluated. Values should be positive and distinct.

The function returns a single-row 2D array: [statistic, pvalue] (both floats), or an error message (string) if the input is invalid.
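
To make the "2D list" shapes concrete, the sketch below shows how a column range and a row range would look once they reach Python, assuming ranges arrive as row-major nested lists. The `flatten` helper is hypothetical and only mirrors the flattening step described for the wrapper.

```python
from typing import List


def flatten(cells: List[List[float]]) -> List[float]:
    """Flatten a row-major 2D list into a flat list of floats (hypothetical helper)."""
    return [float(value) for row in cells for value in row]


column_range = [[1.0], [2.0], [3.0], [4.0], [5.0]]  # e.g. the array constant {1;2;3;4;5}
row_range = [[0.5, 1.0]]                            # e.g. the array constant {0.5,1.0}

print(flatten(column_range))
print(flatten(row_range))
```

Note that the documented requirement is on rows: a sample laid out as a single row of five values has only one row and would not satisfy the "at least five rows" condition.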

Examples

Example 1: Basic Case

Inputs:

x	y
1.0	2.0
2.0	3.0
3.0	4.0
4.0	5.0
5.0	6.0

Excel formula:


=EPPS_SINGLETON_2SAMP({1;2;3;4;5}, {2;3;4;5;6})

Expected output:

statistic	pvalue
0.914	0.923

Example 2: With Custom t

Inputs:

x	y
1.0	2.0
2.0	3.0
3.0	4.0
4.0	5.0
5.0	6.0

t
0.5	1.0

Excel formula:


=EPPS_SINGLETON_2SAMP({1;2;3;4;5}, {2;3;4;5;6}, {0.5,1.0})

Expected output:

statistic	pvalue
0.909	0.923

Example 3: Different Samples

Inputs:

x	y
10.0	15.0
20.0	25.0
30.0	35.0
40.0	45.0
50.0	55.0

Excel formula:


=EPPS_SINGLETON_2SAMP({10;20;30;40;50}, {15;25;35;45;55})

Expected output:

statistic	pvalue
0.402	0.982

Example 4: Larger Samples

Inputs:

x	y
1.0	2.0
2.0	3.0
3.0	4.0
4.0	5.0
5.0	6.0
6.0	7.0
7.0	8.0
8.0	9.0
9.0	10.0
10.0	11.0

Excel formula:


=EPPS_SINGLETON_2SAMP({1;2;3;4;5;6;7;8;9;10}, {2;3;4;5;6;7;8;9;10;11})

Expected output:

statistic	pvalue
0.959	0.916
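
The p-values in the examples above can be sanity-checked by hand. The sketch below assumes that the p-value is the upper tail of a chi-squared distribution with 4 degrees of freedom (two evaluation points, each contributing a real and an imaginary part, with a full-rank covariance estimate). For 4 degrees of freedom the survival function has the closed form $e^{-w/2}(1 + w/2)$, so no statistics library is needed. The function name `chi2_sf_df4` is an assumption made for illustration.

```python
import math


def chi2_sf_df4(w: float) -> float:
    """Upper-tail probability of a chi-squared variable with 4 degrees of freedom."""
    return math.exp(-w / 2.0) * (1.0 + w / 2.0)


# Statistics as tabulated in Examples 1 to 4 (rounded to three decimals).
for stat in (0.914, 0.909, 0.402, 0.959):
    print(stat, round(chi2_sf_df4(stat), 3))
```

Applied to the rounded statistics, this agrees with the tabulated p-values to three decimals, which supports reading them as chi-squared tail probabilities under the stated assumption.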

Python Code


from scipy.stats import epps_singleton_2samp as scipy_epps_singleton_2samp
from typing import List, Optional, Union


def epps_singleton_2samp(
    x: List[List[float]],
    y: List[List[float]],
    t: Optional[List[List[float]]] = None,
) -> Union[List[List[float]], str]:
    """
    Computes the Epps-Singleton test statistic and p-value for two samples.
 
    Args:
        x: 2D list of float values. First sample, must have at least five observations.
        y: 2D list of float values. Second sample, must have at least five observations.
        t: Optional 2D list of float values. Points where the empirical characteristic function is evaluated. Default is [[0.4, 0.8]].
 
    Returns:
        2D list with one row: [statistic, pvalue], or an error message (str) if input is invalid.
 
    This example function is provided as-is without any representation of accuracy.
    """
    # Validate x and y are 2D lists with at least five rows
    if not (isinstance(x, list) and all(isinstance(row, list) for row in x) and len(x) >= 5):
        return "Invalid input: x must be a 2D list with at least five rows."
    if not (isinstance(y, list) and all(isinstance(row, list) for row in y) and len(y) >= 5):
        return "Invalid input: y must be a 2D list with at least five rows."
    # Flatten x and y
    try:
        x_flat = [float(item) for row in x for item in row]
        y_flat = [float(item) for row in y for item in row]
    except Exception:
        return "Invalid input: x and y must contain only numeric values."
    if len(x_flat) < 5 or len(y_flat) < 5:
        return "Invalid input: each sample must contain at least five values."
    # Validate t
    if t is not None:
        if not (isinstance(t, list) and all(isinstance(row, list) for row in t)):
            return "Invalid input: t must be a 2D list of floats."
        try:
            t_flat = [float(item) for row in t for item in row]
        except Exception:
            return "Invalid input: t must contain only numeric values."
        if len(t_flat) == 0:
            return "Invalid input: t must contain at least one value."
    else:
        t_flat = [0.4, 0.8]
    # Call scipy.stats.epps_singleton_2samp
    try:
        result = scipy_epps_singleton_2samp(x_flat, y_flat, t=t_flat)
        stat = float(result.statistic)
        pvalue = float(result.pvalue)
    except Exception as e:
        return f"scipy.stats.epps_singleton_2samp error: {e}"
    # Reject non-finite results; NaN fails the comparison, so it is caught as well
    if not all(abs(val) < float("inf") for val in (stat, pvalue)):
        return "Invalid result: statistic or pvalue is nan or inf."
    return [[stat, pvalue]]
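
The wrapper forwards t to SciPy after checking only that it is numeric and non-empty, so an unsuitable value surfaces through the generic "scipy.stats.epps_singleton_2samp error" string. SciPy's documentation asks for positive, distinct points. A minimal sketch of a stricter pre-check follows; `check_t` is a hypothetical helper that is not part of the wrapper above, and its messages are illustrative only.

```python
from typing import List, Optional


def check_t(t_flat: List[float]) -> Optional[str]:
    """Return an error message if t is unusable, otherwise None (hypothetical helper)."""
    if len(t_flat) == 0:
        return "Invalid input: t must contain at least one value."
    if any(value <= 0 for value in t_flat):
        return "Invalid input: t must contain only positive values."
    if len(set(t_flat)) != len(t_flat):
        return "Invalid input: t values must be distinct."
    return None


print(check_t([0.4, 0.8]))
print(check_t([0.4, -1.0]))
```

Returning a message string rather than raising keeps the helper in line with the wrapper's convention of reporting invalid input as text in the cell.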

Example Workbook

Link to Workbook