EPPS_SINGLETON_2SAMP
Overview
The EPPS_SINGLETON_2SAMP function performs the Epps-Singleton test, which uses empirical characteristic functions to assess whether two independent samples come from the same distribution. Unlike the Kolmogorov-Smirnov test, it does not assume continuous distributions, and unlike the t-test it compares whole distributions rather than only means. It is suitable for both discrete and continuous data, particularly when each sample has at least 25 observations. The test statistic is computed as:
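$$
W = N \, g^{\top} \hat{\Sigma}^{+} g, \qquad g = \hat{\phi}_x(t) - \hat{\phi}_y(t), \qquad N = n + m
$$

where $\hat{\phi}_x$ and $\hat{\phi}_y$ are the empirical characteristic functions of samples $x$ and $y$ evaluated at points $t$ (with real and imaginary parts stacked into one real vector), $\hat{\Sigma}^{+}$ is the pseudo-inverse of the estimated covariance matrix of $\sqrt{N}\,g$, and $n$, $m$ are the sample sizes. For more details, see the scipy.stats.epps_singleton_2samp documentation.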
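As a rough illustration of the building block behind the statistic, the sketch below computes an empirical characteristic function with the standard library only. The helper name `ecf` and the sample values are assumptions chosen for illustration; the sketch omits the rescaling of the evaluation points and the covariance weighting that the full test applies, so it does not reproduce the test statistic.

```python
import cmath
from typing import Sequence


def ecf(sample: Sequence[float], t: float) -> complex:
    """Empirical characteristic function: mean of exp(i * t * x) over the sample."""
    return sum(cmath.exp(1j * t * x) for x in sample) / len(sample)


# Illustrative samples (hypothetical values) and the default evaluation points.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 3.0, 4.0, 5.0, 6.0]
t_points = [0.4, 0.8]

# Difference between the two empirical characteristic functions at each point.
# The test is built from the real and imaginary parts of these differences.
diffs = [ecf(x, t) - ecf(y, t) for t in t_points]
for t, d in zip(t_points, diffs):
    print(f"t={t}: real={d.real:.4f}, imag={d.imag:.4f}")
```

Identical samples give differences of exactly zero at every point, which is the intuition for why a large weighted difference counts as evidence against the null hypothesis.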
This wrapper exposes only the most commonly used parameters: x, y, and optionally t (points for evaluation). Parameters related to axis, NaN handling, and broadcasting are omitted for Excel compatibility. This example function is provided as-is without any representation of accuracy.
Usage
To use the function in Excel:
=EPPS_SINGLETON_2SAMP(x, y, [t])
- x (2D list, required): First sample, as a column or matrix. Must have at least five rows.
- y (2D list, required): Second sample, as a column or matrix. Must have at least five rows.
- t (2D list, optional, default=[[0.4, 0.8]]): Points where the empirical characteristic function is evaluated.

The function returns a single-row 2D array, [statistic, pvalue] (both floats), or an error message (string) if the input is invalid.
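Because the return value is either a 2D list or an error string, Python code that calls the wrapper has to tell the two apart. The following is a minimal sketch of one way to do that; `unpack_result` is a hypothetical helper that is not part of the wrapper, and the numeric values are placeholders rather than computed results.

```python
from typing import List, Tuple, Union


def unpack_result(result: Union[List[List[float]], str]) -> Tuple[float, float]:
    """Return (statistic, pvalue), or raise ValueError if an error string came back."""
    if isinstance(result, str):
        # The wrapper signals invalid input with a plain message string.
        raise ValueError(result)
    statistic, pvalue = result[0]  # single row: [statistic, pvalue]
    return statistic, pvalue


# Placeholder values standing in for a successful wrapper call.
statistic, pvalue = unpack_result([[0.914, 0.923]])
print(statistic, pvalue)
```

Raising an exception is only one option; in a spreadsheet setting it may be preferable to pass the message string through unchanged so it shows up in the cell.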
Examples
Example 1: Basic Case
Inputs:
x | y |
---|---|
1.0 | 2.0 |
2.0 | 3.0 |
3.0 | 4.0 |
4.0 | 5.0 |
5.0 | 6.0 |
Excel formula:
=EPPS_SINGLETON_2SAMP({1;2;3;4;5}, {2;3;4;5;6})
Expected output:
statistic | pvalue |
---|---|
0.914 | 0.923 |
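The statistic and p-value in these tables can be sanity-checked against each other. The sketch below assumes that the p-value is the upper tail of a chi-square distribution and that two evaluation points give four degrees of freedom (one real and one imaginary part per point); both are assumptions about the underlying approximation, not something this page states. For four degrees of freedom the upper tail has a closed form, so only the standard library is needed. The helper name `chi2_sf_df4` is made up for this example.

```python
import math


def chi2_sf_df4(w: float) -> float:
    """Upper-tail probability of a chi-square variable with 4 degrees of freedom."""
    return math.exp(-w / 2.0) * (1.0 + w / 2.0)


# (statistic, pvalue) pairs copied from the expected outputs of Examples 1 to 4.
expected = [(0.914, 0.923), (0.909, 0.923), (0.402, 0.982), (0.959, 0.916)]

for statistic, pvalue in expected:
    approx = chi2_sf_df4(statistic)
    # Both numbers are rounded to three decimals, so allow a small tolerance.
    print(statistic, pvalue, abs(approx - pvalue) < 1e-3)
```

If a future change to the evaluation points or to the rank of the covariance estimate alters the degrees of freedom, this closed form no longer applies.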
Example 2: With Custom t
Inputs:
x | y | t | |
---|---|---|---|
1.0 | 2.0 | 0.5 | 1.0 |
2.0 | 3.0 | | |
3.0 | 4.0 | | |
4.0 | 5.0 | | |
5.0 | 6.0 | | |
Excel formula:
=EPPS_SINGLETON_2SAMP({1;2;3;4;5}, {2;3;4;5;6}, {0.5,1.0})
Expected output:
statistic | pvalue |
---|---|
0.909 | 0.923 |
Example 3: Different Samples
Inputs:
x | y |
---|---|
10.0 | 15.0 |
20.0 | 25.0 |
30.0 | 35.0 |
40.0 | 45.0 |
50.0 | 55.0 |
Excel formula:
=EPPS_SINGLETON_2SAMP({10;20;30;40;50}, {15;25;35;45;55})
Expected output:
statistic | pvalue |
---|---|
0.402 | 0.982 |
Example 4: Larger Samples
Inputs:
x | y |
---|---|
1.0 | 2.0 |
2.0 | 3.0 |
3.0 | 4.0 |
4.0 | 5.0 |
5.0 | 6.0 |
6.0 | 7.0 |
7.0 | 8.0 |
8.0 | 9.0 |
9.0 | 10.0 |
10.0 | 11.0 |
Excel formula:
=EPPS_SINGLETON_2SAMP({1;2;3;4;5;6;7;8;9;10}, {2;3;4;5;6;7;8;9;10;11})
Expected output:
statistic | pvalue |
---|---|
0.959 | 0.916 |
Python Code
import math
from typing import List, Optional, Union

from scipy.stats import epps_singleton_2samp as scipy_epps_singleton_2samp


def epps_singleton_2samp(x: List[List[float]], y: List[List[float]], t: Optional[List[List[float]]] = None) -> Union[List[List[float]], str]:
    """
    Computes the Epps-Singleton test statistic and p-value for two samples.

    Args:
        x: 2D list of float values. First sample, must have at least five observations.
        y: 2D list of float values. Second sample, must have at least five observations.
        t: Optional 2D list of float values. Points where the empirical characteristic function is evaluated. Default is [[0.4, 0.8]].

    Returns:
        2D list with one row: [statistic, pvalue], or an error message (str) if input is invalid.

    This example function is provided as-is without any representation of accuracy.
    """
    # Validate x and y are 2D lists with at least five rows
    if not (isinstance(x, list) and all(isinstance(row, list) for row in x) and len(x) >= 5):
        return "Invalid input: x must be a 2D list with at least five rows."
    if not (isinstance(y, list) and all(isinstance(row, list) for row in y) and len(y) >= 5):
        return "Invalid input: y must be a 2D list with at least five rows."

    # Flatten x and y
    try:
        x_flat = [float(item) for row in x for item in row]
        y_flat = [float(item) for row in y for item in row]
    except Exception:
        return "Invalid input: x and y must contain only numeric values."
    if len(x_flat) < 5 or len(y_flat) < 5:
        return "Invalid input: each sample must contain at least five values."

    # Validate t, falling back to the default evaluation points
    if t is not None:
        if not (isinstance(t, list) and all(isinstance(row, list) for row in t)):
            return "Invalid input: t must be a 2D list of floats."
        try:
            t_flat = [float(item) for row in t for item in row]
        except Exception:
            return "Invalid input: t must contain only numeric values."
        if len(t_flat) == 0:
            return "Invalid input: t must contain at least one value."
    else:
        t_flat = [0.4, 0.8]

    # Call scipy.stats.epps_singleton_2samp
    try:
        result = scipy_epps_singleton_2samp(x_flat, y_flat, t=t_flat)
        stat = float(result.statistic)
        pvalue = float(result.pvalue)
    except Exception as e:
        return f"scipy.stats.epps_singleton_2samp error: {e}"

    # Check for nan/inf
    if not (math.isfinite(stat) and math.isfinite(pvalue)):
        return "Invalid result: statistic or pvalue is nan or inf."
    return [[stat, pvalue]]