KS_2SAMP
Overview
The KS_2SAMP function performs the two-sample Kolmogorov-Smirnov test for goodness of fit, which compares the distributions of two independent samples to determine whether they differ significantly. This nonparametric test is sensitive to differences in both the location and the shape of the samples' empirical cumulative distribution functions. The test statistic is the maximum absolute difference between the two empirical cumulative distribution functions:

$$D = \sup_x \left| F_1(x) - F_2(x) \right|$$

where $F_1$ and $F_2$ are the empirical cumulative distribution functions of the two samples. For more details, see the scipy.stats.ks_2samp documentation.
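To make the definition of the statistic concrete, the following is a minimal sketch that evaluates both empirical cumulative distribution functions at every observed value and takes the largest absolute gap. The helper name `ks_statistic` is hypothetical and is not part of this wrapper or of SciPy; it only illustrates the two-sided statistic and does not compute a p-value.

```python
from bisect import bisect_right


def ks_statistic(sample_one, sample_two):
    """Hypothetical helper: two-sided KS statistic computed from the ECDFs."""
    x = sorted(float(v) for v in sample_one)
    y = sorted(float(v) for v in sample_two)

    def ecdf(sorted_sample, point):
        # Fraction of observations less than or equal to `point`.
        return bisect_right(sorted_sample, point) / len(sorted_sample)

    # The ECDFs only change at observed values, so checking those is enough.
    return max(abs(ecdf(x, p) - ecdf(y, p)) for p in x + y)


print(ks_statistic([1, 2, 3, 4], [2, 3, 4, 5]))  # 0.25
```

For the samples used in Example 1, the gap between the two step functions is 0.25 at each of the values 1 through 4 and closes to 0 at 5, which matches the statistic reported there.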
This wrapper exposes only the most commonly used parameters: the two data samples and the alternative hypothesis. It does not support axis-based operations, NaN handling, or method selection, and always returns a single row with the test statistic and p-value. This example function is provided as-is without any representation of accuracy.
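Because the wrapper does not handle NaN values, one option is to clean the data before passing it in. The sketch below assumes missing cells arrive as `None`, empty strings, or NaN; the helper name `drop_missing` is hypothetical and not part of the wrapper.

```python
import math


def drop_missing(values):
    """Hypothetical helper: keep finite numbers as a one-column 2D list."""
    cleaned = []
    for value in values:
        # Skip empty cells and booleans rather than coercing them to numbers.
        if value is None or isinstance(value, bool):
            continue
        try:
            number = float(value)
        except (TypeError, ValueError):
            continue  # Non-numeric text such as "" or "abc".
        if math.isfinite(number):
            cleaned.append([number])  # One value per row.
    return cleaned


print(drop_missing([1.0, None, float("nan"), "3", "abc", 4]))
# [[1.0], [3.0], [4.0]]
```

Returning one value per row keeps the result compatible with the requirement that each sample be a 2D array with at least two rows, provided at least two values survive the cleaning.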
Usage
To use the function in Excel:
=KS_2SAMP(data_one, data_two, [alternative])
data_one (2D list, required): First sample of observations. Must be a 2D array with at least two rows.
data_two (2D list, required): Second sample of observations. Must be a 2D array with at least two rows.
alternative (string, optional, default="two-sided"): Defines the null and alternative hypotheses. Allowed values: "two-sided", "less", "greater".
The function returns a single-row 2D array: [statistic, pvalue]. If the input is invalid, it returns an error message (string).
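The effect of the alternative argument on the statistic can be sketched as follows. Per the SciPy documentation, "greater" uses the largest positive difference between the first and second empirical distribution functions, "less" uses the magnitude of the most negative difference, and "two-sided" uses the larger of the two. The helper name `directional_statistics` is hypothetical, and the sketch covers only the statistic, not the p-value.

```python
from bisect import bisect_right


def directional_statistics(sample_one, sample_two):
    """Hypothetical helper: KS statistic for each alternative."""
    x = sorted(float(v) for v in sample_one)
    y = sorted(float(v) for v in sample_two)
    points = x + y
    # ECDF of each sample evaluated at every observed value.
    f1 = [bisect_right(x, p) / len(x) for p in points]
    f2 = [bisect_right(y, p) / len(y) for p in points]
    greater = max(a - b for a, b in zip(f1, f2))  # largest F1 - F2
    less = max(b - a for a, b in zip(f1, f2))     # largest F2 - F1
    return {"two-sided": max(greater, less), "less": less, "greater": greater}


print(directional_statistics([1, 2, 3, 4], [2, 3, 4, 5]))
# {'two-sided': 0.25, 'less': 0.0, 'greater': 0.25}
```

With these samples the first ECDF is never below the second, so the "less" statistic is 0 while the "greater" and "two-sided" statistics are both 0.25.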
Examples
Example 1: Basic Two-Sided Test
Inputs:
data_one | | data_two | | alternative |
---|---|---|---|---|
1.0 | 2.0 | 2.0 | 3.0 | two-sided |
3.0 | 4.0 | 4.0 | 5.0 | |
Excel formula:
=KS_2SAMP({1,2;3,4}, {2,3;4,5})
Expected output:
statistic | pvalue |
---|---|
0.250 | 1.000 |
Example 2: “less” Alternative
Inputs:
data_one | | data_two | | alternative |
---|---|---|---|---|
1.0 | 2.0 | 2.0 | 3.0 | less |
3.0 | 4.0 | 4.0 | 5.0 | |
Excel formula:
=KS_2SAMP({1,2;3,4}, {2,3;4,5}, "less")
Expected output:
statistic | pvalue |
---|---|
0.000 | 1.000 |
Example 3: “greater” Alternative
Inputs:
data_one | | data_two | | alternative |
---|---|---|---|---|
1.0 | 2.0 | 2.0 | 3.0 | greater |
3.0 | 4.0 | 4.0 | 5.0 | |
Excel formula:
=KS_2SAMP({1,2;3,4}, {2,3;4,5}, "greater")
Expected output:
statistic | pvalue |
---|---|
0.250 | 0.800 |
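The p-value of 0.800 can be checked by hand. The sketch below uses the classical closed form for the one-sided test with two samples of equal size n, P(D+ >= h/n) = C(2n, n-h) / C(2n, n), which assumes continuous distributions. It is offered as an arithmetic check under the assumption that SciPy applies its exact method to samples this small; the helper name `one_sided_exact_pvalue` is hypothetical and SciPy's internal computation may take a different form.

```python
from math import comb


def one_sided_exact_pvalue(n, statistic):
    """Hypothetical helper: exact one-sided p-value for two samples of size n.

    `statistic` is expected to lie in [0, 1].
    """
    h = round(statistic * n)  # The statistic is a multiple of 1/n.
    if h <= 0:
        return 1.0
    return comb(2 * n, n - h) / comb(2 * n, n)


# n = 4 and statistic = 0.25 give h = 1, so the p-value is C(8, 3) / C(8, 4) = 56 / 70.
print(round(one_sided_exact_pvalue(4, 0.25), 3))  # 0.8
```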
Example 4: All Arguments Specified
Inputs:
data_one | | data_two | | alternative |
---|---|---|---|---|
10.0 | 20.0 | 15.0 | 25.0 | two-sided |
30.0 | 40.0 | 35.0 | 45.0 | |
Excel formula:
=KS_2SAMP({10,20;30,40}, {15,25;35,45}, "two-sided")
Expected output:
statistic | pvalue |
---|---|
0.250 | 1.000 |
Python Code
import math
from typing import List, Union

from scipy.stats import ks_2samp as scipy_ks_2samp


def ks_2samp(data_one: List[List[float]], data_two: List[List[float]], alternative: str = 'two-sided') -> Union[List[List[float]], str]:
    """
    Performs the two-sample Kolmogorov-Smirnov test for goodness of fit.

    Args:
        data_one: 2D list of float values. First sample of observations.
        data_two: 2D list of float values. Second sample of observations.
        alternative: Defines the null and alternative hypotheses ('two-sided', 'less', 'greater'). Default is 'two-sided'.

    Returns:
        2D list with a single row: [statistic, pvalue]. Returns an error message (str) if input is invalid.

    This example function is provided as-is without any representation of accuracy.
    """
    # Validate data_one and data_two are 2D lists with at least two rows
    if not (isinstance(data_one, list) and all(isinstance(row, list) for row in data_one) and len(data_one) >= 2):
        return "Invalid input: data_one must be a 2D list with at least two rows."
    if not (isinstance(data_two, list) and all(isinstance(row, list) for row in data_two) and len(data_two) >= 2):
        return "Invalid input: data_two must be a 2D list with at least two rows."

    # Flatten data_one and data_two into 1D lists of floats
    try:
        x = [float(item) for row in data_one for item in row]
        y = [float(item) for row in data_two for item in row]
    except Exception:
        return "Invalid input: data_one and data_two must contain only numeric values."
    if len(x) < 2 or len(y) < 2:
        return "Invalid input: each sample must contain at least two values."

    # Validate alternative
    if alternative not in ['two-sided', 'less', 'greater']:
        return "Invalid input: alternative must be 'two-sided', 'less', or 'greater'."

    # Call scipy.stats.ks_2samp
    try:
        result = scipy_ks_2samp(x, y, alternative=alternative)
        stat = float(result.statistic)
        pvalue = float(result.pvalue)
    except Exception as e:
        return f"scipy.stats.ks_2samp error: {e}"

    # Reject nan/inf results (for example, when the inputs contain NaN)
    if not (math.isfinite(stat) and math.isfinite(pvalue)):
        return "Invalid result: statistic or pvalue is nan or inf."

    return [[stat, pvalue]]
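As a cross-check of Examples 1 to 3, the same inputs can be passed directly to scipy.stats.ks_2samp after flattening the 2D ranges, which is the step the wrapper performs internally. This is a minimal sketch: the `flatten` helper and the `results` dictionary are illustrative names, and it assumes SciPy chooses its exact method automatically for samples this small.

```python
from scipy.stats import ks_2samp


def flatten(range_2d):
    """Illustrative helper: turn a 2D range into a flat list of floats."""
    return [float(value) for row in range_2d for value in row]


data_one = [[1, 2], [3, 4]]
data_two = [[2, 3], [4, 5]]

results = {}
for alternative in ("two-sided", "less", "greater"):
    res = ks_2samp(flatten(data_one), flatten(data_two), alternative=alternative)
    results[alternative] = (float(res.statistic), float(res.pvalue))

for alternative, (statistic, pvalue) in results.items():
    print(f"{alternative}: statistic={statistic:.3f}, pvalue={pvalue:.3f}")
```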