KSTEST
Overview
The KSTEST function performs the Kolmogorov-Smirnov test for goodness of fit. The test is widely used in statistics to compare the distributions of two datasets, or to test whether a sample matches a reference distribution. For two samples, the test statistic is the maximum absolute difference between their empirical cumulative distribution functions (ECDFs):

$$D = \sup_x \left| F_1(x) - F_2(x) \right|$$

where $F_1$ and $F_2$ are the ECDFs of the two samples. The p-value is computed based on the chosen method and alternative hypothesis. For more details, see the scipy.stats.kstest documentation.
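As a rough illustration of the statistic described above, the sketch below evaluates both ECDFs at every observed value and takes the largest absolute gap. The helper names `ecdf` and `ks_statistic` are hypothetical; they are not part of this wrapper or of SciPy, and only the standard library is used.

```python
from bisect import bisect_right
from typing import List


def ecdf(sorted_sample: List[float], x: float) -> float:
    """Fraction of observations in an already-sorted sample that are <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def ks_statistic(a: List[float], b: List[float]) -> float:
    """Largest absolute gap between the two ECDFs (hypothetical helper)."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    # The ECDFs only change at observed values, so checking those is enough.
    points = a_sorted + b_sorted
    return max(abs(ecdf(a_sorted, x) - ecdf(b_sorted, x)) for x in points)


print(ks_statistic([0.1, 0.2, 0.3, 0.4], [0.15, 0.25, 0.35, 0.45]))  # 0.25
```

Evaluating only at the observed values is sufficient because an ECDF is a step function that is constant between observations.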
This wrapper accepts only 2D lists for both arguments, so it always compares two samples, and it exposes only the most commonly used parameters. The distribution names and callables that SciPy accepts for cdf, along with other advanced options, are not available in this Excel wrapper. This example function is provided as-is without any representation of accuracy.
Usage
To use the function in Excel:
=KSTEST(rvs, cdf, [alternative], [method])
- `rvs` (2D list, required): First sample or sample to test. Must be a 2D cell range with at least two rows.
- `cdf` (2D list, required): Second sample or reference distribution. Must be a 2D cell range with at least two rows.
- `alternative` (string, optional, default `two-sided`): Defines the null and alternative hypotheses. Allowed values: `two-sided`, `less`, `greater`.
- `method` (string, optional, default `auto`): Method for calculating the p-value. Allowed values: `auto`, `exact`, `approx`, `asymp`.
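To give a feel for what an asymptotic p-value involves, here is a minimal sketch of the classical large-sample approximation for the two-sided, two-sample test, based on the Kolmogorov limiting distribution. It is the textbook formula, not a reproduction of SciPy's `asymp` method, which may apply refinements, so its output should not be expected to match the wrapper exactly. The function names are hypothetical.

```python
import math


def kolmogorov_sf(lam: float, terms: int = 100) -> float:
    """Survival function of the Kolmogorov limiting distribution.

    Q(lam) = 2 * sum_{k>=1} (-1)**(k-1) * exp(-2 * k**2 * lam**2)
    """
    # Below about 0.2 the survival function is 1 to within roughly 1e-12,
    # and the alternating series converges too slowly to be useful there.
    if lam < 0.2:
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    # Clamp to [0, 1] to absorb rounding error.
    return min(1.0, max(0.0, 2.0 * total))


def asymptotic_two_sided_pvalue(d: float, n: int, m: int) -> float:
    """Textbook approximation p ~ Q(sqrt(n*m/(n+m)) * d) (hypothetical helper)."""
    effective_n = n * m / (n + m)
    return kolmogorov_sf(math.sqrt(effective_n) * d)


print(asymptotic_two_sided_pvalue(0.25, 4, 4))
```

With samples as small as those in the examples on this page, an asymptotic approximation is a poor guide; an exact method is generally the more appropriate choice when it is available.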
The function returns a single row (array) with four values, [statistic, pvalue, statistic_location, statistic_sign] (all floats), or an error message (string) if the input is invalid.
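The sketch below illustrates how statistic, statistic_location, and statistic_sign can relate to the signed gap between the two ECDFs. It assumes that `greater` uses the largest positive gap, `less` uses the most negative gap, and `two-sided` uses whichever is larger in magnitude. This reflects one reading of SciPy's two-sample convention and should be treated as an assumption. The helper `signed_ks` is hypothetical, and its tie-breaking (the first observation reaching the extreme gap, with the first sample's values checked first) is this sketch's own choice.

```python
from bisect import bisect_right
from typing import List, Tuple


def signed_ks(a: List[float], b: List[float],
              alternative: str = "two-sided") -> Tuple[float, float, int]:
    """Return (statistic, location, sign) for two samples (hypothetical helper)."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    points = list(a) + list(b)
    # Signed gap F_a(x) - F_b(x) at every observed value.
    diffs = [bisect_right(a_sorted, x) / len(a) - bisect_right(b_sorted, x) / len(b)
             for x in points]
    i_max = max(range(len(diffs)), key=diffs.__getitem__)
    i_min = min(range(len(diffs)), key=diffs.__getitem__)
    d_plus = diffs[i_max]               # largest positive gap
    d_minus = max(0.0, -diffs[i_min])   # magnitude of the most negative gap
    if alternative == "less" or (alternative == "two-sided" and d_minus > d_plus):
        return d_minus, points[i_min], -1
    return d_plus, points[i_max], 1


stat, location, sign = signed_ks([0.1, 0.2, 0.3, 0.4], [0.15, 0.25, 0.35, 0.45])
print(stat, location, sign)
```

A sign of 1 means the first sample's ECDF lies above the second's at the reported location, and a sign of -1 means it lies below.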
Examples
Example 1: Basic Case
Inputs:
rvs | | cdf | | alternative | method |
---|---|---|---|---|---|
0.1 | 0.2 | 0.15 | 0.25 | two-sided | auto |
0.3 | 0.4 | 0.35 | 0.45 | | |
Excel formula:
=KSTEST({0.1,0.2;0.3,0.4}, {0.15,0.25;0.35,0.45})
Expected output:
statistic | pvalue | statistic_location | statistic_sign |
---|---|---|---|
0.25 | 0.970 | 0.15 | 1.0 |
Example 2: With Alternative ‘less’
Inputs:
rvs | | cdf | | alternative | method |
---|---|---|---|---|---|
0.1 | 0.2 | 0.15 | 0.25 | less | auto |
0.3 | 0.4 | 0.35 | 0.45 | | |
Excel formula:
=KSTEST({0.1,0.2;0.3,0.4}, {0.15,0.25;0.35,0.45}, "less")
Expected output:
statistic | pvalue | statistic_location | statistic_sign |
---|---|---|---|
0.25 | 0.485 | 0.15 | 1.0 |
Example 3: With Method ‘exact’
Inputs:
rvs | | cdf | | alternative | method |
---|---|---|---|---|---|
0.1 | 0.2 | 0.15 | 0.25 | two-sided | exact |
0.3 | 0.4 | 0.35 | 0.45 | | |
Excel formula:
=KSTEST({0.1,0.2;0.3,0.4}, {0.15,0.25;0.35,0.45}, "two-sided", "exact")
Expected output:
statistic | pvalue | statistic_location | statistic_sign |
---|---|---|---|
0.25 | 0.970 | 0.15 | 1.0 |
Example 4: All Arguments Specified
Inputs:
rvs | | cdf | | alternative | method |
---|---|---|---|---|---|
0.1 | 0.2 | 0.15 | 0.25 | greater | asymp |
0.3 | 0.4 | 0.35 | 0.45 | | |
Excel formula:
=KSTEST({0.1,0.2;0.3,0.4}, {0.15,0.25;0.35,0.45}, "greater", "asymp")
Expected output:
statistic | pvalue | statistic_location | statistic_sign |
---|---|---|---|
0.25 | 0.485 | 0.15 | 1.0 |
Python Code
from scipy.stats import kstest as scipy_kstest
from typing import List, Optional, Union
def kstest(rvs: List[List[float]], cdf: List[List[float]], alternative: str = 'two-sided', method: str = 'auto') -> Union[List[List[Optional[float]]], str]:
    """
    Performs the Kolmogorov-Smirnov test for goodness of fit between two samples or a sample and a reference distribution.
    Args:
        rvs: 2D list of float values. First sample or sample to test.
        cdf: 2D list of float values. Second sample or reference distribution.
        alternative: Defines the null and alternative hypotheses ('two-sided', 'less', 'greater'). Default is 'two-sided'.
        method: Method for calculating the p-value ('auto', 'exact', 'approx', 'asymp'). Default is 'auto'.
    Returns:
        2D list with [statistic, pvalue, statistic_location, statistic_sign] in a single row, or an error message (str) if input is invalid.
    This example function is provided as-is without any representation of accuracy.
    """
    # Validate rvs and cdf are 2D lists with at least two rows
    if not (isinstance(rvs, list) and all(isinstance(row, list) for row in rvs) and len(rvs) >= 2):
        return "Invalid input: rvs must be a 2D list with at least two rows."
    if not (isinstance(cdf, list) and all(isinstance(row, list) for row in cdf) and len(cdf) >= 2):
        return "Invalid input: cdf must be a 2D list with at least two rows."
    # Flatten rvs and cdf row by row into 1D samples
    try:
        x = [float(item) for row in rvs for item in row]
        y = [float(item) for row in cdf for item in row]
    except Exception:
        return "Invalid input: rvs and cdf must contain only numeric values."
    if len(x) < 2 or len(y) < 2:
        return "Invalid input: each sample must contain at least two values."
    # Validate alternative
    if alternative not in ['two-sided', 'less', 'greater']:
        return "Invalid input: alternative must be 'two-sided', 'less', or 'greater'."
    # Validate method
    if method not in ['auto', 'exact', 'approx', 'asymp']:
        return "Invalid input: method must be 'auto', 'exact', 'approx', or 'asymp'."
    # Call scipy.stats.kstest; passing a second array as cdf runs the two-sample test
    try:
        result = scipy_kstest(x, y, alternative=alternative, method=method)
        stat = float(result.statistic)
        pvalue = float(result.pvalue)
        # statistic_location and statistic_sign may be missing in older SciPy versions
        stat_loc = float(getattr(result, 'statistic_location', None)) if hasattr(result, 'statistic_location') else None
        stat_sign = float(getattr(result, 'statistic_sign', None)) if hasattr(result, 'statistic_sign') else None
    except Exception as e:
        return f"scipy.stats.kstest error: {e}"
    # Check for nan/inf (val != val is true only for nan)
    for val in [stat, pvalue, stat_loc, stat_sign]:
        if isinstance(val, float) and (val != val or val in [float('inf'), float('-inf')]):
            return "Invalid result: output contains nan or inf."
    # Return a single row so the result spills across four cells in Excel
    return [[stat, pvalue, stat_loc, stat_sign]]