KSTEST

Overview

The KSTEST function performs the two-sample Kolmogorov-Smirnov goodness-of-fit test, which checks whether two samples are consistent with the same underlying distribution. The underlying SciPy function can also test a single sample against a reference distribution, but this wrapper compares two samples only. The test statistic is the maximum absolute difference between the empirical cumulative distribution functions (ECDFs) of the two samples:

$$D = \sup_x |F_1(x) - F_2(x)|$$

where $F_1(x)$ and $F_2(x)$ are the ECDFs of the two samples. The p-value is computed based on the chosen method and alternative hypothesis. For more details, see the scipy.stats.kstest documentation.

This wrapper simplifies the function to accept only 2D lists for both samples, and exposes only the most commonly used parameters. Some advanced options and distribution callables supported by SciPy are not available in this Excel wrapper. This example function is provided as-is without any representation of accuracy.

Usage

To use the function in Excel:


=KSTEST(rvs, cdf, [alternative], [method])

rvs (2D list, required): First sample. Must be a 2D cell range with at least two rows.
cdf (2D list, required): Second sample. Must be a 2D cell range with at least two rows. The name follows the SciPy argument, which can also take a reference distribution; this wrapper accepts sample data only.
alternative (string, optional, default=‘two-sided’): Defines the null and alternative hypotheses. Allowed values: two-sided, less, greater.
method (string, optional, default=‘auto’): Method for calculating the p-value. Allowed values: auto, exact, approx, asymp. SciPy's two-sample test supports auto, exact, and asymp; approx applies to the one-sample test, so passing it here is expected to return an error message.
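To make the alternative parameter more concrete, the sketch below computes the two one-sided statistics in plain Python. It assumes the convention described in the SciPy two-sample documentation, where greater looks at the largest amount by which the ECDF of rvs exceeds the ECDF of cdf, and less looks at the largest amount by which it falls below. The helper name `one_sided_statistics` and the data are illustrative assumptions.

```python
from bisect import bisect_right

def one_sided_statistics(sample1, sample2):
    """Return (d_plus, d_minus) for two samples (hypothetical helper).

    d_plus  = max over x of F1(x) - F2(x)   (used by 'greater')
    d_minus = max over x of F2(x) - F1(x)   (used by 'less')
    """
    s1, s2 = sorted(sample1), sorted(sample2)
    diffs = [
        bisect_right(s1, x) / len(s1) - bisect_right(s2, x) / len(s2)
        for x in s1 + s2
    ]
    d_plus = max(0.0, max(diffs))
    d_minus = max(0.0, -min(diffs))
    return d_plus, d_minus

# Sample 1 sits to the left of sample 2, so its ECDF rises earlier.
d_plus, d_minus = one_sided_statistics([1.0, 2.0, 3.0, 4.0],
                                       [2.5, 3.5, 4.5, 5.5])
print(d_plus, d_minus)  # 0.5 0.0
```

The two-sided statistic is the larger of the two values, so a one-sided test can report a smaller statistic than the two-sided test on the same data.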

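The function returns a single row (array) with four values: [statistic, pvalue, statistic_location, statistic_sign], or an error message (string) if the input is invalid. statistic_location is the observation at which the ECDF difference reaches the statistic, and statistic_sign is +1 if $F_1$ exceeds $F_2$ at that point and -1 otherwise. All four values are returned as floats.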
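Because the return value is either a one-row 2D list or a string, calling code has to check the type before reading the numbers. A minimal sketch of that check follows, assuming the output shape described above; the helper name `unpack_kstest_output` and the numbers passed to it are placeholders, not real test results.

```python
from typing import Dict, List, Optional, Union

FIELDS = ("statistic", "pvalue", "statistic_location", "statistic_sign")

def unpack_kstest_output(
    output: Union[List[List[Optional[float]]], str]
) -> Dict[str, object]:
    """Label the wrapper's return value (hypothetical helper)."""
    if isinstance(output, str):
        # Invalid input and SciPy failures are reported as a plain message.
        return {"ok": False, "error": output}
    row = output[0]  # the result is a single row inside a 2D list
    labeled: Dict[str, object] = dict(zip(FIELDS, row))
    labeled["ok"] = True
    return labeled

# Placeholder values for illustration only
print(unpack_kstest_output([[0.5, 0.2, 2.0, 1.0]]))
print(unpack_kstest_output("Invalid input: rvs must be a 2D list with at least two rows."))
```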

Examples

Example 1: Basic Case

Inputs:

rvs		cdf		alternative	method
0.1	0.2	0.15	0.25	two-sided	auto
0.3	0.4	0.35	0.45

Excel formula:


=KSTEST({0.1,0.2;0.3,0.4}, {0.15,0.25;0.35,0.45})

Expected output:

statistic	pvalue	statistic_location	statistic_sign
0.25	1.000	0.1	1.0

Example 2: With Alternative ‘less’

Inputs:

rvs		cdf		alternative	method
0.1	0.2	0.15	0.25	less	auto
0.3	0.4	0.35	0.45

Excel formula:


=KSTEST({0.1,0.2;0.3,0.4}, {0.15,0.25;0.35,0.45}, "less")

Expected output:

statistic	pvalue	statistic_location	statistic_sign
0.0	1.000	0.15	-1.0

Example 3: With Method ‘exact’

Inputs:

rvs		cdf		alternative	method
0.1	0.2	0.15	0.25	two-sided	exact
0.3	0.4	0.35	0.45

Excel formula:


=KSTEST({0.1,0.2;0.3,0.4}, {0.15,0.25;0.35,0.45}, "two-sided", "exact")

Expected output:

statistic	pvalue	statistic_location	statistic_sign
0.25	1.000	0.1	1.0

Example 4: All Arguments Specified

Inputs:

rvs		cdf		alternative	method
0.1	0.2	0.15	0.25	greater	asymp
0.3	0.4	0.35	0.45

Excel formula:


=KSTEST({0.1,0.2;0.3,0.4}, {0.15,0.25;0.35,0.45}, "greater", "asymp")

Expected output:

statistic	pvalue	statistic_location	statistic_sign
0.25	0.607	0.1	1.0

Python Code


from scipy.stats import kstest as scipy_kstest
from typing import List, Optional, Union
 
def kstest(rvs: List[List[float]], cdf: List[List[float]], alternative: str = 'two-sided', method: str = 'auto') -> Union[List[List[Optional[float]]], str]:
    """
    Performs the Kolmogorov-Smirnov test for goodness of fit between two samples or a sample and a reference distribution.
 
    Args:
        rvs: 2D list of float values. First sample or sample to test.
        cdf: 2D list of float values. Second sample or reference distribution.
        alternative: Defines the null and alternative hypotheses ('two-sided', 'less', 'greater'). Default is 'two-sided'.
        method: Method for calculating the p-value ('auto', 'exact', 'approx', 'asymp'). Default is 'auto'.
 
    Returns:
        2D list with [statistic, pvalue, statistic_location, statistic_sign] in a single row, or an error message (str) if input is invalid.
 
    This example function is provided as-is without any representation of accuracy.
    """
    # Validate rvs and cdf are 2D lists with at least two rows
    if not (isinstance(rvs, list) and all(isinstance(row, list) for row in rvs) and len(rvs) >= 2):
        return "Invalid input: rvs must be a 2D list with at least two rows."
    if not (isinstance(cdf, list) and all(isinstance(row, list) for row in cdf) and len(cdf) >= 2):
        return "Invalid input: cdf must be a 2D list with at least two rows."
    # Flatten rvs and cdf
    try:
        x = [float(item) for row in rvs for item in row]
        y = [float(item) for row in cdf for item in row]
    except Exception:
        return "Invalid input: rvs and cdf must contain only numeric values."
    if len(x) < 2 or len(y) < 2:
        return "Invalid input: each sample must contain at least two values."
    # Validate alternative
    if alternative not in ['two-sided', 'less', 'greater']:
        return "Invalid input: alternative must be 'two-sided', 'less', or 'greater'."
    # Validate method
    if method not in ['auto', 'exact', 'approx', 'asymp']:
        return "Invalid input: method must be 'auto', 'exact', 'approx', or 'asymp'."
    # Call scipy.stats.kstest
    try:
        result = scipy_kstest(x, y, alternative=alternative, method=method)
        stat = float(result.statistic)
        pvalue = float(result.pvalue)
        # Fall back to None if the result object lacks these attributes
        stat_loc = float(result.statistic_location) if hasattr(result, 'statistic_location') else None
        stat_sign = float(result.statistic_sign) if hasattr(result, 'statistic_sign') else None
    except Exception as e:
        return f"scipy.stats.kstest error: {e}"
    # Check for nan/inf
    for val in [stat, pvalue, stat_loc, stat_sign]:
        if isinstance(val, float) and (val != val or val in [float('inf'), float('-inf')]):
            return "Invalid result: output contains nan or inf."
    return [[stat, pvalue, stat_loc, stat_sign]]

Example Workbook

Link to Workbook