KS_2SAMP

Overview

The KS_2SAMP function performs the two-sample Kolmogorov-Smirnov test for goodness of fit, which compares the distributions of two independent samples to determine if they differ significantly. This nonparametric test is sensitive to differences in both location and shape of the empirical cumulative distribution functions of the samples. The test statistic is the maximum absolute difference between the two cumulative distributions:

$$D = \sup_x |F_1(x) - F_2(x)|$$

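where $F_1(x)$ and $F_2(x)$ are the empirical cumulative distribution functions of the two samples. This form applies to the "two-sided" alternative. For the one-sided alternatives, the statistic is the largest signed difference in one direction only: the largest value of $F_1(x) - F_2(x)$ for "greater", and the largest value of $F_2(x) - F_1(x)$ for "less". For more details, see the scipy.stats.ks_2samp documentation.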
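As an illustration of the definition above, the statistic can be computed by hand from the two empirical CDFs. The following is a minimal sketch using only the standard library, not the code scipy runs internally; the function name `ks_statistics` is hypothetical, and the sign convention for the one-sided values is assumed to follow the scipy.stats.ks_2samp documentation.

```python
from bisect import bisect_right

def ks_statistics(sample_one, sample_two):
    """Return (D, D_plus, D_minus) for two samples.

    D_plus  = max over x of F1(x) - F2(x)
    D_minus = max over x of F2(x) - F1(x)
    D       = max(D_plus, D_minus), the two-sided statistic
    """
    xs = sorted(sample_one)
    ys = sorted(sample_two)
    n, m = len(xs), len(ys)
    d_plus = 0.0
    d_minus = 0.0
    # The ECDFs only change at observed values, so it is enough
    # to evaluate the difference at every data point.
    for t in xs + ys:
        f1 = bisect_right(xs, t) / n  # fraction of sample one <= t
        f2 = bisect_right(ys, t) / m  # fraction of sample two <= t
        d_plus = max(d_plus, f1 - f2)
        d_minus = max(d_minus, f2 - f1)
    return max(d_plus, d_minus), d_plus, d_minus

print(ks_statistics([1, 2, 3, 4], [2, 3, 4, 5]))  # (0.25, 0.25, 0.0)
```

For the data used in the examples on this page, the sketch gives 0.25 for the two-sided and "greater" statistics and 0 for "less".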

This wrapper exposes only the most commonly used parameters: the two data samples and the alternative hypothesis. It does not support axis-based operations, NaN handling, or method selection, and always returns a single row with the test statistic and p-value. This example function is provided as-is without any representation of accuracy.

Usage

To use the function in Excel:

=KS_2SAMP(data_one, data_two, [alternative])
  • data_one (2D list, required): First sample of observations. Must be a 2D array with at least two rows.
  • data_two (2D list, required): Second sample of observations. Must be a 2D array with at least two rows.
  • alternative (string, optional, default="two-sided"): Defines the null and alternative hypotheses. Allowed values: "two-sided", "less", "greater".

The function returns a single-row 2D array: [statistic, pvalue]. If the input is invalid, it returns an error message (string).
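To make the input shape concrete, here is a small sketch of how a 2D range might be validated and flattened before testing. It assumes the Excel range arrives as a list of rows; the helper name `flatten_range` is hypothetical, and unlike the wrapper on this page it raises an exception instead of returning an error string.

```python
from typing import List

def flatten_range(data: List[List[float]]) -> List[float]:
    """Hypothetical helper: validate a 2D range and flatten it row by row."""
    if not (isinstance(data, list) and len(data) >= 2
            and all(isinstance(row, list) for row in data)):
        raise ValueError("expected a 2D list with at least two rows")
    # Row-major flattening: all of row 1, then all of row 2, and so on.
    return [float(item) for row in data for item in row]

# The Excel array constant {1,2;3,4} corresponds to this 2D list.
sample = flatten_range([[1, 2], [3, 4]])
print(sample)  # [1.0, 2.0, 3.0, 4.0]
```

Cell layout therefore does not affect the result: only the set of values in each range matters, not how they are arranged into rows and columns.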

Examples

Example 1: Basic Two-Sided Test

Inputs:

data_one        data_two        alternative
1.0    2.0      2.0    3.0      two-sided
3.0    4.0      4.0    5.0

Excel formula:

=KS_2SAMP({1,2;3,4}, {2,3;4,5})

Expected output:

statistic    pvalue
0.250        1.000

Example 2: “less” Alternative

Inputs:

data_one        data_two        alternative
1.0    2.0      2.0    3.0      less
3.0    4.0      4.0    5.0

Excel formula:

=KS_2SAMP({1,2;3,4}, {2,3;4,5}, "less")

Expected output:

statistic    pvalue
0.000        1.000

Example 3: “greater” Alternative

Inputs:

data_one        data_two        alternative
1.0    2.0      2.0    3.0      greater
3.0    4.0      4.0    5.0

Excel formula:

=KS_2SAMP({1,2;3,4}, {2,3;4,5}, "greater")

Expected output:

statistic    pvalue
0.250        0.800

Example 4: All Arguments Specified

Inputs:

data_one          data_two          alternative
10.0    20.0      15.0    25.0      two-sided
30.0    40.0      35.0    45.0

Excel formula:

=KS_2SAMP({10,20;30,40}, {15,25;35,45}, "two-sided")

Expected output:

statistic    pvalue
0.250        1.000
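The p-values in Examples 1 to 3 can be cross-checked by brute force. The sketch below enumerates every way of splitting 8 ranked observations into two groups of 4 and counts how often the statistic is at least as large as the observed one. This is an idealized check that assumes no tied values (the example data do share values between samples), and it is not the algorithm scipy uses; the function name `exact_p_value` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def exact_p_value(n, m, observed, alternative="two-sided"):
    """Fraction of equally likely rank arrangements whose statistic >= observed."""
    total = 0
    extreme = 0
    for chosen in combinations(range(n + m), n):
        ones = set(chosen)  # rank positions belonging to sample one
        c1 = c2 = 0
        d_plus = d_minus = Fraction(0)
        for pos in range(n + m):
            if pos in ones:
                c1 += 1
            else:
                c2 += 1
            # Difference F1 - F2 of the two ECDFs after this observation.
            diff = Fraction(c1, n) - Fraction(c2, m)
            d_plus = max(d_plus, diff)
            d_minus = max(d_minus, -diff)
        stat = {
            "two-sided": max(d_plus, d_minus),
            "greater": d_plus,
            "less": d_minus,
        }[alternative]
        total += 1
        if stat >= observed:
            extreme += 1
    return Fraction(extreme, total)

print(float(exact_p_value(4, 4, Fraction(1, 4), "two-sided")))  # 1.0
print(float(exact_p_value(4, 4, Fraction(0), "less")))          # 1.0
print(float(exact_p_value(4, 4, Fraction(1, 4), "greater")))    # 0.8
```

With four values per sample, the smallest possible two-sided statistic is 0.25, which is why a statistic of 0.250 yields a p-value of 1.000. Enumeration is only practical for small samples, since the number of arrangements grows combinatorially.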

Python Code

from scipy.stats import ks_2samp as scipy_ks_2samp
from typing import List, Union

def ks_2samp(data_one: List[List[float]], data_two: List[List[float]], alternative: str = 'two-sided') -> Union[List[List[float]], str]:
    """
    Performs the two-sample Kolmogorov-Smirnov test for goodness of fit.

    Args:
        data_one: 2D list of float values. First sample of observations.
        data_two: 2D list of float values. Second sample of observations.
        alternative: Defines the null and alternative hypotheses ('two-sided', 'less', 'greater'). Default is 'two-sided'.

    Returns:
        2D list with a single row: [statistic, pvalue]. Returns an error message (str) if input is invalid.

    This example function is provided as-is without any representation of accuracy.
    """
    # Validate data_one and data_two are 2D lists with at least two rows
    if not (isinstance(data_one, list) and all(isinstance(row, list) for row in data_one) and len(data_one) >= 2):
        return "Invalid input: data_one must be a 2D list with at least two rows."
    if not (isinstance(data_two, list) and all(isinstance(row, list) for row in data_two) and len(data_two) >= 2):
        return "Invalid input: data_two must be a 2D list with at least two rows."

    # Flatten data_one and data_two
    try:
        x = [float(item) for row in data_one for item in row]
        y = [float(item) for row in data_two for item in row]
    except Exception:
        return "Invalid input: data_one and data_two must contain only numeric values."
    if len(x) < 2 or len(y) < 2:
        return "Invalid input: each sample must contain at least two values."

    # Validate alternative
    if alternative not in ['two-sided', 'less', 'greater']:
        return "Invalid input: alternative must be 'two-sided', 'less', or 'greater'."

    # Call scipy.stats.ks_2samp
    try:
        result = scipy_ks_2samp(x, y, alternative=alternative)
        stat = float(result.statistic)
        pvalue = float(result.pvalue)
    except Exception as e:
        return f"scipy.stats.ks_2samp error: {e}"

    # Check for nan/inf
    if any([isinstance(val, float) and (val != val or val in [float('inf'), float('-inf')]) for val in [stat, pvalue]]):
        return "Invalid result: statistic or pvalue is nan or inf."

    return [[stat, pvalue]]

Example Workbook

Link to Workbook
