KS_2SAMP

Overview

The KS_2SAMP function performs the two-sample Kolmogorov-Smirnov (K-S) test, a nonparametric statistical test that determines whether two independent samples were drawn from the same underlying continuous probability distribution. Named after mathematicians Andrey Kolmogorov and Nikolai Smirnov, the test is particularly valuable because it makes no assumptions about the specific form of the distribution.

The test works by comparing the empirical distribution functions (EDFs) of the two samples. For two samples with sizes n and m, the K-S statistic measures the maximum absolute difference between their cumulative distributions:

D_{n,m} = \sup_x |F_{1,n}(x) - F_{2,m}(x)|

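where F_{1,n} and F_{2,m} are the empirical distribution functions of the first and second samples respectively, and \sup denotes the supremum (least upper bound) taken over all values of x. Because both EDFs are step functions that change only at observed values, the supremum is attained at one of the pooled observations.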
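The definition above can be sketched in a few lines of pure Python. This is an illustrative sketch, not the code SciPy runs; the helper names edf and ks_statistic are hypothetical and do not appear elsewhere on this page.

```python
from bisect import bisect_right

def edf(sample):
    """Return a function x -> fraction of observations in `sample` that are <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n

def ks_statistic(sample_one, sample_two):
    """Largest absolute gap between the two EDFs, checked at every pooled observation."""
    f1, f2 = edf(sample_one), edf(sample_two)
    # The EDFs only jump at observed values, so these points are sufficient.
    points = sorted(set(sample_one) | set(sample_two))
    return max(abs(f1(x) - f2(x)) for x in points)

d = ks_statistic([1, 2, 3, 4], [2, 3, 4, 5])
print(d)
```

For the samples 1, 2, 3, 4 and 2, 3, 4, 5 the two EDFs are never more than one step of 1/4 apart, so the statistic is 0.25.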

The two-sample K-S test is sensitive to differences in both location (central tendency) and shape of the distributions, making it one of the most general nonparametric methods for comparing two samples. The function supports three alternative hypotheses:

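two-sided: The null hypothesis is that the two distributions are identical; the alternative is that they differ in any way (default)
less: The alternative is that the first sample’s CDF lies below the second’s for at least one x, i.e. F_1(x) < F_2(x). Values in the first sample then tend to be larger than those in the second
greater: The alternative is that the first sample’s CDF lies above the second’s for at least one x, i.e. F_1(x) > F_2(x). Values in the first sample then tend to be smaller than those in the second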
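Expressed through the EDFs, the one-sided options look at signed rather than absolute differences. The sketch below is pure Python and the helper name signed_ks_statistics is hypothetical; it assumes the convention that "greater" uses the largest value of F_1 - F_2 and "less" uses the largest value of F_2 - F_1, which is consistent with the expected outputs in the examples on this page.

```python
from bisect import bisect_right

def signed_ks_statistics(sample_one, sample_two):
    """Return (d_plus, d_minus) = (max of F1 - F2, max of F2 - F1) over pooled observations."""
    a, b = sorted(sample_one), sorted(sample_two)
    points = sorted(set(a) | set(b))
    # Signed EDF difference F1(x) - F2(x) at every pooled observation.
    diffs = [bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b) for x in points]
    d_plus = max(0.0, max(diffs))    # assumed to correspond to alternative "greater"
    d_minus = max(0.0, -min(diffs))  # assumed to correspond to alternative "less"
    return d_plus, d_minus

d_plus, d_minus = signed_ks_statistics([1, 2, 3, 4], [2, 3, 4, 5])
print(d_plus, d_minus)
```

Here the first sample sits to the left of the second, so its EDF rises earlier: d_plus is 0.25 while d_minus is 0. The two-sided statistic is the larger of the two.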

This implementation uses the scipy.stats.ks_2samp function from the SciPy library. The function returns the K-S test statistic and the associated p-value. A small p-value (conventionally below 0.05) is evidence against the null hypothesis that both samples come from the same distribution; a large p-value means the test found no such evidence, not that the distributions are proven identical. For more details on the underlying algorithm, see Hodges (1958), “The Significance Probability of the Smirnov Two-Sample Test” in Arkiv för Matematik.
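As a rough illustration of the underlying call, the sketch below runs scipy.stats.ks_2samp directly for each alternative and applies a 0.05 threshold. The sample values, the alpha variable, and the results dictionary are illustrative assumptions, not part of the function described here.

```python
from scipy.stats import ks_2samp

sample_one = [1, 2, 3, 4]
sample_two = [2, 3, 4, 5]
alpha = 0.05  # conventional significance level, chosen for illustration

results = {}
for alternative in ("two-sided", "less", "greater"):
    res = ks_2samp(sample_one, sample_two, alternative=alternative)
    statistic, pvalue = float(res.statistic), float(res.pvalue)
    results[alternative] = (statistic, pvalue)
    # Reject the null hypothesis only when the p-value falls below alpha.
    verdict = "reject" if pvalue < alpha else "fail to reject"
    print(f"{alternative}: D={statistic:.4f}, p={pvalue:.4f} -> {verdict} the null hypothesis")
```

With only four observations per sample, none of the alternatives comes close to rejecting the null hypothesis, which illustrates how little power the test has on very small samples.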

This example function is provided as-is without any representation of accuracy.

Excel Usage

=KS_2SAMP(data_one, data_two, ks_twosamp_alt)

data_one (list[list], required): First sample of observations as a 2D array.
data_two (list[list], required): Second sample of observations as a 2D array.
ks_twosamp_alt (str, optional, default: “two-sided”): Defines the alternative hypothesis for the test. Must be one of “two-sided”, “less”, or “greater” (lowercase).

Returns (list[list]): 2D list [[statistic, p_value]], or error message string.

Example 1: Basic two-sided test comparing similar distributions

Inputs:

data_one		data_two		ks_twosamp_alt
1	2	2	3	two-sided
3	4	4	5

Excel formula:

=KS_2SAMP({1,2;3,4}, {2,3;4,5}, "two-sided")

Expected output:

Result
0.25	1

Example 2: One-sided test with the "less" alternative

Inputs:

data_one		data_two		ks_twosamp_alt
1	2	2	3	less
3	4	4	5

Excel formula:

=KS_2SAMP({1,2;3,4}, {2,3;4,5}, "less")

Expected output:

Result
0	1

Example 3: One-sided test with the "greater" alternative

Inputs:

data_one		data_two		ks_twosamp_alt
1	2	2	3	greater
3	4	4	5

Excel formula:

=KS_2SAMP({1,2;3,4}, {2,3;4,5}, "greater")

Expected output:

Result
0.25	0.8

Example 4: Two-sided test with shifted distributions

Inputs:

data_one		data_two		ks_twosamp_alt
10	20	15	25	two-sided
30	40	35	45

Excel formula:

=KS_2SAMP({10,20;30,40}, {15,25;35,45}, "two-sided")

Expected output:

Result
0.25	1

Python Code

from scipy.stats import ks_2samp as scipy_ks_2samp
import math

def ks_2samp(data_one, data_two, ks_twosamp_alt='two-sided'):
    """
    Performs the two-sample Kolmogorov-Smirnov test for goodness of fit.

    See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ks_2samp.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        data_one (list[list]): First sample of observations as a 2D array.
        data_two (list[list]): Second sample of observations as a 2D array.
        ks_twosamp_alt (str, optional): Defines the alternative hypothesis for the test. Valid options: 'two-sided', 'less', 'greater'. Default is 'two-sided'.

    Returns:
        list[list]: 2D list [[statistic, p_value]], or error message string.
    """
    def to2d(x):
        # Wrap a scalar input (e.g. a single cell) into a 1x1 2D list.
        return [[x]] if not isinstance(x, list) else x

    try:
        data_one = to2d(data_one)
        data_two = to2d(data_two)

        # Every row must itself be a list for the input to count as 2D.
        if not all(isinstance(row, list) for row in data_one):
            return "Error: data_one must be a 2D list."
        if not all(isinstance(row, list) for row in data_two):
            return "Error: data_two must be a 2D list."

        # Flatten each 2D input into a 1D list of floats.
        # Non-numeric items raise here and are reported by the except clause.
        x = [float(item) for row in data_one for item in row]
        y = [float(item) for row in data_two for item in row]

        if len(x) < 2 or len(y) < 2:
            return "Error: each sample must contain at least two values."

        if ks_twosamp_alt not in ['two-sided', 'less', 'greater']:
            return "Error: ks_twosamp_alt must be 'two-sided', 'less', or 'greater'."

        result = scipy_ks_2samp(x, y, alternative=ks_twosamp_alt)
        stat = float(result.statistic)
        pvalue = float(result.pvalue)

        # Return an error message instead of non-finite numbers.
        if math.isnan(stat) or math.isnan(pvalue) or math.isinf(stat) or math.isinf(pvalue):
            return "Error: statistic or pvalue is nan or inf."

        return [[stat, pvalue]]
    except Exception as e:
        return f"Error: {str(e)}"
