ANDERSON_KSAMP

Overview

The ANDERSON_KSAMP function performs the k-sample Anderson-Darling test to determine if multiple sample groups are drawn from the same (unspecified) distribution. This nonparametric test is more sensitive to differences in the tails than the Kolmogorov-Smirnov test and can handle more than two groups. It is useful for comparing distributions across several groups, especially when normality or equal variance assumptions do not hold. The test statistic is calculated as:

$$A^2 = -n - S$$

where $n$ is the total number of observations and $S$ is a sum involving the empirical distribution functions of the samples. For details, see the scipy.stats.anderson_ksamp documentation.
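As a rough illustration of the underlying call, the sketch below runs scipy.stats.anderson_ksamp directly on two small groups with default settings. The variable names are illustrative only, and the data are the two columns from Example 1. No expected numbers are shown here; the test's own outputs are listed in the Examples section.

```python
import warnings

from scipy.stats import anderson_ksamp

# Two sample groups (the columns of Example 1); names are illustrative
group_a = [1.1, 2.2, 3.3]
group_b = [1.2, 2.1, 3.4]

with warnings.catch_warnings():
    # SciPy warns when the p-value hits its 0.25 cap or 0.001 floor
    warnings.simplefilter("ignore")
    result = anderson_ksamp([group_a, group_b])

print(result.statistic)              # normalized test statistic
print(result.pvalue)                 # approximate p-value
print(list(result.critical_values))  # seven critical values
```

Note that SciPy expects one sequence per group, so a table laid out with groups in columns has to be split by column before the call.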

This wrapper exposes only the most commonly used parameters: the sample groups and the midrank option. The permutation-based p-value calculation is not supported, as it requires complex configuration and is computationally intensive. The function returns the test statistic, p-value, and critical values for common significance levels. This example function is provided as-is without any representation of accuracy.

Usage

To use the function in Excel:


=ANDERSON_KSAMP(samples, [midrank])

samples (2D list, required): Table where each column is a sample group and each row is an observation. Must have at least two columns and at least two rows.
midrank (bool, optional, default=TRUE): If TRUE, uses the midrank test (recommended for continuous and discrete data). If FALSE, uses the right side empirical distribution for discrete data.
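The column-per-group layout can be sketched in plain Python as follows. This assumes the range arrives as a list of rows, which is an assumption about the calling environment rather than something stated here; columns_as_groups is a hypothetical helper name.

```python
from typing import List


def columns_as_groups(table: List[List[float]]) -> List[List[float]]:
    """Turn a row-major table into one list per column (hypothetical helper)."""
    width = len(table[0])
    if any(len(row) != width for row in table):
        raise ValueError("all rows must have the same number of columns")
    # zip(*table) pairs up the i-th entry of every row, i.e. walks down column i
    return [list(col) for col in zip(*table)]


# The 3x2 array {1.1,1.2;2.2,2.1;3.3,3.4}, row by row
table = [[1.1, 1.2], [2.2, 2.1], [3.3, 3.4]]
groups = columns_as_groups(table)
print(groups)  # [[1.1, 2.2, 3.3], [1.2, 2.1, 3.4]]
```

Each inner list of the result is one sample group, which is the shape scipy.stats.anderson_ksamp expects.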

The function returns a single row (array) with the following values:

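test statistic (normalized by SciPy, so it can be negative)
p-value (approximate; SciPy caps it at 0.25 and floors it at 0.001)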
critical value at 25%
critical value at 10%
critical value at 5%
critical value at 2.5%
critical value at 1%
critical value at 0.5%
critical value at 0.1%

If the input is invalid, an error message string is returned.
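One way to read the returned row is to compare the statistic with each critical value: the null hypothesis of a common distribution is rejected at a given level when the statistic exceeds that level's critical value. The sketch below applies this rule; smallest_rejecting_level and LEVELS are hypothetical names introduced for illustration, and the row is the expected output of Example 1.

```python
from typing import List, Optional

# Significance levels matching the seven critical-value columns, in order
LEVELS = [0.25, 0.10, 0.05, 0.025, 0.01, 0.005, 0.001]


def smallest_rejecting_level(row: List[float]) -> Optional[float]:
    """Return the smallest level whose critical value the statistic exceeds, else None."""
    statistic, critical_values = row[0], row[2:9]
    rejected = [lvl for lvl, crit in zip(LEVELS, critical_values) if statistic > crit]
    return min(rejected) if rejected else None


# Output row in the documented order: statistic, p-value, then seven critical values
row = [-0.940, 0.250, 0.325, 1.226, 1.961, 2.718, 3.752, 4.592, 6.546]
print(smallest_rejecting_level(row))  # None
```

A result of None means the statistic does not exceed even the 25% critical value, so there is no evidence against a common distribution at any of the tabulated levels.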

Examples

Example 1: Two Groups, Midrank (Default)

Inputs:

samples		midrank
1.1	1.2	TRUE
2.2	2.1
3.3	3.4

Excel formula:


=ANDERSON_KSAMP({1.1,1.2;2.2,2.1;3.3,3.4})

Expected output:

Statistic	p-value	Crit_25	Crit_10	Crit_5	Crit_2.5	Crit_1	Crit_0.5	Crit_0.1
-0.940	0.250	0.325	1.226	1.961	2.718	3.752	4.592	6.546

Example 2: Three Groups, Midrank

Inputs:

samples			midrank
1.1	1.2	1.3	TRUE
2.2	2.1	2.3
3.3	3.4	3.1

Excel formula:


=ANDERSON_KSAMP({1.1,1.2,1.3;2.2,2.1,2.3;3.3,3.4,3.1})

Expected output:

Statistic	p-value	Crit_25	Crit_10	Crit_5	Crit_2.5	Crit_1	Crit_0.5	Crit_0.1
-1.306	0.250	0.449	1.305	1.943	2.577	3.416	4.072	5.564

Example 3: Two Groups, Right Side Empirical

Inputs:

samples		midrank
1.1	1.2	FALSE
2.2	2.1
3.3	3.4

Excel formula:


=ANDERSON_KSAMP({1.1,1.2;2.2,2.1;3.3,3.4}, FALSE)

Expected output:

Statistic	p-value	Crit_25	Crit_10	Crit_5	Crit_2.5	Crit_1	Crit_0.5	Crit_0.1
-0.867	0.250	0.325	1.226	1.961	2.718	3.752	4.592	6.546

Example 4: Three Groups, Right Side Empirical

Inputs:

samples			midrank
1.1	1.2	1.3	FALSE
2.2	2.1	2.3
3.3	3.4	3.1

Excel formula:


=ANDERSON_KSAMP({1.1,1.2,1.3;2.2,2.1,2.3;3.3,3.4,3.1}, FALSE)

Expected output:

Statistic	p-value	Crit_25	Crit_10	Crit_5	Crit_2.5	Crit_1	Crit_0.5	Crit_0.1
-1.239	0.250	0.449	1.305	1.943	2.577	3.416	4.072	5.564

Python Code


from typing import List, Union

from scipy.stats import anderson_ksamp as scipy_anderson_ksamp

def anderson_ksamp(samples: List[List[float]], midrank: bool = True) -> Union[List[List[float]], str]:
    """
    Performs the k-sample Anderson-Darling test to determine if samples are drawn from the same population.
 
    Args:
        samples: 2D list of float values. Each column represents a sample group.
        midrank: If True, uses the midrank test (default, suitable for continuous and discrete data). If False, uses the right side empirical distribution for discrete data.
 
    Returns:
        2D list with a single row: [statistic, pvalue, critical_25, critical_10, critical_5, critical_2_5, critical_1, critical_0_5, critical_0_1], or an error message (str) if input is invalid.
 
    This example function is provided as-is without any representation of accuracy.
    """
    # Validate samples: a 2D list of rows in which each column is a sample group
    if not isinstance(samples, list) or len(samples) < 2:
        return "Invalid input: samples must be a 2D list with at least two rows (observations)."
    if any(not isinstance(row, list) or len(row) < 2 for row in samples):
        return "Invalid input: each row must be a list with at least two columns (sample groups)."
    if any(len(row) != len(samples[0]) for row in samples):
        return "Invalid input: all rows must have the same number of columns."
    # Check for non-numeric values
    for row in samples:
        for v in row:
            if not isinstance(v, (int, float)):
                return "Invalid input: all sample values must be numeric."
    # Transpose rows to columns so that each inner list is one sample group
    transposed = [list(col) for col in zip(*samples)]
    try:
        result = scipy_anderson_ksamp(transposed, midrank=midrank)
    except Exception as e:
        return f"scipy.stats.anderson_ksamp error: {e}"
    # Compose output row
    output = [float(result.statistic), float(result.pvalue)]
    # Critical values for the 25%, 10%, 5%, 2.5%, 1%, 0.5% and 0.1% levels
    output += [float(c) for c in result.critical_values[:7]]
    # Check for nan/inf
    if any([
        isinstance(x, float) and (x != x or x == float('inf') or x == float('-inf'))
        for x in output
    ]):
        return "Invalid output: statistic or critical values are not finite."
    return [output]
