ANDERSON_KSAMP
Overview
The ANDERSON_KSAMP function performs the k-sample Anderson-Darling test to determine whether multiple sample groups are drawn from the same (unspecified) distribution. This nonparametric test is more sensitive to differences in the tails than the Kolmogorov-Smirnov test and can handle more than two groups. It is useful for comparing distributions across several groups, especially when normality or equal-variance assumptions do not hold. The test statistic is computed from the total number of observations across all groups and a sum involving the empirical distribution functions of the samples. For details, see the scipy.stats.anderson_ksamp documentation.
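To make the "sum involving the empirical distribution functions" concrete, the following is a minimal sketch of the un-normalized k-sample statistic for data without tied values. It is an illustration only: the function name `raw_ad_ksamp` is hypothetical and not part of this wrapper, and the sketch does not cover the midrank variant. SciPy reports a normalized version of the statistic, so the value computed here will not match the Statistic column in the examples further down.

```python
from bisect import bisect_right
from typing import List


def raw_ad_ksamp(groups: List[List[float]]) -> float:
    """Un-normalized k-sample Anderson-Darling statistic (sketch, no ties allowed)."""
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)
    if len(set(pooled)) != n_total:
        raise ValueError("this sketch assumes no tied values")

    total = 0.0
    for g in groups:
        g_sorted = sorted(g)
        n_i = len(g_sorted)
        inner = 0.0
        # j indexes the first N-1 pooled order statistics (the last one adds nothing).
        for j, z in enumerate(pooled[:-1], start=1):
            m_ij = bisect_right(g_sorted, z)  # observations in this group <= z
            # Squared gap between the group's and the pooled empirical counts,
            # weighted by 1 / (j * (N - j)) so that the tails count more.
            inner += (n_total * m_ij - j * n_i) ** 2 / (j * (n_total - j))
        total += inner / n_i
    return total / n_total


print(raw_ad_ksamp([[1.0, 3.0], [2.0, 4.0]]))  # approximately 0.667
```

The weight `1 / (j * (N - j))` is what gives the test its extra sensitivity in the tails compared with a plain maximum-distance statistic.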
This wrapper exposes only the most commonly used parameters: the sample groups and the midrank option. The permutation-based p-value calculation is not supported, as it requires complex configuration and is computationally intensive. The function returns the test statistic, p-value, and critical values for common significance levels. This example function is provided as-is without any representation of accuracy.
Usage
To use the function in Excel:
=ANDERSON_KSAMP(samples, [midrank])
- samples (2D list, required): Table where each column is a sample group and each row is an observation. Must have at least two columns and at least two rows.
- midrank (bool, optional, default=TRUE): If TRUE, uses the midrank test (recommended for continuous and discrete data). If FALSE, uses the right side empirical distribution for discrete data.
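Because each column of the table is one sample group, the groups have to be split out column by column before they can be tested. The sketch below shows one way to do that, assuming the range arrives in Python as a list of rows (an assumption about the calling environment). The helper name `columns_to_groups` is hypothetical.

```python
from typing import List


def columns_to_groups(rows: List[List[float]]) -> List[List[float]]:
    """Turn a rectangular list of rows into one list per column (sample group)."""
    if len(rows) < 2:
        raise ValueError("need at least two rows (observations)")
    width = len(rows[0])
    if width < 2 or any(len(r) != width for r in rows):
        raise ValueError("need a rectangular table with at least two columns")
    # zip(*rows) pairs up the first items of every row, then the second items, ...
    return [list(col) for col in zip(*rows)]


groups = columns_to_groups([[1.1, 1.2], [2.2, 2.1], [3.3, 3.4]])
print(groups)  # [[1.1, 2.2, 3.3], [1.2, 2.1, 3.4]]
```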
The function returns a single row (array) with the following values:
- test statistic
- p-value
- critical value at 25%
- critical value at 10%
- critical value at 5%
- critical value at 2.5%
- critical value at 1%
- critical value at 0.5%
- critical value at 0.1%
If the input is invalid, an error message string is returned.
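As an illustration of how the returned row might be consumed, the sketch below labels the nine values and compares the statistic with one critical value. The names `LABELS` and `summarize` are hypothetical, and the sample row is copied from the expected output of Example 1. The decision rule assumed here is the usual one: the hypothesis of a common distribution is rejected at a given level when the statistic exceeds the critical value for that level.

```python
from typing import Dict, List, Union

# Column order of the returned row, as listed above.
LABELS = ["statistic", "pvalue", "crit_25", "crit_10", "crit_5",
          "crit_2_5", "crit_1", "crit_0_5", "crit_0_1"]


def summarize(result: Union[List[List[float]], str],
              level: str = "crit_5") -> Dict[str, object]:
    """Label the result row and flag whether the statistic exceeds a critical value."""
    if isinstance(result, str):
        # The function signals invalid input with an error message string.
        return {"ok": False, "error": result}
    row = dict(zip(LABELS, result[0]))
    return {
        "ok": True,
        "statistic": row["statistic"],
        "pvalue": row["pvalue"],
        "reject": row["statistic"] > row[level],
    }


example_row = [[-0.940, 0.250, 0.325, 1.226, 1.961, 2.718, 3.752, 4.592, 6.546]]
print(summarize(example_row))
```

According to the SciPy documentation, the approximate p-value is capped at 25% and floored at 0.1%, so a reported 0.250 should be read as "at least 0.25" rather than as an exact value.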
Examples
Example 1: Two Groups, Midrank (Default)
Inputs:
samples (group 1) | samples (group 2) | midrank |
---|---|---|
1.1 | 1.2 | TRUE |
2.2 | 2.1 | |
3.3 | 3.4 | |
Excel formula:
=ANDERSON_KSAMP({1.1,1.2;2.2,2.1;3.3,3.4})
Expected output:
Statistic | p-value | Crit_25 | Crit_10 | Crit_5 | Crit_2.5 | Crit_1 | Crit_0.5 | Crit_0.1 |
---|---|---|---|---|---|---|---|---|
-0.940 | 0.250 | 0.325 | 1.226 | 1.961 | 2.718 | 3.752 | 4.592 | 6.546 |
Example 2: Three Groups, Midrank
Inputs:
samples (group 1) | samples (group 2) | samples (group 3) | midrank |
---|---|---|---|
1.1 | 1.2 | 1.3 | TRUE |
2.2 | 2.1 | 2.3 | |
3.3 | 3.4 | 3.1 | |
Excel formula:
=ANDERSON_KSAMP({1.1,1.2,1.3;2.2,2.1,2.3;3.3,3.4,3.1})
Expected output:
Statistic | p-value | Crit_25 | Crit_10 | Crit_5 | Crit_2.5 | Crit_1 | Crit_0.5 | Crit_0.1 |
---|---|---|---|---|---|---|---|---|
-1.306 | 0.250 | 0.449 | 1.305 | 1.943 | 2.577 | 3.416 | 4.072 | 5.564 |
Example 3: Two Groups, Right Side Empirical
Inputs:
samples (group 1) | samples (group 2) | midrank |
---|---|---|
1.1 | 1.2 | FALSE |
2.2 | 2.1 | |
3.3 | 3.4 | |
Excel formula:
=ANDERSON_KSAMP({1.1,1.2;2.2,2.1;3.3,3.4}, FALSE)
Expected output:
Statistic | p-value | Crit_25 | Crit_10 | Crit_5 | Crit_2.5 | Crit_1 | Crit_0.5 | Crit_0.1 |
---|---|---|---|---|---|---|---|---|
-0.867 | 0.250 | 0.325 | 1.226 | 1.961 | 2.718 | 3.752 | 4.592 | 6.546 |
Example 4: Three Groups, Right Side Empirical
Inputs:
samples (group 1) | samples (group 2) | samples (group 3) | midrank |
---|---|---|---|
1.1 | 1.2 | 1.3 | FALSE |
2.2 | 2.1 | 2.3 | |
3.3 | 3.4 | 3.1 | |
Excel formula:
=ANDERSON_KSAMP({1.1,1.2,1.3;2.2,2.1,2.3;3.3,3.4,3.1}, FALSE)
Expected output:
Statistic | p-value | Crit_25 | Crit_10 | Crit_5 | Crit_2.5 | Crit_1 | Crit_0.5 | Crit_0.1 |
---|---|---|---|---|---|---|---|---|
-1.239 | 0.250 | 0.449 | 1.305 | 1.943 | 2.577 | 3.416 | 4.072 | 5.564 |
Python Code
import math
from typing import List, Union

from scipy.stats import anderson_ksamp as scipy_anderson_ksamp


def anderson_ksamp(samples: List[List[float]], midrank: bool = True) -> Union[List[List[float]], str]:
    """
    Performs the k-sample Anderson-Darling test to determine if samples are drawn from the same population.

    Args:
        samples: 2D list of float values, given as a list of rows. Each column represents a sample group.
        midrank: If True, uses the midrank test (default, suitable for continuous and discrete data). If False, uses the right side empirical distribution for discrete data.

    Returns:
        2D list with a single row: [statistic, pvalue, critical_25, critical_10, critical_5, critical_2_5, critical_1, critical_0_5, critical_0_1], or an error message (str) if input is invalid.

    This example function is provided as-is without any representation of accuracy.
    """
    # Validate shape: at least two rows (observations) and two columns (sample groups)
    if not isinstance(samples, list) or len(samples) < 2:
        return "Invalid input: each sample group must contain at least two values (rows)."
    if any(not isinstance(row, list) or len(row) < 2 for row in samples):
        return "Invalid input: samples must be a 2D list with at least two columns (sample groups)."
    if any(len(row) != len(samples[0]) for row in samples):
        return "Invalid input: all rows must have the same number of columns."
    # Check for non-numeric values
    for row in samples:
        for v in row:
            if not isinstance(v, (int, float)):
                return "Invalid input: all sample values must be numeric."
    # Transpose rows to columns so that each inner list is one sample group for scipy
    transposed = [list(col) for col in zip(*samples)]
    try:
        result = scipy_anderson_ksamp(transposed, midrank=midrank)
    except Exception as e:
        return f"scipy.stats.anderson_ksamp error: {e}"
    # Compose output row: statistic, p-value, then the seven critical values
    output = [float(result.statistic), float(result.pvalue)]
    output.extend(float(c) for c in result.critical_values[:7])
    # Check for nan/inf
    if not all(math.isfinite(x) for x in output):
        return "Invalid output: statistic or critical values are not finite."
    return [output]