
ANDERSON_KSAMP

Overview

The ANDERSON_KSAMP function performs the k-sample Anderson-Darling test to determine whether several sample groups are drawn from the same (unspecified) distribution. Unlike the two-sample Kolmogorov-Smirnov test, it can compare more than two groups in a single test, and it is generally more sensitive to differences in the tails of the distributions. It is useful for comparing distributions across several groups, especially when normality or equal-variance assumptions do not hold. The test statistic is calculated as:

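$$A^2_{kN} = \frac{1}{N} \sum_{i=1}^{k} \frac{1}{n_i} \sum_{j=1}^{N-1} \frac{\left(N M_{ij} - j\, n_i\right)^2}{j\,(N - j)}$$

where $k$ is the number of groups, $n_i$ is the size of group $i$, $N = n_1 + \dots + n_k$ is the total number of observations, and $M_{ij}$ is the number of observations in group $i$ that are less than or equal to the $j$-th smallest value of the pooled sample. This is the form used when midrank is FALSE and the data contain no ties; the default midrank version replaces these counts with midranks. The value the function reports is the standardized statistic $(A^2_{kN} - (k - 1)) / \sigma_N$, where $k - 1$ and $\sigma_N$ are the mean and standard deviation of $A^2_{kN}$ under the null hypothesis, so it is negative whenever $A^2_{kN}$ falls below its null mean. For details, see the scipy.stats.anderson_ksamp documentation.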
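The sketch below computes the unnormalized statistic $A^2_{kN}$ from counts over the pooled sample, in both the midrank form and the right-side form, and then standardizes it as $(A^2_{kN} - (k - 1)) / \sigma_N$. It is an illustration written for this page, not SciPy's source: the function names `null_sigma` and `ad_ksamp_statistic` are hypothetical, and the null-variance constants (`H`, `h`, `g`, `a`, `b`, `c`, `d`) follow our reading of SciPy's implementation. It assumes at least two distinct values and $N \ge 4$. Fed the columns of Examples 1 to 4, it reproduces the statistics listed there to three decimals.

```python
import math
from typing import List


def null_sigma(n: List[int]) -> float:
    """Standard deviation of A2_kN under the null hypothesis (needs N >= 4)."""
    k, N = len(n), sum(n)
    H = sum(1.0 / ni for ni in n)
    h = sum(1.0 / i for i in range(1, N))
    g = sum(1.0 / ((N - i) * j) for i in range(1, N - 1) for j in range(i + 1, N))
    a = (4 * g - 6) * (k - 1) + (10 - 6 * g) * H
    b = (2 * g - 4) * k**2 + 8 * h * k + (2 * g - 14 * h - 4) * H - 8 * h + 4 * g - 6
    c = (6 * h + 2 * g - 2) * k**2 + (4 * h - 4 * g + 6) * k + (2 * h - 6) * H + 4 * h
    d = (2 * h + 6) * k**2 - 4 * h * k
    var = (a * N**3 + b * N**2 + c * N + d) / ((N - 1) * (N - 2) * (N - 3))
    return math.sqrt(var)


def ad_ksamp_statistic(groups: List[List[float]], midrank: bool = True) -> float:
    """Standardized k-sample Anderson-Darling statistic (illustrative sketch)."""
    n = [len(grp) for grp in groups]
    k, N = len(groups), sum(n)
    pooled = sorted(v for grp in groups for v in grp)
    distinct = sorted(set(pooled))
    if len(distinct) < 2:
        raise ValueError("need at least two distinct observations")

    total = 0.0
    for grp, ni in zip(groups, n):
        inner = 0.0
        # The right-side form skips the largest distinct value, whose term is 0/0
        for z in (distinct if midrank else distinct[:-1]):
            ties = pooled.count(z)                     # pooled values equal to z
            below = sum(1 for v in pooled if v < z)    # pooled values strictly below z
            m = sum(1 for v in grp if v <= z)          # group values <= z
            if midrank:
                f = sum(1 for v in grp if v == z)      # group values equal to z
                B, M = below + ties / 2, m - f / 2     # midrank counts
                denom = B * (N - B) - N * ties / 4
            else:
                B, M = below + ties, m                 # right-side counts
                denom = B * (N - B)
            inner += ties * (N * M - ni * B) ** 2 / denom
        total += inner / ni
    a2kn = total / N
    if midrank:
        a2kn *= (N - 1) / N

    # Under H0, A2_kN has mean k - 1; divide by its null standard deviation
    return (a2kn - (k - 1)) / null_sigma(n)


ex1 = [[1.1, 2.2, 3.3], [1.2, 2.1, 3.4]]  # columns of Example 1
print(round(ad_ksamp_statistic(ex1), 3))                 # -0.94
print(round(ad_ksamp_statistic(ex1, midrank=False), 3))  # -0.867
```

The right-side form sums over every distinct value except the largest, while the midrank form keeps every distinct value and rescales by $(N - 1)/N$. That is why the two settings give different statistics even when the data contain no ties, as Examples 1 and 3 show.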

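This wrapper exposes only the most commonly used parameters: the sample groups and the midrank option. The permutation-based p-value calculation is not supported, because it requires additional configuration and is computationally intensive. The function returns the test statistic, an approximate p-value, and the critical values at seven common significance levels. The p-value is interpolated from the critical values and is capped at 0.25 and floored at 0.001, so a reported p-value of 0.25 means "0.25 or greater"; this is why every example below reports 0.250. This example function is provided as-is without any representation of accuracy.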

Usage

To use the function in Excel:

=ANDERSON_KSAMP(samples, [midrank])
  • samples (2D list, required): Table where each column is a sample group and each row is an observation. Must have at least two columns and at least two rows.
  • midrank (bool, optional, default=TRUE): If TRUE, uses the midrank test (recommended; applicable to both continuous and discrete data). If FALSE, uses the right-side empirical distribution function, intended for discrete data.
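In an Excel array constant, commas separate columns and semicolons separate rows (en-US convention), so the samples argument of Example 1 holds three observations for each of two groups. The sketch below makes that mapping concrete: it parses such a constant into rows and transposes the rows into one list per column, which is the shape scipy.stats.anderson_ksamp expects. The helpers `parse_array_constant` and `columns_as_groups` are hypothetical and assume numeric entries only.

```python
from typing import List


def parse_array_constant(text: str) -> List[List[float]]:
    """Parse a numeric Excel array constant such as "{1,2;3,4}" into rows.

    Hypothetical helper: assumes comma = column separator,
    semicolon = row separator, and numeric entries only.
    """
    body = text.strip().lstrip("{").rstrip("}")
    return [[float(cell) for cell in row.split(",")] for row in body.split(";")]


def columns_as_groups(rows: List[List[float]]) -> List[List[float]]:
    """Transpose rows into columns: each column of the table is one sample group."""
    return [list(col) for col in zip(*rows)]


rows = parse_array_constant("{1.1,1.2;2.2,2.1;3.3,3.4}")
print(rows)                     # [[1.1, 1.2], [2.2, 2.1], [3.3, 3.4]]
print(columns_as_groups(rows))  # [[1.1, 2.2, 3.3], [1.2, 2.1, 3.4]]
```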

The function returns a single row (array) with the following values:

  • test statistic
  • p-value
  • critical value at 25%
  • critical value at 10%
  • critical value at 5%
  • critical value at 2.5%
  • critical value at 1%
  • critical value at 0.5%
  • critical value at 0.1%

If the input is invalid, an error message string is returned.
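The critical values in this row depend only on the number of groups $k$, not on the data, which is why Examples 1 and 3 (and Examples 2 and 4) share them. The sketch below interpolates them as $b_0 + b_1/\sqrt{k-1} + b_2/(k-1)$ and then approximates the p-value with a quadratic least-squares fit of log(significance level) against the critical values, capped at 0.25 and floored at 0.001. The coefficient tables are, as far as we can tell, the Scholz and Stephens values that SciPy uses; the function names are hypothetical, and the small pure-Python fit stands in for a library polynomial fit. It reproduces the critical values shown in the examples.

```python
import math
from typing import List

# Interpolation coefficients for the 25%, 10%, 5%, 2.5%, 1%, 0.5%, 0.1% levels
# (assumed to match the table SciPy uses)
B0 = [0.675, 1.281, 1.645, 1.96, 2.326, 2.573, 3.085]
B1 = [-0.245, 0.25, 0.678, 1.149, 1.822, 2.364, 3.615]
B2 = [-0.105, -0.305, -0.362, -0.391, -0.396, -0.345, -0.154]
SIG = [0.25, 0.1, 0.05, 0.025, 0.01, 0.005, 0.001]


def critical_values(k: int) -> List[float]:
    """Critical values for k sample groups; they depend only on k, not on the data."""
    m = k - 1
    return [b0 + b1 / math.sqrt(m) + b2 / m for b0, b1, b2 in zip(B0, B1, B2)]


def _quadfit(xs: List[float], ys: List[float]) -> List[float]:
    """Least-squares fit y ~ c0 + c1*x + c2*x**2 via the 3x3 normal equations."""
    s = [sum(x**p for x in xs) for p in range(5)]
    t = [sum(y * x**p for x, y in zip(xs, ys)) for p in range(3)]
    m = [[s[r + c] for c in range(3)] + [t[r]] for r in range(3)]
    for col in range(3):  # Gaussian elimination with partial pivoting
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    coef = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):  # back substitution
        coef[r] = (m[r][3] - sum(m[r][c] * coef[c] for c in range(r + 1, 3))) / m[r][r]
    return coef


def approx_pvalue(statistic: float, k: int) -> float:
    """Approximate p-value, capped at 0.25 and floored at 0.001."""
    crit = critical_values(k)
    if statistic < min(crit):
        return max(SIG)  # capped: the true p-value is at least 0.25
    if statistic > max(crit):
        return min(SIG)  # floored: the true p-value is at most 0.001
    c0, c1, c2 = _quadfit(crit, [math.log(p) for p in SIG])
    return math.exp(c0 + c1 * statistic + c2 * statistic**2)


print([round(c, 3) for c in critical_values(2)])  # [0.325, 1.226, 1.961, 2.718, 3.752, 4.592, 6.546]
print(approx_pvalue(-0.940, 2))                   # 0.25
```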

Examples

Example 1: Two Groups, Midrank (Default)

Inputs:

samples     midrank
1.1   1.2   TRUE
2.2   2.1
3.3   3.4

Excel formula:

=ANDERSON_KSAMP({1.1,1.2;2.2,2.1;3.3,3.4})

Expected output:

Statistic  p-value  Crit_25  Crit_10  Crit_5  Crit_2.5  Crit_1  Crit_0.5  Crit_0.1
-0.940     0.250    0.325    1.226    1.961   2.718     3.752   4.592     6.546

Example 2: Three Groups, Midrank

Inputs:

samples           midrank
1.1   1.2   1.3   TRUE
2.2   2.1   2.3
3.3   3.4   3.1

Excel formula:

=ANDERSON_KSAMP({1.1,1.2,1.3;2.2,2.1,2.3;3.3,3.4,3.1})

Expected output:

Statistic  p-value  Crit_25  Crit_10  Crit_5  Crit_2.5  Crit_1  Crit_0.5  Crit_0.1
-1.306     0.250    0.449    1.305    1.943   2.577     3.416   4.072     5.564

Example 3: Two Groups, Right Side Empirical

Inputs:

samples     midrank
1.1   1.2   FALSE
2.2   2.1
3.3   3.4

Excel formula:

=ANDERSON_KSAMP({1.1,1.2;2.2,2.1;3.3,3.4}, FALSE)

Expected output:

Statistic  p-value  Crit_25  Crit_10  Crit_5  Crit_2.5  Crit_1  Crit_0.5  Crit_0.1
-0.867     0.250    0.325    1.226    1.961   2.718     3.752   4.592     6.546

Example 4: Three Groups, Right Side Empirical

Inputs:

samples           midrank
1.1   1.2   1.3   FALSE
2.2   2.1   2.3
3.3   3.4   3.1

Excel formula:

=ANDERSON_KSAMP({1.1,1.2,1.3;2.2,2.1,2.3;3.3,3.4,3.1}, FALSE)

Expected output:

Statistic  p-value  Crit_25  Crit_10  Crit_5  Crit_2.5  Crit_1  Crit_0.5  Crit_0.1
-1.239     0.250    0.449    1.305    1.943   2.577     3.416   4.072     5.564

Python Code

import math
from typing import List, Union

from scipy.stats import anderson_ksamp as scipy_anderson_ksamp


def anderson_ksamp(samples: List[List[float]], midrank: bool = True) -> Union[List[List[float]], str]:
    """
    Performs the k-sample Anderson-Darling test to determine if samples are drawn from the same population.

    Args:
        samples: 2D list of float values, one inner list per row (observation). Each column represents a sample group.
        midrank: If True, uses the midrank test (default, suitable for continuous and discrete data).
            If False, uses the right side empirical distribution for discrete data.

    Returns:
        2D list with a single row: [statistic, pvalue, critical_25, critical_10, critical_5,
        critical_2_5, critical_1, critical_0_5, critical_0_1], or an error message (str) if input is invalid.

    This example function is provided as-is without any representation of accuracy.
    """
    # Validate shape: at least two rows (observations) and two columns (sample groups)
    if not isinstance(samples, list) or len(samples) < 2:
        return "Invalid input: samples must be a 2D list with at least two rows."
    if any(not isinstance(row, list) for row in samples):
        return "Invalid input: samples must be a 2D list."
    n_cols = len(samples[0])
    if n_cols < 2 or any(len(row) != n_cols for row in samples):
        return "Invalid input: samples must be a rectangular table with at least two columns (sample groups)."

    # Check for non-numeric values (bool is a subclass of int, so reject it explicitly)
    for row in samples:
        for v in row:
            if isinstance(v, bool) or not isinstance(v, (int, float)):
                return "Invalid input: all sample values must be numeric."

    # Transpose rows to columns: scipy expects one sequence per sample group
    groups = [[float(v) for v in col] for col in zip(*samples)]

    try:
        result = scipy_anderson_ksamp(groups, midrank=midrank)
    except Exception as e:
        return f"scipy.stats.anderson_ksamp error: {e}"

    # Older SciPy releases name the p-value attribute `significance_level` instead of `pvalue`
    pvalue = getattr(result, "pvalue", None)
    if pvalue is None:
        pvalue = result.significance_level

    # Compose output row: statistic, p-value, then the seven critical values (25% down to 0.1%)
    output = [float(result.statistic), float(pvalue)] + [float(c) for c in result.critical_values]

    # Reject NaN or infinite results
    if not all(math.isfinite(x) for x in output):
        return "Invalid output: statistic, p-value, or critical values are not finite."
    return [output]

Example Workbook

Link to Workbook
