KRUSKAL

Overview

The KRUSKAL function performs the Kruskal-Wallis H-test, a non-parametric statistical test for comparing two or more independent samples to determine whether they originate from the same distribution. Named after William Kruskal and W. Allen Wallis, this test serves as a non-parametric alternative to one-way ANOVA when the assumption of normally distributed residuals cannot be met.

The test operates on ranks rather than raw values. All observations from all groups are combined and ranked from 1 to N (with tied values receiving the average of the ranks they would have obtained). The test statistic H is computed as:

H = \frac{12}{N(N+1)} \sum_{i=1}^{g} n_i \bar{r}_{i}^2 - 3(N+1)

where N is the total number of observations, g is the number of groups, n_i is the number of observations in group i, and \bar{r}_i is the average rank of all observations in group i. When ties are present, SciPy divides H by the correction factor 1 - \sum_j (t_j^3 - t_j) / (N^3 - N), where t_j is the number of observations sharing the j-th distinct value.
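
The rank formula can be checked by hand with a short sketch. The helper name kruskal_h_by_hand is hypothetical and is not part of SciPy or of this function. It assumes the standard tie correction, dividing H by 1 - \sum_j (t_j^3 - t_j) / (N^3 - N), and uses scipy.stats.rankdata, which assigns average ranks to ties by default.

```python
import numpy as np
from scipy.stats import rankdata


def kruskal_h_by_hand(groups):
    """Hypothetical helper: tie-corrected H computed from the rank formula."""
    pooled = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    n_total = pooled.size
    ranks = rankdata(pooled)  # tied values receive the average rank

    # Accumulate n_i * (mean rank of group i)^2 over all groups.
    weighted = 0.0
    start = 0
    for g in groups:
        n_i = len(g)
        mean_rank = ranks[start:start + n_i].mean()
        weighted += n_i * mean_rank ** 2
        start += n_i

    h = 12.0 / (n_total * (n_total + 1)) * weighted - 3.0 * (n_total + 1)

    # Tie correction: t_j is the size of each group of equal values.
    # (If every value is identical the correction is zero and H is undefined.)
    _, counts = np.unique(pooled, return_counts=True)
    correction = 1.0 - np.sum(counts ** 3 - counts) / (n_total ** 3 - n_total)
    return float(h / correction)


# Data from the tied-values example in this page.
h = kruskal_h_by_hand([[1, 2, 2], [2, 2, 3], [1, 1, 1]])
print(round(h, 4))
```

For this data the uncorrected value is about 4.6889 and the correction factor is 5/6, which gives roughly 5.6267 after correction.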

This implementation uses the scipy.stats.kruskal function from SciPy, which returns both the H statistic (corrected for ties) and a p-value calculated under the assumption that H follows a chi-squared distribution with g-1 degrees of freedom.
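
As a rough illustration of the chi-squared approximation, the p-value can be recovered from the H statistic with the upper-tail (survival) function of scipy.stats.chi2. The variable names below are illustrative only; the data are the three groups from the examples further down.

```python
from scipy.stats import chi2, kruskal

groups = [[1.2, 2.1, 2.3], [1.1, 1.4, 1.2], [2.0, 2.2, 2.4]]
result = kruskal(*groups)

# Degrees of freedom are g - 1, where g is the number of groups.
df = len(groups) - 1

# Survival function: probability that a chi-squared variable exceeds H.
p_manual = chi2.sf(result.statistic, df)

print(result.statistic, result.pvalue, p_manual)
```

The manually computed value should agree with the p-value reported by scipy.stats.kruskal up to floating-point error.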

The null hypothesis states that the population medians of all groups are equal. Rejecting the null hypothesis indicates that at least one sample stochastically dominates another, but does not identify which specific groups differ. Post-hoc tests such as Dunn’s test or pairwise Mann-Whitney tests with Bonferroni correction are typically used for follow-up comparisons. A common guideline is that each sample group should have at least 5 observations for the chi-squared approximation to be reliable. For more details, see the Wikipedia article on the Kruskal-Wallis test.
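
One possible follow-up, sketched under stated assumptions, is pairwise Mann-Whitney U tests with a Bonferroni adjustment. The helper pairwise_mannwhitney_bonferroni is hypothetical and not part of this function or of SciPy; it relies on scipy.stats.mannwhitneyu with a two-sided alternative and multiplies each p-value by the number of comparisons, capped at 1.

```python
from itertools import combinations

from scipy.stats import mannwhitneyu


def pairwise_mannwhitney_bonferroni(groups):
    """Hypothetical helper: Bonferroni-adjusted p-values for every pair of groups."""
    pairs = list(combinations(range(len(groups)), 2))
    adjusted = []
    for i, j in pairs:
        p = mannwhitneyu(groups[i], groups[j], alternative="two-sided").pvalue
        # Bonferroni: scale by the number of comparisons and cap at 1.
        adjusted.append((i, j, min(1.0, float(p) * len(pairs))))
    return adjusted


groups = [[1.2, 2.1, 2.3], [1.1, 1.4, 1.2], [2.0, 2.2, 2.4]]
for i, j, p_adj in pairwise_mannwhitney_bonferroni(groups):
    print(f"group {i} vs group {j}: adjusted p = {p_adj:.4f}")
```

With only three observations per group, as here, pairwise tests have very little power, so this is a demonstration of the mechanics rather than a meaningful analysis.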

This example function is provided as-is without any representation of accuracy.

Excel Usage

=KRUSKAL(samples)
  • samples (list[list], required): 2D list where each inner list represents a sample group of numeric values.

Returns (list[list]): 2D list [[statistic, p_value]], or error message string.

Examples

Example 1: Basic two-group comparison

Inputs:

samples
1.2 2.1 2.3
1.1 1.4 1.2

Excel formula:

=KRUSKAL({1.2,2.1,2.3;1.1,1.4,1.2})

Expected output:

Result
1.7647 0.184

Example 2: Three-group comparison

Inputs:

samples
1.2 2.1 2.3
1.1 1.4 1.2
2 2.2 2.4

Excel formula:

=KRUSKAL({1.2,2.1,2.3;1.1,1.4,1.2;2,2.2,2.4})

Expected output:

Result
4.2353 0.1203

Example 3: Similar groups with high p-value

Inputs:

samples
10 20 30
15 25 35
12 22 32

Excel formula:

=KRUSKAL({10,20,30;15,25,35;12,22,32})

Expected output:

Result
0.8 0.6703

Example 4: Groups with tied values

Inputs:

samples
1 2 2
2 2 3
1 1 1

Excel formula:

=KRUSKAL({1,2,2;2,2,3;1,1,1})

Expected output:

Result
5.6267 0.06
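
The expected outputs above can be cross-checked outside Excel by calling scipy.stats.kruskal directly. This is only a sketch: the dictionary keys and the rounding to four decimals are choices made here for comparison with the listed values.

```python
from scipy.stats import kruskal

# Sample groups copied from the four examples above.
examples = {
    "Example 1": [[1.2, 2.1, 2.3], [1.1, 1.4, 1.2]],
    "Example 2": [[1.2, 2.1, 2.3], [1.1, 1.4, 1.2], [2, 2.2, 2.4]],
    "Example 3": [[10, 20, 30], [15, 25, 35], [12, 22, 32]],
    "Example 4": [[1, 2, 2], [2, 2, 3], [1, 1, 1]],
}

rounded = {}
for name, groups in examples.items():
    res = kruskal(*groups)
    # Round to four decimals to match the precision shown in the examples.
    rounded[name] = (round(float(res.statistic), 4), round(float(res.pvalue), 4))
    print(name, rounded[name])
```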

Python Code

from scipy.stats import kruskal as scipy_kruskal

def kruskal(samples):
    """
    Computes the Kruskal-Wallis H-test for independent samples.

    See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kruskal.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        samples (list[list]): 2D list where each inner list represents a sample group of numeric values.

    Returns:
        list[list]: 2D list [[statistic, p_value]], or error message string.
    """
    def to2d(x):
        # Wrap a scalar input so that the value is always a 2D list.
        return [[x]] if not isinstance(x, list) else x

    samples = to2d(samples)

    # The test needs at least two groups to compare.
    if not isinstance(samples, list) or len(samples) < 2:
        return "Invalid input: samples must be a 2D list with at least two groups."

    try:
        groups = []
        for group in samples:
            if not isinstance(group, list) or len(group) < 1:
                return "Invalid input: each sample group must be a non-empty list."
            for v in group:
                # bool is a subclass of int, so reject it explicitly.
                if isinstance(v, bool) or not isinstance(v, (int, float)):
                    return "Invalid input: all sample values must be numeric."
            groups.append([float(x) for x in group])
    except Exception:
        return "Invalid input: samples must be a 2D list of numeric values."

    # Delegate to SciPy; the returned statistic is already corrected for ties.
    try:
        result = scipy_kruskal(*groups)
    except Exception as e:
        return f"scipy.stats.kruskal error: {e}"

    # Single row: [[statistic, p_value]].
    return [[float(result.statistic), float(result.pvalue)]]
