TUKEY_HSD
Overview
The TUKEY_HSD function performs Tukey’s Honest Significant Difference (HSD) test for pairwise comparisons of means across multiple groups, following a one-way ANOVA. This test is commonly used to determine which group means differ significantly from each other after ANOVA has returned a significant result. The calculation is based on the studentized range distribution and assumes equal variances among groups. The function wraps scipy.stats.tukey_hsd, but simplifies the input to a single 2D list where each column is a group, and supports only the equal-variance case (equal_var=True). The Games-Howell test (equal_var=False) is not supported in this wrapper.
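As a rough illustration of the input layout, the sketch below turns a row-oriented table (one row per observation, one column per group) into per-group lists, which is the form scipy.stats.tukey_hsd accepts as separate positional arguments. The helper name `columns_to_groups` is hypothetical and not part of the wrapper, and the table is assumed to be rectangular.

```python
from typing import List


def columns_to_groups(samples: List[List[float]]) -> List[List[float]]:
    """Transpose a rectangular row-major table so each inner list is one group (column)."""
    # zip(*samples) pairs up the k-th entry of every row, i.e. it yields the columns.
    return [list(column) for column in zip(*samples)]


# Rows are observations, columns are groups (values taken from Example 1).
samples = [[1.2, 2.3], [1.5, 2.1], [1.3, 2.2], [1.4, 2.4]]
groups = columns_to_groups(samples)
print(groups)  # [[1.2, 1.5, 1.3, 1.4], [2.3, 2.1, 2.2, 2.4]]

# With SciPy installed, the groups could then be passed on as:
#   scipy.stats.tukey_hsd(*groups)
```

Note that `zip` silently truncates ragged rows, so a real implementation would check that all rows have the same length first.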
The test statistic is:

$$q = \frac{\bar{x}_i - \bar{x}_j}{\sqrt{MSE / n}}$$

where $\bar{x}_i$ and $\bar{x}_j$ are group means, $MSE$ is the mean squared error from ANOVA, and $n$ is the number of observations per group.
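To make the statistic concrete, here is a minimal standard-library sketch that computes the studentized range statistic (difference of two group means divided by the square root of MSE over n) for every pair of groups. It assumes equally sized groups and takes the absolute difference of the means. The function name `q_statistics` is illustrative only and is not how SciPy structures the calculation internally.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def q_statistics(groups):
    """Return {(i, j): q} for every pair of equally sized groups (0-based indices)."""
    k = len(groups)
    n = len(groups[0])  # observations per group (assumed equal for all groups)
    means = [mean(g) for g in groups]
    # Within-group sum of squared deviations, pooled over all groups.
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    mse = sse / (k * n - k)  # error degrees of freedom: N - k
    se = sqrt(mse / n)
    return {(i, j): abs(means[i] - means[j]) / se for i, j in combinations(range(k), 2)}


# The two groups from Example 1.
groups = [[1.2, 1.5, 1.3, 1.4], [2.3, 2.1, 2.2, 2.4]]
q = q_statistics(groups)[(0, 1)]
print(round(q, 3))
```

For these two groups q comes out at roughly 13.94. The p-value for a pair is then the upper tail of the studentized range distribution with k groups and N - k degrees of freedom, which could be evaluated with, for example, `scipy.stats.studentized_range.sf(q, k, df)`.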
This example function is provided as-is without any representation of accuracy.
Usage
To use the function in Excel:
=TUKEY_HSD(samples, [equal_var])
samples (2D list, required): Table of values, where each column is a group/sample and each row is an observation. Must have at least two columns and two rows.
equal_var (bool, optional, default=TRUE): If TRUE, assumes equal variances (Tukey-HSD/Tukey-Kramer). If FALSE, returns an error (Games-Howell not supported).
The function returns a 2D array of p-values for each pairwise group comparison; a cell is None where the underlying result is NaN or infinite. If the input is invalid, it returns an error message (string). Each cell in the output array represents the p-value for the comparison between two groups; diagonal cells are always 1.0.
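One way to read the returned matrix is to scan its upper triangle and keep the pairs whose p-value falls below a chosen significance level. The sketch below assumes a square, symmetric matrix of floats or None, as described above. The helper name `significant_pairs`, the default level of 0.05, and the sample matrix are illustrative assumptions, not part of the function.

```python
from typing import List, Optional, Tuple


def significant_pairs(
    pvalues: List[List[Optional[float]]], alpha: float = 0.05
) -> List[Tuple[int, int, float]]:
    """List (group_a, group_b, p) for each pair with p < alpha, using 1-based group numbers."""
    pairs = []
    for i in range(len(pvalues)):
        # Upper triangle only: the matrix is symmetric and the diagonal is 1.0.
        for j in range(i + 1, len(pvalues)):
            p = pvalues[i][j]
            if p is not None and p < alpha:  # skip cells that could not be computed
                pairs.append((i + 1, j + 1, p))
    return pairs


# Made-up p-value matrix for three groups, purely for illustration.
example = [
    [1.0, 0.0004, 0.21],
    [0.0004, 1.0, None],
    [0.21, None, 1.0],
]
print(significant_pairs(example))  # [(1, 2, 0.0004)]
```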
Examples
Example 1: Basic Two Groups
Inputs:
samples | | equal_var |
---|---|---|
1.2 | 2.3 | TRUE |
1.5 | 2.1 | |
1.3 | 2.2 | |
1.4 | 2.4 | |
Excel formula:
=TUKEY_HSD({1.2,2.3;1.5,2.1;1.3,2.2;1.4,2.4})
Expected output:
| | Group 1 | Group 2 |
|---|---|---|
| Group 1 | 1.000 | 0.000 |
| Group 2 | 0.000 | 1.000 |
Example 2: Three Groups, Equal Variance
Inputs:
samples | | | equal_var |
---|---|---|---|
1.2 | 2.3 | 3.1 | TRUE |
1.5 | 2.1 | 3.2 | |
1.3 | 2.2 | 3.3 | |
1.4 | 2.4 | 3.4 | |
Excel formula:
=TUKEY_HSD({1.2,2.3,3.1;1.5,2.1,3.2;1.3,2.2,3.3;1.4,2.4,3.4})
Expected output:
| | Group 1 | Group 2 | Group 3 |
|---|---|---|---|
| Group 1 | 1.000 | 0.000 | 0.000 |
| Group 2 | 0.000 | 1.000 | 0.000 |
| Group 3 | 0.000 | 0.000 | 1.000 |
Example 3: Three Groups, Unequal Variance (Not Supported)
Inputs:
samples | | | equal_var |
---|---|---|---|
1.2 | 2.3 | 3.1 | FALSE |
1.5 | 2.1 | 3.2 | |
1.3 | 2.2 | 3.3 | |
1.4 | 2.4 | 3.4 | |
Excel formula:
=TUKEY_HSD({1.2,2.3,3.1;1.5,2.1,3.2;1.3,2.2,3.3;1.4,2.4,3.4}, FALSE)
Expected output:
Result |
---|
Invalid input: Games-Howell test (equal_var=False) is not supported by scipy.stats.tukey_hsd. |
Example 4: All Arguments Specified
Inputs:
samples | | | equal_var |
---|---|---|---|
5.1 | 6.2 | 7.3 | TRUE |
5.2 | 6.1 | 7.2 | |
5.3 | 6.3 | 7.1 | |
5.4 | 6.4 | 7.4 | |
Excel formula:
=TUKEY_HSD({5.1,6.2,7.3;5.2,6.1,7.2;5.3,6.3,7.1;5.4,6.4,7.4}, TRUE)
Expected output:
| | Group 1 | Group 2 | Group 3 |
|---|---|---|---|
| Group 1 | 1.000 | 0.000 | 0.000 |
| Group 2 | 0.000 | 1.000 | 0.000 |
| Group 3 | 0.000 | 0.000 | 1.000 |
Python Code
import math
from typing import List, Optional, Union

from scipy.stats import tukey_hsd as scipy_tukey_hsd


def tukey_hsd(samples: List[List[float]], equal_var: bool = True) -> Union[List[List[Optional[float]]], str]:
    """
    Performs Tukey's HSD test for equality of means over multiple treatments.

    Args:
        samples: 2D list of float values. Each column is a group/sample. Must be a 2D list with at least two columns and two rows.
        equal_var: If True, assumes equal variances (Tukey-HSD/Tukey-Kramer). If False, an error message is returned because the Games-Howell test is not supported by this wrapper.

    Returns:
        2D list of p-values for each pairwise comparison, or an error message (str) if input is invalid.

    This example function is provided as-is without any representation of accuracy.
    """
    # Validate samples
    if not isinstance(samples, list) or len(samples) < 2 or not all(isinstance(row, list) for row in samples):
        return "Invalid input: samples must be a 2D list with at least two rows."
    n_rows = len(samples)
    n_cols = len(samples[0]) if n_rows > 0 else 0
    if n_cols < 2 or n_rows < 2:
        return "Invalid input: samples must be a 2D list with at least two columns and two rows."
    # Check that all rows have the same number of columns
    for row in samples:
        if len(row) != n_cols:
            return "Invalid input: all rows in samples must have the same number of columns."
    # Transpose rows into columns (one list per group)
    try:
        groups = [[float(samples[row][col]) for row in range(n_rows)] for col in range(n_cols)]
    except Exception:
        return "Invalid input: samples must contain only numeric values."
    # Only Tukey-HSD is supported
    if not equal_var:
        return "Invalid input: Games-Howell test (equal_var=False) is not supported by scipy.stats.tukey_hsd."
    # Run Tukey HSD
    try:
        result = scipy_tukey_hsd(*groups)
    except Exception as e:
        return f"scipy.stats.tukey_hsd error: {e}"
    # Extract the p-value matrix
    try:
        # result.pvalue is a numpy array; convert it to a 2D list of Python floats
        pvals_list = result.pvalue.tolist()
        # Replace nan/inf with None
        for i in range(len(pvals_list)):
            for j in range(len(pvals_list[i])):
                v = pvals_list[i][j]
                if isinstance(v, float) and not math.isfinite(v):
                    pvals_list[i][j] = None
        return pvals_list
    except Exception as e:
        return f"Error extracting p-values: {e}"