MANNWHITNEYU
Overview
The MANNWHITNEYU function performs the Mann-Whitney U rank test on two independent samples to determine whether their population distributions differ. This non-parametric test is commonly used as an alternative to the independent t-test when the data do not meet the assumption of normality. The test ranks all values from both samples together and calculates the U statistic, which measures how far the ranks of one sample are shifted relative to the other. The p-value is the probability of observing a U statistic at least as extreme as the one computed, assuming the null hypothesis that the two distributions are equal.
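The calculation is based on the following equations:

$$U_1 = R_1 - \frac{n_1(n_1 + 1)}{2}, \qquad U_2 = R_2 - \frac{n_2(n_2 + 1)}{2}$$

where $n_1$ and $n_2$ are the sample sizes, and $R_1$ and $R_2$ are the sums of ranks for each sample. The two statistics satisfy $U_1 + U_2 = n_1 n_2$. The test returns the U statistic for the first sample ($U_1$) and the p-value for the specified alternative hypothesis.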
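As a rough illustration of the rank-sum calculation, the sketch below ranks the pooled samples with scipy.stats.rankdata and derives U for each sample. It assumes SciPy's convention, in which the reported statistic is the rank sum of the first sample minus its smallest possible value. The helper name u_from_ranks is hypothetical and not part of the wrapper.

```python
from scipy.stats import mannwhitneyu, rankdata


def u_from_ranks(x, y):
    """Hypothetical helper: U statistics for x and y from pooled ranks."""
    n1, n2 = len(x), len(y)
    # Rank all observations together; tied values receive average ranks.
    ranks = rankdata(list(x) + list(y))
    r1 = float(ranks[:n1].sum())  # sum of ranks belonging to the first sample
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1  # the two statistics always sum to n1 * n2
    return u1, u2


x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]
u1, u2 = u_from_ranks(x, y)
print(u1, u2)

# The statistic reported by SciPy corresponds to the first sample.
print(mannwhitneyu(x, y).statistic)
```

Here every value in x ranks below every value in y, so the U statistic for x takes its minimum value and the U statistic for y takes its maximum.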
For more details, see the scipy.stats.mannwhitneyu documentation.
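This wrapper simplifies the function by supporting only the most commonly used parameters: two sample arrays and the alternative hypothesis. Advanced options such as axis selection, method, continuity correction, NaN handling, and dimension keeping are not supported. Because the method cannot be set, SciPy's default ('auto', available in SciPy 1.7 and later) applies: the exact method is used when either sample has 8 or fewer observations and there are no ties, and the asymptotic method is used otherwise.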
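The options the wrapper leaves out can still be reached by calling SciPy directly. The sketch below is illustrative only and is not part of the wrapper; it assumes SciPy 1.7 or later, where the method argument is available, and compares the exact and asymptotic p-values on a small sample.

```python
from scipy.stats import mannwhitneyu

x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]

# Exact null distribution of U (appropriate here: small samples, no ties).
exact = mannwhitneyu(x, y, alternative="two-sided", method="exact")

# Normal approximation, with the continuity correction (SciPy's default) ...
approx = mannwhitneyu(x, y, alternative="two-sided", method="asymptotic")

# ... and without it.
approx_nocc = mannwhitneyu(
    x, y, alternative="two-sided", method="asymptotic", use_continuity=False
)

print(exact.statistic, exact.pvalue)
print(approx.statistic, approx.pvalue)
print(approx_nocc.statistic, approx_nocc.pvalue)
```

The U statistic is the same in all three calls; only the p-value depends on the method, and the difference is most visible for small samples like these.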
This example function is provided as-is without any representation of accuracy.
Usage
To use the function in Excel:
=MANNWHITNEYU(x, y, [alternative])
- x (2D list, required): First sample data. Must be a 2D array (rectangular range) with at least two rows.
- y (2D list, required): Second sample data. Must be a 2D array (rectangular range) with at least two rows.
- alternative (string, optional, default='two-sided'): Defines the alternative hypothesis. Must be one of 'two-sided', 'less', or 'greater'.
The function returns a 2D array with one row: [U statistic, p-value], both as floats. If the input is invalid, it returns an error message (string).
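To make the input shape concrete, the sketch below shows how a cell range might look once it reaches Python and how it reduces to a flat sample. It assumes each range is passed as a list of rows, so the array constant {1.0;2.0;3.0} would arrive as [[1.0], [2.0], [3.0]]. The name flatten_range is hypothetical and used only for illustration.

```python
from typing import List


def flatten_range(cells: List[List[float]]) -> List[float]:
    """Hypothetical helper: flatten a 2D range row by row into one sample."""
    return [float(value) for row in cells for value in row]


column = [[1.0], [2.0], [3.0]]    # 3 rows x 1 column, like {1.0;2.0;3.0}
block = [[1.0, 2.0], [3.0, 4.0]]  # 2 rows x 2 columns

print(flatten_range(column))
print(flatten_range(block))
```

Because the test depends only on the pooled values and not on their order, the layout of a multi-column range does not change the result.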
Examples
Example 1: Basic Two-Sided Test
Inputs:
x | y | alternative |
---|---|---|
1.0 | 4.0 | two-sided |
2.0 | 5.0 | |
3.0 | 6.0 | |
Excel formula:
=MANNWHITNEYU({1.0;2.0;3.0}, {4.0;5.0;6.0}, "two-sided")
Expected output:
U statistic | p-value |
---|---|
0.000 | 0.100 |
Example 2: Greater Alternative
Inputs:
x | y | alternative |
---|---|---|
1.0 | 4.0 | greater |
2.0 | 5.0 | |
3.0 | 6.0 | |
Excel formula:
=MANNWHITNEYU({1.0;2.0;3.0}, {4.0;5.0;6.0}, "greater")
Expected output:
U statistic | p-value |
---|---|
0.000 | 1.000 |
Example 3: Less Alternative
Inputs:
x | y | alternative |
---|---|---|
4.0 | 1.0 | less |
5.0 | 2.0 | |
6.0 | 3.0 | |
Excel formula:
=MANNWHITNEYU({4.0;5.0;6.0}, {1.0;2.0;3.0}, "less")
Expected output:
U statistic | p-value |
---|---|
9.000 | 1.000 |
Example 4: Larger Samples, Two-Sided
Inputs:
x | y | alternative |
---|---|---|
1.0 | 5.0 | two-sided |
2.0 | 6.0 | |
3.0 | 7.0 | |
4.0 | 8.0 | |
Excel formula:
=MANNWHITNEYU({1.0;2.0;3.0;4.0}, {5.0;6.0;7.0;8.0}, "two-sided")
Expected output:
U statistic | p-value |
---|---|
0.000 | 0.029 |
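The expected outputs above can be checked by calling SciPy directly on the flattened samples. This is a minimal sketch rather than the wrapper itself; it assumes SciPy 1.7 or later, where small samples without ties are handled by the exact method by default.

```python
from scipy.stats import mannwhitneyu

# (x, y, alternative) for Examples 1 to 4, already flattened to 1D.
cases = [
    ([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], "two-sided"),
    ([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], "greater"),
    ([4.0, 5.0, 6.0], [1.0, 2.0, 3.0], "less"),
    ([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], "two-sided"),
]

results = []
for x, y, alternative in cases:
    res = mannwhitneyu(x, y, alternative=alternative)
    results.append((float(res.statistic), float(res.pvalue)))
    print(f"{alternative:>9}: U = {res.statistic:.3f}, p = {res.pvalue:.3f}")
```

In Examples 2 and 3 the p-value is 1 because the data point in the opposite direction to the stated alternative.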
Python Code
from scipy.stats import mannwhitneyu as scipy_mannwhitneyu
from typing import List, Union


def mannwhitneyu(x: List[List[float]], y: List[List[float]], alternative: str = 'two-sided') -> Union[List[List[float]], str]:
    """
    Performs the Mann-Whitney U rank test on two independent samples.

    Args:
        x: 2D list of float values. First sample data.
        y: 2D list of float values. Second sample data.
        alternative: Defines the alternative hypothesis ('two-sided', 'less', 'greater'). Default is 'two-sided'.

    Returns:
        2D list with one row: [U statistic, p-value]. Returns an error message (str) if input is invalid.

    This example function is provided as-is without any representation of accuracy.
    """
    # Validate x and y are 2D lists with at least two rows
    if not (isinstance(x, list) and all(isinstance(row, list) for row in x) and len(x) >= 2):
        return "Invalid input: x must be a 2D list with at least two rows."
    if not (isinstance(y, list) and all(isinstance(row, list) for row in y) and len(y) >= 2):
        return "Invalid input: y must be a 2D list with at least two rows."
    # Flatten x and y
    try:
        x_flat = [float(item) for row in x for item in row]
        y_flat = [float(item) for row in y for item in row]
    except Exception:
        return "Invalid input: x and y must contain only numeric values."
    if len(x_flat) < 2 or len(y_flat) < 2:
        return "Invalid input: x and y must each contain at least two values."
    # Validate alternative
    if alternative not in ('two-sided', 'less', 'greater'):
        return "Invalid input: alternative must be 'two-sided', 'less', or 'greater'."
    # Run test
    try:
        res = scipy_mannwhitneyu(x_flat, y_flat, alternative=alternative)
        u_stat = float(res.statistic)
        p_val = float(res.pvalue)
        # Disallow nan/inf
        if any([
            u_stat != u_stat or p_val != p_val,  # NaN check
            abs(u_stat) == float('inf') or abs(p_val) == float('inf')
        ]):
            return "Invalid result: U statistic or p-value is not finite."
        return [[u_stat, p_val]]
    except Exception as e:
        return f"scipy.stats.mannwhitneyu error: {e}"