RANKSUMS
Overview
The RANKSUMS
function computes the Wilcoxon rank-sum statistic and p-value for two independent samples. This nonparametric test, also known as the Mann-Whitney U test, is used to determine whether one of two samples of independent observations tends to have larger values than the other. It is especially useful when the data do not meet the assumptions required for parametric tests like the t-test. The test statistic is based on the ranks of the combined data:
where is the rank of the -th observation in the first sample, and is the size of the first sample. The p-value is computed according to the specified alternative hypothesis. For more details, see the scipy.stats.ranksums documentation .
This wrapper exposes only the most commonly used parameters: the two sample arrays and the alternative hypothesis. Parameters related to axis selection, NaN handling, and broadcasting are omitted for simplicity in the Excel context. This example function is provided as-is without any representation of accuracy.
Usage
To use the function in Excel:
=RANKSUMS(x, y, [alternative])
x
(2D list, required): First sample data. Must be a 2D array (rectangular range) with at least two rows.y
(2D list, required): Second sample data. Must be a 2D array (rectangular range) with at least two rows.alternative
(string, optional, default="two-sided"
): Defines the alternative hypothesis. Must be one of"two-sided"
,"less"
, or"greater"
.
The function returns a single-row 2D array: [statistic, pvalue]
, both floats. If the input is invalid, an error message (string) is returned.
Examples
Example 1: Basic Case (Two Rows Each)
Inputs:
x | y | ||
---|---|---|---|
1.0 | 2.0 | 5.0 | 6.0 |
3.0 | 4.0 | 7.0 | 8.0 |
Excel formula:
=RANKSUMS({1,2;3,4}, {5,6;7,8})
Expected output:
statistic | pvalue |
---|---|
-2.309 | 0.021 |
Example 2: With Alternative ‘less’
Inputs:
x | y | alternative | ||
---|---|---|---|---|
1.0 | 2.0 | 5.0 | 6.0 | less |
3.0 | 4.0 | 7.0 | 8.0 |
Excel formula:
=RANKSUMS({1,2;3,4}, {5,6;7,8}, "less")
Expected output:
statistic | pvalue |
---|---|
-2.309 | 0.010 |
Example 3: With Alternative ‘greater’
Inputs:
x | y | alternative | ||
---|---|---|---|---|
1.0 | 2.0 | 5.0 | 6.0 | greater |
3.0 | 4.0 | 7.0 | 8.0 |
Excel formula:
=RANKSUMS({1,2;3,4}, {5,6;7,8}, "greater")
Expected output:
statistic | pvalue |
---|---|
-2.309 | 0.990 |
Example 4: Larger Samples
Inputs:
x | y | ||
---|---|---|---|
1.0 | 2.0 | 7.0 | 8.0 |
3.0 | 4.0 | 9.0 | 10.0 |
5.0 | 6.0 | 11.0 | 12.0 |
Excel formula:
=RANKSUMS({1,2;3,4;5,6}, {7,8;9,10;11,12})
Expected output:
statistic | pvalue |
---|---|
-2.882 | 0.004 |
Python Code
from scipy.stats import ranksums as scipy_ranksums
from typing import List, Union
def ranksums(x: List[List[float]], y: List[List[float]], alternative: str = 'two-sided') -> Union[List[List[float]], str]:
"""
Computes the Wilcoxon rank-sum statistic and p-value for two independent samples.
Args:
x: 2D list of float values. First sample data.
y: 2D list of float values. Second sample data.
alternative: Defines the alternative hypothesis ('two-sided', 'less', 'greater'). Default is 'two-sided'.
Returns:
2D list with one row: [statistic, pvalue], or an error message (str) if input is invalid.
This example function is provided as-is without any representation of accuracy.
"""
# Validate x and y are 2D lists with at least two rows
if not (isinstance(x, list) and all(isinstance(row, list) for row in x)) or len(x) < 2:
return "Invalid input: x must be a 2D list with at least two rows."
if not (isinstance(y, list) and all(isinstance(row, list) for row in y)) or len(y) < 2:
return "Invalid input: y must be a 2D list with at least two rows."
# Flatten x and y
try:
x_flat = [float(item) for row in x for item in row]
y_flat = [float(item) for row in y for item in row]
except Exception:
return "Invalid input: x and y must contain only numeric values."
if len(x_flat) < 2 or len(y_flat) < 2:
return "Invalid input: x and y must each contain at least two values."
# Validate alternative
if alternative not in ('two-sided', 'less', 'greater'):
return "Invalid input: alternative must be 'two-sided', 'less', or 'greater'."
try:
result = scipy_ranksums(x_flat, y_flat, alternative=alternative)
stat, pvalue = result.statistic, result.pvalue
# Disallow nan/inf
if any([
stat is None, pvalue is None,
isinstance(stat, float) and (stat != stat or stat in (float('inf'), float('-inf'))),
isinstance(pvalue, float) and (pvalue != pvalue or pvalue in (float('inf'), float('-inf')))
]):
return "Invalid result: statistic or pvalue is nan or inf."
return [[stat, pvalue]]
except Exception as e:
return f"scipy.stats.ranksums error: {e}"