PEARSONR
Overview
The PEARSONR
function calculates the Pearson correlation coefficient and the associated p-value for testing non-correlation between two datasets. The Pearson correlation coefficient measures the linear relationship between two datasets, ranging from -1 (perfect negative correlation) to +1 (perfect positive correlation), with 0 indicating no linear correlation. The p-value tests the null hypothesis that the distributions underlying the samples are uncorrelated and normally distributed. For more details, see the scipy.stats.pearsonr documentation .
Compared to Excel, the CORREL and PEARSON functions both calculate the Pearson correlation coefficient, but do not provide the associated p-value. To obtain a p-value in Excel, users must perform additional manual calculations or use the Data Analysis ToolPak. The PEARSONR
function in this package provides both the correlation coefficient and the p-value in a single call.
This example function is provided as-is without any representation of accuracy.
Usage
To use the function in Excel:
=PEARSONR(x, y)
x
(2D list, required): Table or array of values (numeric). Must have at least two rows.y
(2D list, required): Table or array of values (numeric). Must have the same number of rows asx
.
The function returns a 2D list with one row and two columns: [correlation, p_value]
, or an error message (string) if the input is invalid.
Examples
Example 1: Perfect Positive Correlation
Inputs:
x | y |
---|---|
1 | 2 |
2 | 4 |
3 | 6 |
4 | 8 |
Excel formula:
=PEARSONR({1;2;3;4}, {2;4;6;8})
Expected output:
Correlation | p_value |
---|---|
1.0 | 0.0 |
Example 2: Perfect Negative Correlation
Inputs:
x | y |
---|---|
1 | 8 |
2 | 6 |
3 | 4 |
4 | 2 |
Excel formula:
=PEARSONR({1;2;3;4}, {8;6;4;2})
Expected output:
Correlation | p_value |
---|---|
-1.0 | 0.0 |
Example 3: No Correlation
Inputs:
x | y |
---|---|
1 | 5 |
2 | 4 |
3 | 3 |
4 | 2 |
5 | 1 |
Excel formula:
=PEARSONR({1;2;3;4;5}, {5;4;3;2;1})
Expected output:
Correlation | p_value |
---|---|
-1.0 | 0.0 |
Example 4: Real Data Example
Inputs:
x | y |
---|---|
1 | 10 |
2 | 9 |
3 | 2.5 |
4 | 6 |
5 | 4 |
6 | 3 |
7 | 2 |
Excel formula:
=PEARSONR({1;2;3;4;5;6;7}, {10;9;2.5;6;4;3;2})
Expected output:
Correlation | p_value |
---|---|
-0.8285038835884277 | 0.021280260007523342 |
Python Code
from scipy.stats import pearsonr as scipy_pearsonr
def pearsonr(x, y):
"""
Calculate the Pearson correlation coefficient and p-value for two datasets.
Args:
x: 2D list of values (numeric), must have at least two rows.
y: 2D list of values (numeric), must have the same number of rows as x.
Returns:
2D list [[correlation, p_value]] or an error message (str) if input is invalid.
This example function is provided as-is without any representation of accuracy.
"""
# Ensure x and y are lists and have at least two rows
if not isinstance(x, list) or not isinstance(y, list):
return "Invalid input: x and y must be 2D lists."
# If input is a scalar, wrap in 2D list
if not isinstance(x[0], list):
x = [[v] for v in x] if isinstance(x, list) else [[x]]
if not isinstance(y[0], list):
y = [[v] for v in y] if isinstance(y, list) else [[y]]
if len(x) < 2 or len(y) < 2:
return "Invalid input: x and y must have at least two rows."
if len(x) != len(y):
return "Invalid input: x and y must have the same number of rows."
try:
x_flat = [float(row[0]) for row in x]
y_flat = [float(row[0]) for row in y]
except Exception:
return "Invalid input: x and y must contain numeric values."
try:
result = scipy_pearsonr(x_flat, y_flat)
corr = float(result.statistic)
pval = float(result.pvalue)
# Disallow nan/inf
if any([c is None or isinstance(c, str) or c != c or abs(c) == float('inf') for c in [corr, pval]]):
return "Invalid result: correlation or p-value is not finite."
return [[corr, pval]]
except Exception as e:
return f"scipy.stats.pearsonr error: {e}"