POINTBISERIALR
Overview
The POINTBISERIALR
function calculates a point biserial correlation coefficient and its p-value. The point biserial correlation is used to measure the relationship between a binary variable and a continuous variable. This correlation coefficient varies between -1 and +1 with 0 implying no correlation, and is mathematically equivalent to the Pearson correlation coefficient when one variable is binary. The function uses a t-test with n-1 degrees of freedom to compute the statistical significance.
The point-biserial correlation coefficient is calculated using the formula:
where and are means of the continuous observations for the binary groups coded 0 and 1 respectively; and are number of observations coded 0 and 1 respectively; is the total number of observations and is the standard deviation of all the continuous observations. For more details, see the scipy.stats documentation .
This example function is provided as-is without any representation of accuracy.
Usage
To use the function in Excel:
=POINTBISERIALR(x, y)
x
(2D list, required): Array of binary values (0/1 or boolean). Must be the same length as y.y
(2D list, required): Array of continuous values. Must be the same length as x.
The function returns a 2D list with two elements: the correlation coefficient (float) and the two-sided p-value (float), or an error message (string) if the input is invalid.
Examples
Example 1: Basic Point-Biserial Correlation
This example calculates the point-biserial correlation between a binary variable (0/1) and a continuous variable.
Inputs:
x | y |
---|---|
0 | 1 |
0 | 2 |
0 | 3 |
1 | 4 |
1 | 5 |
1 | 6 |
1 | 7 |
Excel formula:
=POINTBISERIALR({0;0;0;1;1;1;1}, {1;2;3;4;5;6;7})
Expected output:
Correlation | P-value |
---|---|
0.8660254 | 0.0117248 |
This shows a strong positive correlation between the binary and continuous variables.
Example 2: Perfect Positive Correlation
This example demonstrates a case with perfect positive correlation.
Inputs:
x | y |
---|---|
0 | 1 |
0 | 1 |
1 | 5 |
1 | 5 |
Excel formula:
=POINTBISERIALR({0;0;1;1}, {1;1;5;5})
Expected output:
Correlation | P-value |
---|---|
1.0 | 0.0 |
This shows perfect positive correlation with highly significant p-value.
Example 3: Negative Correlation
This example shows a negative correlation between binary and continuous variables.
Inputs:
x | y |
---|---|
0 | 10 |
0 | 8 |
0 | 9 |
1 | 2 |
1 | 3 |
1 | 1 |
Excel formula:
=POINTBISERIALR({0;0;0;1;1;1}, {10;8;9;2;3;1})
Expected output:
Correlation | P-value |
---|---|
-0.9738517 | 0.001016663 |
This shows a strong negative correlation between the variables.
Example 4: No Correlation
This example demonstrates a case where there is no correlation between the variables.
Inputs:
x | y |
---|---|
0 | 1 |
0 | 5 |
1 | 3 |
1 | 3 |
Excel formula:
=POINTBISERIALR({0;0;1;1}, {1;5;3;3})
Expected output:
Correlation | P-value |
---|---|
0.0 | 1.0 |
This shows no correlation between the binary and continuous variables.
Python Code
from scipy.stats import pointbiserialr as scipy_pointbiserialr
def pointbiserialr(x, y):
"""
Calculate a point biserial correlation coefficient and its p-value.
Args:
x: 2D list of binary values (0/1 or boolean). Must be the same length as y.
y: 2D list of continuous values. Must be the same length as x.
Returns:
2D list containing [correlation coefficient, p-value] (list of floats),
or an error message (str) if input is invalid.
This example function is provided as-is without any representation of accuracy.
"""
# Handle case where Excel passes single values as scalars
if not isinstance(x, list):
x = [[x]]
if not isinstance(y, list):
y = [[y]]
# Convert 2D lists to 1D arrays
try:
x_flat = []
for row in x:
if isinstance(row, list):
x_flat.extend(row)
else:
x_flat.append(row)
y_flat = []
for row in y:
if isinstance(row, list):
y_flat.extend(row)
else:
y_flat.append(row)
# Convert to numeric arrays
x_array = [float(val) for val in x_flat]
y_array = [float(val) for val in y_flat]
except (ValueError, TypeError):
return "Invalid input: x and y must contain numeric values."
# Check that arrays have the same length
if len(x_array) != len(y_array):
return "Invalid input: x and y must have the same length."
# Check minimum length
if len(x_array) < 3:
return "Invalid input: arrays must contain at least 3 elements."
# Validate that x contains only binary values (0 or 1)
x_unique = set(x_array)
if not x_unique.issubset({0.0, 1.0}):
return "Invalid input: x must contain only binary values (0 or 1)."
# Check that we have both 0 and 1 values in x
if len(x_unique) < 2:
return "Invalid input: x must contain both 0 and 1 values."
# Check for constant y values
if len(set(y_array)) == 1:
return "Invalid input: y must contain varying values (not all identical)."
try:
# Calculate point-biserial correlation
result = scipy_pointbiserialr(x_array, y_array)
correlation = float(result.statistic)
pvalue = float(result.pvalue)
# Return as 2D list (single row, two columns)
return [[correlation, pvalue]]
except Exception as e:
return f"scipy.stats.pointbiserialr error: {e}"