Skip to Content

POINTBISERIALR

Overview

The POINTBISERIALR function calculates a point biserial correlation coefficient and its p-value. The point biserial correlation is used to measure the relationship between a binary variable and a continuous variable. This correlation coefficient varies between -1 and +1 with 0 implying no correlation, and is mathematically equivalent to the Pearson correlation coefficient when one variable is binary. The function uses a t-test with n-1 degrees of freedom to compute the statistical significance.

The point-biserial correlation coefficient is calculated using the formula:

rpb=Y1Y0syN0N1N(N1)r_{pb} = \frac{\overline{Y_1} - \overline{Y_0}}{s_y} \sqrt{\frac{N_0 N_1}{N(N-1)}}

where Y0\overline{Y_0} and Y1\overline{Y_1} are means of the continuous observations for the binary groups coded 0 and 1 respectively; N0N_0 and N1N_1 are number of observations coded 0 and 1 respectively; NN is the total number of observations and sys_y is the standard deviation of all the continuous observations. For more details, see the scipy.stats documentation.

This example function is provided as-is without any representation of accuracy.

Usage

To use the function in Excel:

=POINTBISERIALR(x, y)
  • x (2D list, required): Array of binary values (0/1 or boolean). Must be the same length as y.
  • y (2D list, required): Array of continuous values. Must be the same length as x.

The function returns a 2D list with two elements: the correlation coefficient (float) and the two-sided p-value (float), or an error message (string) if the input is invalid.

Examples

Example 1: Basic Point-Biserial Correlation

This example calculates the point-biserial correlation between a binary variable (0/1) and a continuous variable.

Inputs:

xy
01
02
03
14
15
16
17

Excel formula:

=POINTBISERIALR({0;0;0;1;1;1;1}, {1;2;3;4;5;6;7})

Expected output:

CorrelationP-value
0.86602540.0117248

This shows a strong positive correlation between the binary and continuous variables.

Example 2: Perfect Positive Correlation

This example demonstrates a case with perfect positive correlation.

Inputs:

xy
01
01
15
15

Excel formula:

=POINTBISERIALR({0;0;1;1}, {1;1;5;5})

Expected output:

CorrelationP-value
1.00.0

This shows perfect positive correlation with highly significant p-value.

Example 3: Negative Correlation

This example shows a negative correlation between binary and continuous variables.

Inputs:

xy
010
08
09
12
13
11

Excel formula:

=POINTBISERIALR({0;0;0;1;1;1}, {10;8;9;2;3;1})

Expected output:

CorrelationP-value
-0.97385170.001016663

This shows a strong negative correlation between the variables.

Example 4: No Correlation

This example demonstrates a case where there is no correlation between the variables.

Inputs:

xy
01
05
13
13

Excel formula:

=POINTBISERIALR({0;0;1;1}, {1;5;3;3})

Expected output:

CorrelationP-value
0.01.0

This shows no correlation between the binary and continuous variables.

Python Code

from scipy.stats import pointbiserialr as scipy_pointbiserialr def pointbiserialr(x, y): """ Calculate a point biserial correlation coefficient and its p-value. Args: x: 2D list of binary values (0/1 or boolean). Must be the same length as y. y: 2D list of continuous values. Must be the same length as x. Returns: 2D list containing [correlation coefficient, p-value] (list of floats), or an error message (str) if input is invalid. This example function is provided as-is without any representation of accuracy. """ # Handle case where Excel passes single values as scalars if not isinstance(x, list): x = [[x]] if not isinstance(y, list): y = [[y]] # Convert 2D lists to 1D arrays try: x_flat = [] for row in x: if isinstance(row, list): x_flat.extend(row) else: x_flat.append(row) y_flat = [] for row in y: if isinstance(row, list): y_flat.extend(row) else: y_flat.append(row) # Convert to numeric arrays x_array = [float(val) for val in x_flat] y_array = [float(val) for val in y_flat] except (ValueError, TypeError): return "Invalid input: x and y must contain numeric values." # Check that arrays have the same length if len(x_array) != len(y_array): return "Invalid input: x and y must have the same length." # Check minimum length if len(x_array) < 3: return "Invalid input: arrays must contain at least 3 elements." # Validate that x contains only binary values (0 or 1) x_unique = set(x_array) if not x_unique.issubset({0.0, 1.0}): return "Invalid input: x must contain only binary values (0 or 1)." # Check that we have both 0 and 1 values in x if len(x_unique) < 2: return "Invalid input: x must contain both 0 and 1 values." # Check for constant y values if len(set(y_array)) == 1: return "Invalid input: y must contain varying values (not all identical)." try: # Calculate point-biserial correlation result = scipy_pointbiserialr(x_array, y_array) correlation = float(result.statistic) pvalue = float(result.pvalue) # Return as 2D list (single row, two columns) return [[correlation, pvalue]] except Exception as e: return f"scipy.stats.pointbiserialr error: {e}"

Example Workbook

Link to Workbook

Last updated on