POINTBISERIALR

Overview

The POINTBISERIALR function calculates the point-biserial correlation coefficient, a measure of the strength and direction of association between a binary variable (coded as 0 and 1) and a continuous variable. This statistic is commonly used in psychometrics, educational testing, and social science research to assess relationships such as whether a treatment group (1) versus control group (0) differs on a continuous outcome measure.

The point-biserial correlation is mathematically equivalent to the Pearson correlation coefficient computed with one dichotomous variable and one continuous variable. Like other correlation coefficients, it ranges from -1 to +1, where 0 indicates no linear association, and values of -1 or +1 indicate a perfect linear relationship, which for a binary variable means the continuous variable takes a single value within each group.
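The equivalence with Pearson's coefficient can be checked numerically. The following is a minimal sketch, assuming SciPy is installed and reusing the data from Example 1 on this page; it calls scipy.stats.pearsonr and scipy.stats.pointbiserialr on the same inputs and compares the results.

```python
from scipy.stats import pearsonr, pointbiserialr

# Illustrative data (the same values as Example 1 on this page)
x = [0, 0, 0, 1, 1, 1, 1]  # binary group indicator
y = [1, 2, 3, 4, 5, 6, 7]  # continuous measurement

# Both results unpack as (coefficient, two-sided p-value)
r_pb, p_pb = pointbiserialr(x, y)
r_pe, p_pe = pearsonr(x, y)

print(f"point-biserial: r = {r_pb:.6f}, p = {p_pb:.6f}")
print(f"Pearson:        r = {r_pe:.6f}, p = {p_pe:.6f}")
```

Both lines should print the same coefficient and p-value, up to floating-point rounding.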

This implementation uses SciPy’s pointbiserialr function from the scipy.stats module. The function returns both the correlation coefficient and a two-sided p-value based on a t-test with n-2 degrees of freedom.

The point-biserial correlation coefficient is calculated using the formula:

r_{pb} = \frac{\bar{Y}_1 - \bar{Y}_0}{s_y} \sqrt{\frac{N_0 N_1}{N(N-1)}}

where \bar{Y}_0 and \bar{Y}_1 are the means of the continuous variable for observations coded 0 and 1 respectively, N_0 and N_1 are the counts of observations in each group, N is the total sample size, and s_y is the sample standard deviation of the continuous variable over all N observations (computed with an N-1 denominator).
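As a worked illustration of the formula, the sketch below evaluates each term by hand for the data from Example 3 and compares the result with SciPy. It assumes s_y is the sample standard deviation (N-1 denominator), which is what statistics.stdev computes; the variable names are chosen here for illustration only.

```python
import math
from statistics import mean, stdev  # stdev uses the N-1 (sample) denominator

from scipy.stats import pointbiserialr

# Illustrative data (the same values as Example 3 on this page)
x = [0, 0, 0, 1, 1, 1]
y = [10, 8, 9, 2, 3, 1]

# Split the continuous variable by group
y0 = [yi for xi, yi in zip(x, y) if xi == 0]
y1 = [yi for xi, yi in zip(x, y) if xi == 1]
n0, n1, n = len(y0), len(y1), len(y)

# r_pb = (mean(Y1) - mean(Y0)) / s_y * sqrt(N0 * N1 / (N * (N - 1)))
r_manual = (mean(y1) - mean(y0)) / stdev(y) * math.sqrt(n0 * n1 / (n * (n - 1)))

# Reference value from SciPy (first element of the result is the coefficient)
r_scipy = pointbiserialr(x, y)[0]

print(f"manual: {r_manual:.6f}, scipy: {r_scipy:.6f}")
```

If the population standard deviation (N denominator) were used instead, the factor under the square root would become N_0 N_1 / N^2.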

A significant point-biserial correlation (p-value below a chosen threshold such as 0.05) is equivalent to finding a significant difference in means between the two groups via an independent samples t-test that assumes equal variances (pooled variance). The relationship between the t-statistic and r_{pb} is given by:

t = \sqrt{N-2} \cdot \frac{r_{pb}}{\sqrt{1 - r_{pb}^2}}
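The link to the t-test can be illustrated with a short sketch, assuming SciPy is installed and using the data from Example 1. It converts r_{pb} to a t-statistic with the formula above and compares it with scipy.stats.ttest_ind run with equal_var=True (the pooled-variance test). Group 1 is passed first so that the sign of t matches the sign of r_{pb}.

```python
import math

from scipy.stats import pointbiserialr, ttest_ind

# Illustrative data (the same values as Example 1 on this page)
x = [0, 0, 0, 1, 1, 1, 1]
y = [1, 2, 3, 4, 5, 6, 7]

# t-statistic derived from the correlation: t = sqrt(N - 2) * r / sqrt(1 - r^2)
r, p_r = pointbiserialr(x, y)
t_from_r = math.sqrt(len(x) - 2) * r / math.sqrt(1 - r**2)

# Pooled-variance independent samples t-test on the two groups
group0 = [yi for xi, yi in zip(x, y) if xi == 0]
group1 = [yi for xi, yi in zip(x, y) if xi == 1]
t_test, p_t = ttest_ind(group1, group0, equal_var=True)

print(f"t from r:    {t_from_r:.6f} (p = {p_r:.6f})")
print(f"t from test: {t_test:.6f} (p = {p_t:.6f})")
```

The two t-statistics and the two p-values should agree up to floating-point rounding. Note that the conversion is undefined when r_{pb} is exactly -1 or +1, because the denominator becomes zero.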

For additional background on the point-biserial correlation, see Tate (1954) and the Wiley StatsRef entry on Point Biserial Correlation.

This example function is provided as-is without any representation of accuracy.

Excel Usage

=POINTBISERIALR(x, y)

x (list[list], required): Binary variable (column vector of 0s and 1s)
y (list[list], required): Continuous variable (column vector), same length as x

Returns (list[list]): 2D list [[correlation, p_value]], or error message string.

Example 1: Demo case 1

Inputs:

x	y
0	1
0	2
0	3
1	4
1	5
1	6
1	7

Excel formula:

=POINTBISERIALR({0;0;0;1;1;1;1}, {1;2;3;4;5;6;7})

Expected output:

Result
0.866025	0.0117248

Example 2: Demo case 2

Inputs:

x	y
0	1
0	1
1	5
1	5

Excel formula:

=POINTBISERIALR({0;0;1;1}, {1;1;5;5})

Expected output:

Result
1	0

Example 3: Demo case 3

Inputs:

x	y
0	10
0	8
0	9
1	2
1	3
1	1

Excel formula:

=POINTBISERIALR({0;0;0;1;1;1}, {10;8;9;2;3;1})

Expected output:

Result
-0.973852	0.00101666

Example 4: Demo case 4

Inputs:

x	y
0	1
0	5
1	3
1	3

Excel formula:

=POINTBISERIALR({0;0;1;1}, {1;5;3;3})

Expected output:

Result
0	1
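The four examples above can be re-run outside Excel. The following sketch is not the Excel function itself; it is a hedged check that calls scipy.stats.pointbiserialr directly on the same inputs. The dictionary name `examples` and the loop structure are illustrative choices.

```python
from scipy.stats import pointbiserialr

# Inputs copied from Examples 1 to 4 above: (x, y) pairs
examples = {
    "Example 1": ([0, 0, 0, 1, 1, 1, 1], [1, 2, 3, 4, 5, 6, 7]),
    "Example 2": ([0, 0, 1, 1], [1, 1, 5, 5]),
    "Example 3": ([0, 0, 0, 1, 1, 1], [10, 8, 9, 2, 3, 1]),
    "Example 4": ([0, 0, 1, 1], [1, 5, 3, 3]),
}

results = {}
for name, (x, y) in examples.items():
    r, p = pointbiserialr(x, y)  # unpacks as (coefficient, p-value)
    results[name] = (float(r), float(p))
    print(f"{name}: r = {results[name][0]:.6g}, p = {results[name][1]:.6g}")
```

The printed coefficients and p-values should agree with the expected outputs listed above to the displayed precision, although the last digits may vary slightly between SciPy versions.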

Python Code


from scipy.stats import pointbiserialr as scipy_pointbiserialr

def pointbiserialr(x, y):
    """
    Calculate a point biserial correlation coefficient and its p-value.

    See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.pointbiserialr.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        x (list[list]): Binary variable (column vector of 0s and 1s)
        y (list[list]): Continuous variable (column vector), same length as x

    Returns:
        list[list]: 2D list [[correlation, p_value]], or error message string.
    """
    def flatten(val):
        """Flatten a scalar, 1D list, or 2D list into a 1D list of floats."""
        rows = val if isinstance(val, list) else [[val]]
        flat = []
        for row in rows:
            if isinstance(row, list):
                flat.extend(row)
            else:
                flat.append(row)
        return [float(v) for v in flat]

    try:
        # Normalize inputs to flat numeric lists
        x_array = flatten(x)
        y_array = flatten(y)

        # Check that arrays have the same length
        if len(x_array) != len(y_array):
            return "Error: Invalid input: x and y must have the same length."

        # Check minimum length
        if len(x_array) < 3:
            return "Error: Invalid input: arrays must contain at least 3 elements."

        # Validate that x contains only binary values (0 or 1)
        x_unique = set(x_array)
        if not x_unique.issubset({0.0, 1.0}):
            return "Error: Invalid input: x must contain only binary values (0 or 1)."

        # Check that we have both 0 and 1 values in x
        if len(x_unique) < 2:
            return "Error: Invalid input: x must contain both 0 and 1 values."

        # Check for constant y values
        if len(set(y_array)) == 1:
            return "Error: Invalid input: y must contain varying values (not all identical)."

        # Calculate point-biserial correlation
        result = scipy_pointbiserialr(x_array, y_array)
        correlation = float(result.statistic)
        pvalue = float(result.pvalue)

        # Return as 2D list (single row, two columns)
        return [[correlation, pvalue]]
    except Exception as e:
        return f"Error: {e}"
