Association and Correlation

Overview

Association tests determine if two variables are related or independent. This category covers two main types of relationships: 1. Contingency Tables: Associations between categorical variables (e.g., “Is smoking status related to lung cancer diagnosis?”). 2. Correlation Coefficients: Linear or monotonic relationships between continuous or ordinal variables (e.g., “Does study time correlate with exam scores?”).

Categorical Association (Contingency Tables)

When checking if two categorical variables are independent:

Rank Correlation

When data is ordinal (ranked) or non-normal, standard Pearson correlation is inappropriate. Rank correlations measure monotonic relationships (steadily increasing or decreasing) rather than just linear ones.

  • SPEARMANR: Spearman’s rho. Equivalent to Pearson correlation on the ranks of the data. Robust to outliers.
  • KENDALLTAU: Kendall’s tau. Measures the correspondence between the ranking of two variables (concordant vs discordant pairs). Often preferred over Spearman for small datasets.
Figure 1: Correlation vs Association: (A) Strong linear correlation (Pearson ~ 1). (B) Strong monotonic but non-linear relationship (Pearson < 1, Spearman = 1). (C) No correlation.

Native Excel Capabilities

  • Pearson Correlation: Excel’s CORREL (or PEARSON) calculates Pearson’s r efficiently.
  • Chi-Squared: CHISQ.TEST calculates the p-value for independence, but requires you to pre-calculate the expected range compared to observed—it doesn’t work directly on a raw contingency table.
  • Rank Correlation: Excel has no native function for Spearman or Kendall correlation. Users must manually create rank columns using RANK.AVG and then run CORREL, which is tedious and prone to errors with ties.
  • Exact Tests: No native support for Fisher’s Exact test, making it impossible to correctly analyze small sample sizes in Excel without external tools.

Tools

Tool Description
BARNARD_EXACT Perform Barnard’s exact test on a 2x2 contingency table.
BOSCHLOO_EXACT Perform Boschloo’s exact test on a 2x2 contingency table.
CHI2_CONTINGENCY Perform the chi-square test of independence for variables in a contingency table.
FISHER_EXACT Perform Fisher’s exact test on a 2x2 contingency table.
KENDALLTAU Calculate Kendall’s tau, a correlation measure for ordinal data.
PAGE_TREND_TEST Perform Page’s L trend test for monotonic trends across treatments.
PEARSONR Calculate the Pearson correlation coefficient and p-value for two datasets.
POINTBISERIALR Calculate a point biserial correlation coefficient and its p-value.
SIEGELSLOPES Compute the Siegel repeated medians estimator for robust linear regression using scipy.stats.siegelslopes.
SOMERSD Calculate Somers’ D, an asymmetric measure of ordinal association between two variables.
SPEARMANR Calculate a Spearman rank-order correlation coefficient with associated p-value.
THEILSLOPES Compute the Theil-Sen estimator for a set of points (robust linear regression).
WEIGHTEDTAU Compute a weighted version of Kendall’s tau correlation coefficient.