Association and Correlation
Overview
Association tests determine if two variables are related or independent. This category covers two main types of relationships: 1. Contingency Tables: Associations between categorical variables (e.g., “Is smoking status related to lung cancer diagnosis?”). 2. Correlation Coefficients: Linear or monotonic relationships between continuous or ordinal variables (e.g., “Does study time correlate with exam scores?”).
Categorical Association (Contingency Tables)
When checking if two categorical variables are independent:
- CHI2_CONTINGENCY: Pearson’s Chi-squared test. The standard test for independence in broad contingency tables (R \times C). Checks if observed frequencies differ significantly from expected frequencies.
- FISHER_EXACT: Fisher’s exact test. Used for 2 \times 2 tables, especially when sample sizes are small (where Chi-squared fails).
- BARNARD_EXACT and BOSCHLOO_EXACT: More powerful exact tests for 2 \times 2 tables than Fisher’s test in many scenarios.
Rank Correlation
When data is ordinal (ranked) or non-normal, standard Pearson correlation is inappropriate. Rank correlations measure monotonic relationships (steadily increasing or decreasing) rather than just linear ones.
- SPEARMANR: Spearman’s rho. Equivalent to Pearson correlation on the ranks of the data. Robust to outliers.
- KENDALLTAU: Kendall’s tau. Measures the correspondence between the ranking of two variables (concordant vs discordant pairs). Often preferred over Spearman for small datasets.
Native Excel Capabilities
- Pearson Correlation: Excel’s
CORREL(orPEARSON) calculates Pearson’s r efficiently. - Chi-Squared:
CHISQ.TESTcalculates the p-value for independence, but requires you to pre-calculate the expected range compared to observed—it doesn’t work directly on a raw contingency table. - Rank Correlation: Excel has no native function for Spearman or Kendall correlation. Users must manually create rank columns using
RANK.AVGand then runCORREL, which is tedious and prone to errors with ties. - Exact Tests: No native support for Fisher’s Exact test, making it impossible to correctly analyze small sample sizes in Excel without external tools.
Tools
| Tool | Description |
|---|---|
| BARNARD_EXACT | Perform Barnard’s exact test on a 2x2 contingency table. |
| BOSCHLOO_EXACT | Perform Boschloo’s exact test on a 2x2 contingency table. |
| CHI2_CONTINGENCY | Perform the chi-square test of independence for variables in a contingency table. |
| FISHER_EXACT | Perform Fisher’s exact test on a 2x2 contingency table. |
| KENDALLTAU | Calculate Kendall’s tau, a correlation measure for ordinal data. |
| PAGE_TREND_TEST | Perform Page’s L trend test for monotonic trends across treatments. |
| PEARSONR | Calculate the Pearson correlation coefficient and p-value for two datasets. |
| POINTBISERIALR | Calculate a point biserial correlation coefficient and its p-value. |
| SIEGELSLOPES | Compute the Siegel repeated medians estimator for robust linear regression using scipy.stats.siegelslopes. |
| SOMERSD | Calculate Somers’ D, an asymmetric measure of ordinal association between two variables. |
| SPEARMANR | Calculate a Spearman rank-order correlation coefficient with associated p-value. |
| THEILSLOPES | Compute the Theil-Sen estimator for a set of points (robust linear regression). |
| WEIGHTEDTAU | Compute a weighted version of Kendall’s tau correlation coefficient. |