CHI2_CONTINGENCY
Overview
The CHI2_CONTINGENCY function performs the chi-square test of independence to determine whether there is a statistically significant association between two categorical variables in a contingency table. This test is widely used in survey analysis, A/B testing, and scientific research to assess whether observed frequencies differ meaningfully from what would be expected if the variables were independent.
The function implements Pearson’s chi-squared test, which compares observed frequencies against expected frequencies calculated from the marginal totals of the table. Expected frequencies are computed under the assumption that the row and column variables are independent. The test statistic is calculated as:
\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
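where O_{ij} represents the observed frequency and E_{ij} the expected frequency for each cell. Under independence, E_{ij} is the product of the row i total and the column j total, divided by the grand total of the table. The degrees of freedom for an R \times C table are (R-1)(C-1).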
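The formula above can be sketched in plain Python. This is a minimal illustration, not the function's actual implementation; the helper names `expected_frequencies` and `pearson_chi2` are hypothetical, and no continuity correction is applied.

```python
def expected_frequencies(observed):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def pearson_chi2(observed):
    """Return (statistic, degrees of freedom, expected table), uncorrected."""
    expected = expected_frequencies(observed)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, dof, expected


stat, dof, expected = pearson_chi2([[10, 10, 20], [20, 20, 20]])
print(round(stat, 4), dof)  # 2.7778 2
print(expected)             # [[12.0, 12.0, 16.0], [18.0, 18.0, 24.0]]
```

The table used here is the one from Example 1, so the statistic and expected frequencies can be compared against that example's output.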
For 2×2 tables, Yates’ correction for continuity can be applied (enabled by default). This correction adjusts each observed value by 0.5 toward its expected value, producing a more conservative test that better approximates the chi-square distribution when sample sizes are small.
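As a rough sketch of the correction described above, the following plain-Python code shrinks each |O - E| by 0.5 before squaring. The helper name `yates_chi2_2x2` is hypothetical, and clamping the shrunken difference at zero is an assumption of this sketch rather than something stated in this document. For one degree of freedom, the p-value has the closed form erfc(sqrt(stat / 2)).

```python
import math


def yates_chi2_2x2(observed):
    """Chi-square statistic for a 2x2 table with Yates' continuity correction."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            e = row_totals[i] * col_totals[j] / n
            # Move the observed count 0.5 toward the expected count (assumed: never past it).
            diff = max(abs(observed[i][j] - e) - 0.5, 0.0)
            stat += diff ** 2 / e
    return stat


stat = yates_chi2_2x2([[12, 3], [17, 16]])
p_value = math.erfc(math.sqrt(stat / 2))  # chi-square survival function for dof = 1
print(round(stat, 4), round(p_value, 4))  # 2.4091 0.1206
```

The input is the table from Example 2, so the result can be checked against that example's first output row.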
The function also supports the Cressie-Read power divergence family of statistics through the lambda_ parameter, allowing alternatives such as the log-likelihood ratio (G-test) to be computed instead of Pearson’s chi-squared statistic.
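To illustrate how one parameter selects different statistics, here is a minimal plain-Python sketch of the Cressie-Read family for a numeric exponent. The helper name `power_divergence_stat` and its `lam` argument are hypothetical; with exponent 1 the formula reduces to Pearson's statistic, and the limit at 0 gives the log-likelihood ratio (G) statistic.

```python
import math


def power_divergence_stat(observed, expected, lam):
    """Cressie-Read statistic for real lam; lam = 0 is the log-likelihood (G) limit."""
    pairs = [
        (o, e)
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    ]
    if lam == 0:
        # Limit as lam -> 0: G = 2 * sum(O * ln(O / E)); cells with O = 0 contribute 0.
        return 2.0 * sum(o * math.log(o / e) for o, e in pairs if o > 0)
    # General case; lam = -1 is also a limiting case and is not handled in this sketch.
    return 2.0 / (lam * (lam + 1)) * sum(o * ((o / e) ** lam - 1) for o, e in pairs)


observed = [[10, 10, 20], [20, 20, 20]]
expected = [[12.0, 12.0, 16.0], [18.0, 18.0, 24.0]]
pearson = power_divergence_stat(observed, expected, 1)
g_stat = power_divergence_stat(observed, expected, 0)
print(round(pearson, 4))  # 2.7778
print(round(g_stat, 4))
```

For this table the G statistic comes out slightly below the Pearson value; the two statistics are usually close when no expected count is small.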
This implementation uses the scipy.stats.chi2_contingency function from SciPy. The function returns the test statistic, p-value, degrees of freedom, and the expected frequency table. A commonly cited guideline recommends that the expected frequency in each cell be at least 5 for the chi-square approximation to be valid.
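The guideline on expected frequencies can be checked mechanically once the expected table is available. The sketch below is illustrative only: the helper name `cells_below_threshold` is hypothetical, the threshold of 5 is the rule of thumb mentioned above, and `sparse_table` is a made-up expected table.

```python
def cells_below_threshold(expected, threshold=5.0):
    """Return (row, col, value) for every expected count below the threshold."""
    return [
        (i, j, value)
        for i, row in enumerate(expected)
        for j, value in enumerate(row)
        if value < threshold
    ]


ok_table = [[9.0625, 5.9375], [19.9375, 13.0625]]  # expected table from Example 2
sparse_table = [[1.5, 3.5], [4.5, 10.5]]           # hypothetical small-sample table
print(cells_below_threshold(ok_table))      # []
print(cells_below_threshold(sparse_table))  # [(0, 0, 1.5), (0, 1, 3.5), (1, 0, 4.5)]
```

An empty list means every cell meets the guideline; a non-empty list points to the cells where the approximation may be unreliable.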
This example function is provided as-is without any representation of accuracy.
Excel Usage
=CHI2_CONTINGENCY(observed, correction, lambda_)
observed (list[list], required): Contingency table of observed frequencies. Each cell must be a non-negative number. Must have at least two rows and two columns.
correction (bool, optional, default: true): If True and the table is 2x2, applies Yates’ correction for continuity.
lambda_ (str, optional, default: null): Statistic from the Cressie-Read power divergence family. Use None for Pearson’s chi-squared statistic.
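Returns (list[list]): 2D list [[stat, p, dof], expected…], or error string. The first row holds the test statistic, p-value, and degrees of freedom; the remaining rows hold the expected frequency table. Rows are padded with empty cells so that every row has the same width.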
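Because the statistics and the expected table share one 2D list, a caller working in Python may want to separate them again. The following is a minimal sketch, assuming padding cells are empty strings as in the Python code for this function; the helper name `split_result` is hypothetical.

```python
def split_result(result):
    """Split the 2D output into (stat, p_value, dof, expected)."""
    if isinstance(result, str):
        # The function reports failures as a plain error string.
        raise ValueError(result)
    stat, p_value, dof = result[0][:3]
    # Drop the empty-string padding added to make all rows the same width.
    expected = [[cell for cell in row if cell != ""] for row in result[1:]]
    return stat, p_value, int(dof), expected


# Output layout of Example 2 (values rounded as shown in that example).
result = [[2.4091, 0.1206, 1], [9.0625, 5.9375, ""], [19.9375, 13.0625, ""]]
stat, p_value, dof, expected = split_result(result)
print(stat, p_value, dof)  # 2.4091 0.1206 1
print(expected)            # [[9.0625, 5.9375], [19.9375, 13.0625]]
```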
Examples
Example 1: Demo case 1
Inputs:
| observed | | | correction |
|---|---|---|---|
| 10 | 10 | 20 | true |
| 20 | 20 | 20 | |
Excel formula:
=CHI2_CONTINGENCY({10,10,20;20,20,20}, TRUE)
Expected output:
| Result | ||
|---|---|---|
| 2.7778 | 0.2494 | 2 |
| 12 | 12 | 16 |
| 18 | 18 | 24 |
Example 2: Demo case 2
Inputs:
| observed | | correction |
|---|---|---|
| 12 | 3 | true |
| 17 | 16 | |
Excel formula:
=CHI2_CONTINGENCY({12,3;17,16}, TRUE)
Expected output:
| Result | ||
|---|---|---|
| 2.4091 | 0.1206 | 1 |
| 9.0625 | 5.9375 | |
| 19.9375 | 13.0625 |
Example 3: Demo case 3
Inputs:
| observed | | correction |
|---|---|---|
| 12 | 3 | false |
| 17 | 16 | |
Excel formula:
=CHI2_CONTINGENCY({12,3;17,16}, FALSE)
Expected output:
| Result | ||
|---|---|---|
| 3.4988 | 0.0614 | 1 |
| 9.0625 | 5.9375 | |
| 19.9375 | 13.0625 |
Example 4: Demo case 4
Inputs:
| observed | | correction |
|---|---|---|
| 10 | 20 | true |
| 20 | 15 | |
| 15 | 25 | |
Excel formula:
=CHI2_CONTINGENCY({10,20;20,15;15,25}, TRUE)
Expected output:
| Result | ||
|---|---|---|
| 4.4965 | 0.1056 | 2 |
| 12.8571 | 17.1429 | |
| 15 | 20 | |
| 17.1429 | 22.8571 |
Python Code
from scipy.stats import chi2_contingency as scipy_chi2_contingency


def chi2_contingency(observed, correction=True, lambda_=None):
    """
    Perform the chi-square test of independence for variables in a contingency table.

    See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2_contingency.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        observed (list[list]): Contingency table of observed frequencies. Each cell must be a non-negative number. Must have at least two rows and two columns.
        correction (bool, optional): If True and the table is 2x2, applies Yates' correction for continuity. Default is True.
        lambda_ (str, optional): Statistic from the Cressie-Read power divergence family. Use None for Pearson's chi-squared statistic. Default is None.

    Returns:
        list[list]: 2D list [[stat, p, dof], expected...], or error string.
    """
    def to2d(x):
        # Wrap a scalar so that a single-cell input still arrives as a 2D list
        return [[x]] if not isinstance(x, list) else x

    observed = to2d(observed)

    # Validate observed is a 2D list
    if not isinstance(observed, list) or not all(isinstance(row, list) for row in observed):
        return "Invalid input: observed must be a 2D list."
    if len(observed) < 2:
        return "Invalid input: observed must have at least two rows."
    if not all(len(row) >= 2 for row in observed):
        return "Invalid input: observed must have at least two columns per row."

    # Check all values are non-negative numbers
    try:
        obs_arr = [[float(cell) for cell in row] for row in observed]
        if any(cell < 0 for row in obs_arr for cell in row):
            return "Invalid input: all observed frequencies must be non-negative."
    except (TypeError, ValueError):
        return "Invalid input: observed must contain only numbers."

    try:
        res = scipy_chi2_contingency(obs_arr, correction=bool(correction), lambda_=lambda_)
        stat = float(res.statistic)
        pval = float(res.pvalue)
        dof = int(res.dof)
        expected = res.expected_freq.tolist()

        # The stats row has 3 cells; the expected table may be wider or narrower
        stats_row = [stat, pval, dof]
        expected_cols = len(expected[0]) if expected else 0

        # Pad stats row if expected has more columns
        if expected_cols > 3:
            stats_row = stats_row + [""] * (expected_cols - 3)

        # Pad expected rows if stats has more columns (3 cols for 2-column tables)
        if expected_cols < 3:
            expected = [row + [""] * (3 - expected_cols) for row in expected]

        output = [stats_row]
        output.extend(expected)
        return output
    except Exception as e:
        return f"scipy.stats.chi2_contingency error: {e}"