CHI2_CONTINGENCY
Overview
The CHI2_CONTINGENCY
function performs the chi-square test of independence for variables in a contingency table. It tests whether the observed frequencies in the table differ significantly from the expected frequencies, assuming the variables are independent. This test is widely used in statistics to analyze categorical data and determine if there is a significant association between two or more variables. The calculation is based on Pearson’s chi-squared statistic:
where are the observed frequencies and are the expected frequencies under the null hypothesis of independence. For more details, see the scipy.stats.chi2_contingency documentation .
This example function is provided as-is without any representation of accuracy.
Excel provides the CHISQ.TEST function and the Data Analysis ToolPak for chi-square tests. However, Excel’s built-in tools are limited to 2D tables and do not provide expected frequencies or degrees of freedom directly. The CHI2_CONTINGENCY
function returns all relevant statistics, including the test statistic, p-value, degrees of freedom, and expected frequencies, making it more comprehensive for statistical analysis.
Usage
To use the function in Excel:
=CHI2_CONTINGENCY(observed, [correction], [lambda_])
observed
(2D or higher list, required): Contingency table of observed frequencies. Each cell must be a non-negative number. Must have at least two rows and two columns.correction
(bool, optional, default=True): If True and the table is 2x2, applies Yates’ correction for continuity.lambda_
(float or str, optional, default=None): Statistic from the Cressie-Read power divergence family. Use None for Pearson’s chi-squared statistic.
The function returns a 2D list with the following rows:
- Row 1: [statistic, p-value, degrees of freedom]
- Row 2 and onward: Expected frequencies table (same shape as observed) If the input is invalid, returns an error message (string).
Examples
Example 1: 2x3 Table (Default Options)
Inputs:
observed | ||
---|---|---|
10 | 10 | 20 |
20 | 20 | 20 |
Excel formula:
=CHI2_CONTINGENCY({10,10,20;20,20,20})
Expected output:
statistic | p-value | dof |
---|---|---|
2.77778 | 0.24935 | 2 |
expected_freq | ||
---|---|---|
12 | 12 | 16 |
18 | 18 | 24 |
Example 2: 2x2 Table (Yates’ Correction)
Inputs:
observed | |
---|---|
12 | 3 |
17 | 16 |
Excel formula:
=CHI2_CONTINGENCY({12,3;17,16})
Expected output:
statistic | p-value | dof |
---|---|---|
2.40911 | 0.12063 | 1 |
expected_freq | |
---|---|
9.0625 | 5.9375 |
19.9375 | 13.0625 |
Example 3: 2x2 Table (No Correction)
Inputs:
observed | |
---|---|
12 | 3 |
17 | 16 |
Excel formula:
=CHI2_CONTINGENCY({12,3;17,16}, FALSE)
Expected output:
statistic | p-value | dof |
---|---|---|
3.49883 | 0.06141 | 1 |
expected_freq | |
---|---|
9.0625 | 5.9375 |
19.9375 | 13.0625 |
Example 4: 3x2 Table
Inputs:
observed | |
---|---|
10 | 20 |
20 | 15 |
15 | 25 |
Excel formula:
=CHI2_CONTINGENCY({10,20;20,15;15,25})
Expected output:
statistic | p-value | dof |
---|---|---|
4.49653 | 0.10558 | 2 |
expected_freq | |
---|---|
12.8571 | 17.1429 |
15.0 | 20.0 |
17.1429 | 22.8571 |
Python Code
from scipy.stats import chi2_contingency as scipy_chi2_contingency
def chi2_contingency(observed, correction=True, lambda_=None):
"""
Perform the chi-square test of independence for variables in a contingency table.
Args:
observed: 2D or higher list of observed frequencies (each cell must be a non-negative number, at least 2 rows and 2 columns).
correction: If True and the table is 2x2, applies Yates’ correction for continuity (default: True).
lambda_: Statistic from the Cressie-Read power divergence family (float or str, default: None).
Returns:
2D list: First row is [statistic, p-value, degrees of freedom], following rows are the expected frequencies table (same shape as observed), or an error message (str) if input is invalid.
This example function is provided as-is without any representation of accuracy.
"""
# Validate observed
if not isinstance(observed, list) or len(observed) < 2:
return "Invalid input: observed must be a 2D list with at least two rows."
# Check for at least two columns
if not all(isinstance(row, list) and len(row) >= 2 for row in observed):
return "Invalid input: observed must be a 2D list with at least two columns per row."
# Check all values are non-negative numbers
try:
obs_arr = [[float(cell) for cell in row] for row in observed]
if any(cell < 0 for row in obs_arr for cell in row):
return "Invalid input: all observed frequencies must be non-negative."
except Exception:
return "Invalid input: observed must contain only numbers."
# Try to run the test
try:
res = scipy_chi2_contingency(obs_arr, correction=bool(correction), lambda_=lambda_)
stat = float(res.statistic)
pval = float(res.pvalue)
dof = int(res.dof)
expected = res.expected_freq.tolist()
# Compose output: first row is [statistic, p-value, dof], then expected_freq rows
output = [[stat, pval, dof]]
output.extend(expected)
return output
except Exception as e:
return f"scipy.stats.chi2_contingency error: {e}"