Skip to Content

CHI2_CONTINGENCY

Overview

The CHI2_CONTINGENCY function performs the chi-square test of independence for variables in a contingency table. It tests whether the observed frequencies in the table differ significantly from the expected frequencies, assuming the variables are independent. This test is widely used in statistics to analyze categorical data and determine if there is a significant association between two or more variables. The calculation is based on Pearson’s chi-squared statistic:

χ2=i,j(OijEij)2Eij\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}

where OijO_{ij} are the observed frequencies and EijE_{ij} are the expected frequencies under the null hypothesis of independence. For more details, see the scipy.stats.chi2_contingency documentation.

This example function is provided as-is without any representation of accuracy.

Excel provides the CHISQ.TEST function and the Data Analysis ToolPak for chi-square tests. However, Excel’s built-in tools are limited to 2D tables and do not provide expected frequencies or degrees of freedom directly. The CHI2_CONTINGENCY function returns all relevant statistics, including the test statistic, p-value, degrees of freedom, and expected frequencies, making it more comprehensive for statistical analysis.

Usage

To use the function in Excel:

=CHI2_CONTINGENCY(observed, [correction], [lambda_])
  • observed (2D or higher list, required): Contingency table of observed frequencies. Each cell must be a non-negative number. Must have at least two rows and two columns.
  • correction (bool, optional, default=True): If True and the table is 2x2, applies Yates’ correction for continuity.
  • lambda_ (float or str, optional, default=None): Statistic from the Cressie-Read power divergence family. Use None for Pearson’s chi-squared statistic.

The function returns a 2D list with the following rows:

  • Row 1: [statistic, p-value, degrees of freedom]
  • Row 2 and onward: Expected frequencies table (same shape as observed) If the input is invalid, returns an error message (string).

Examples

Example 1: 2x3 Table (Default Options)

Inputs:

observed
101020
202020

Excel formula:

=CHI2_CONTINGENCY({10,10,20;20,20,20})

Expected output:

statisticp-valuedof
2.777780.249352
expected_freq
121216
181824

Example 2: 2x2 Table (Yates’ Correction)

Inputs:

observed
123
1716

Excel formula:

=CHI2_CONTINGENCY({12,3;17,16})

Expected output:

statisticp-valuedof
2.409110.120631
expected_freq
9.06255.9375
19.937513.0625

Example 3: 2x2 Table (No Correction)

Inputs:

observed
123
1716

Excel formula:

=CHI2_CONTINGENCY({12,3;17,16}, FALSE)

Expected output:

statisticp-valuedof
3.498830.061411
expected_freq
9.06255.9375
19.937513.0625

Example 4: 3x2 Table

Inputs:

observed
1020
2015
1525

Excel formula:

=CHI2_CONTINGENCY({10,20;20,15;15,25})

Expected output:

statisticp-valuedof
4.496530.105582
expected_freq
12.857117.1429
15.020.0
17.142922.8571

Python Code

from scipy.stats import chi2_contingency as scipy_chi2_contingency def chi2_contingency(observed, correction=True, lambda_=None): """ Perform the chi-square test of independence for variables in a contingency table. Args: observed: 2D or higher list of observed frequencies (each cell must be a non-negative number, at least 2 rows and 2 columns). correction: If True and the table is 2x2, applies Yates’ correction for continuity (default: True). lambda_: Statistic from the Cressie-Read power divergence family (float or str, default: None). Returns: 2D list: First row is [statistic, p-value, degrees of freedom], following rows are the expected frequencies table (same shape as observed), or an error message (str) if input is invalid. This example function is provided as-is without any representation of accuracy. """ # Validate observed if not isinstance(observed, list) or len(observed) < 2: return "Invalid input: observed must be a 2D list with at least two rows." # Check for at least two columns if not all(isinstance(row, list) and len(row) >= 2 for row in observed): return "Invalid input: observed must be a 2D list with at least two columns per row." # Check all values are non-negative numbers try: obs_arr = [[float(cell) for cell in row] for row in observed] if any(cell < 0 for row in obs_arr for cell in row): return "Invalid input: all observed frequencies must be non-negative." except Exception: return "Invalid input: observed must contain only numbers." # Try to run the test try: res = scipy_chi2_contingency(obs_arr, correction=bool(correction), lambda_=lambda_) stat = float(res.statistic) pval = float(res.pvalue) dof = int(res.dof) expected = res.expected_freq.tolist() # Compose output: first row is [statistic, p-value, dof], then expected_freq rows output = [[stat, pval, dof]] output.extend(expected) return output except Exception as e: return f"scipy.stats.chi2_contingency error: {e}"

Example Workbook

Link to Workbook

Last updated on