Chi-Square Test of Independence
The Chi-Square Test tab tests whether two categorical variables are independent using Pearson's chi-square test of independence.
Getting started
Open the tab
Select Analysis > Chi-Square Test... from the menu bar.
Run the test
Configure the following in the settings panel:
- Select the dataset from Dataset
- Select a categorical variable for Row variable
- Select a categorical variable for Column variable
- Click Run
Both Row variable and Column variable must have at least 2 categories. Only columns with a nominal or ordinal measurement scale are available.
Hypotheses
- H₀ (null hypothesis): The row variable and column variable are independent.
- H₁ (alternative hypothesis): The row variable and column variable are not independent.
Reading the results
The result panel displays the hypotheses, a conclusion at significance level $\alpha$, and the test statistics.
| Statistic | Description |
|---|---|
| $\chi^2$ | Pearson's chi-square statistic. Aggregates the deviation between observed and expected frequencies across all cells |
| df | Degrees of freedom $(r-1)(c-1)$, where $r$ is the number of row categories and $c$ the number of column categories |
| p | p-value. The probability of obtaining a chi-square statistic at least as extreme as the observed value, assuming the null hypothesis is true |
| Cramer's V | Effect size. Computed as $V = \sqrt{\chi^2 / (N \min(r-1, c-1))}$ with $N$ the grand total, ranging from 0 to 1. 0 indicates complete independence and 1 indicates complete association. The interpretation of $V$ depends on the table dimension $r \times c$, so care is needed when comparing across tables of different sizes |
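The statistics in the table can be reproduced by hand for a small table. The following is a minimal sketch using only the Python standard library; the function names `chi_square_sf` and `chi_square_test` are illustrative and are not part of MIDAS, and the sample counts are made up for demonstration.

```python
import math

def chi_square_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)             # Q(1, z) = exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))  # Q(1/2, z) = erfc(sqrt(z))
    # Recurrence for the regularized upper incomplete gamma function:
    # Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

def chi_square_test(observed):
    """Return (chi2, df, p, V) for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    r, c = len(row_totals), len(col_totals)
    chi2 = 0.0
    for i in range(r):
        for j in range(c):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    df = (r - 1) * (c - 1)
    p = chi_square_sf(chi2, df)
    v = math.sqrt(chi2 / (n * min(r - 1, c - 1)))
    return chi2, df, p, v

# Hypothetical 2x2 table of counts (rows = row variable, columns = column variable)
observed = [[10, 20], [20, 10]]
chi2, df, p, v = chi_square_test(observed)
print(f"chi2={chi2:.3f} df={df} p={p:.4f} V={v:.3f}")
```

The sketch assumes every expected frequency is positive, which holds whenever each row and column category occurs at least once.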
Contingency table
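A contingency table is displayed below the result panel. Each cell shows both the observed frequency and the expected frequency. The expected frequency is the theoretical frequency under the null hypothesis of independence, calculated as $E_{ij} = R_i C_j / N$, where $R_i$ is the total of row $i$, $C_j$ is the total of column $j$, and $N$ is the grand total.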
Rows with missing values are excluded from the analysis. The number of excluded rows is shown above the table.
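How the table is built can be illustrated with a short sketch. It cross-tabulates two columns, skips rows where either value is missing, and derives the expected frequencies from the margins. Representing a missing value as `None`, the helper names, and the sample data are all assumptions made for illustration; they do not describe how MIDAS stores data internally.

```python
from collections import Counter

def contingency_table(rows, cols):
    """Cross-tabulate two equal-length sequences, skipping pairs with a missing (None) value."""
    pairs = [(a, b) for a, b in zip(rows, cols) if a is not None and b is not None]
    excluded = len(rows) - len(pairs)          # number of rows dropped for missing values
    row_levels = sorted({a for a, _ in pairs})
    col_levels = sorted({b for _, b in pairs})
    counts = Counter(pairs)
    observed = [[counts[(a, b)] for b in col_levels] for a in row_levels]
    return row_levels, col_levels, observed, excluded

def expected_frequencies(observed):
    """Expected count per cell under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]

# Hypothetical data with two incomplete rows
smoker = ["yes", "no", "yes", None, "no", "no", "yes", "no"]
exercise = ["low", "high", "low", "high", None, "high", "high", "low"]

row_levels, col_levels, observed, excluded = contingency_table(smoker, exercise)
expected = expected_frequencies(observed)
print("excluded rows:", excluded)
print("observed:", observed)
print("expected:", expected)
```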
Adjusted standardized residuals
Enable the Adjusted standardized residuals checkbox to display the adjusted standardized residual for each cell.
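The adjusted standardized residual for the cell in row $i$ and column $j$ is

$$d_{ij} = \frac{O_{ij} - E_{ij}}{\sqrt{E_{ij}\,(1 - R_i/N)(1 - C_j/N)}}$$

where $O_{ij}$ is the observed frequency, $E_{ij}$ is the expected frequency, $R_i$ is the row total, $C_j$ is the column total, and $N$ is the grand total.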
Cells with large absolute residuals deviate strongly from independence. Positive residuals indicate more observations than expected; negative residuals indicate fewer.
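As a rough illustration, the residuals can be computed from the observed counts alone, using the standard definition of the adjusted standardized residual (Agresti, 2007). The function name and sample counts below are hypothetical, and the sketch is not taken from the MIDAS source.

```python
import math

def adjusted_residuals(observed):
    """Adjusted standardized residual for each cell of a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    residuals = []
    for i, row in enumerate(observed):
        out = []
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            # Standard error of (o - e) under independence
            se = math.sqrt(e * (1 - row_totals[i] / n) * (1 - col_totals[j] / n))
            out.append((o - e) / se)
        residuals.append(out)
    return residuals

# Hypothetical 2x2 table of counts
residuals = adjusted_residuals([[10, 20], [20, 10]])
for row in residuals:
    print([round(d, 3) for d in row])
```

In a 2×2 table all four residuals share the same absolute value, and its square equals the Pearson chi-square statistic, which makes a convenient sanity check.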
When residuals are enabled, cells are colored with a diverging color scale. Residuals exceeding the Bonferroni-corrected critical value are shown in bold. The Bonferroni correction divides the significance level by the degrees of freedom to adjust the threshold for each cell.
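The threshold described above might be computed along the following lines. This sketch assumes a two-sided comparison of each residual against the standard normal distribution, which the text does not state explicitly; the function name and the significance level of 0.05 in the usage lines are likewise illustrative.

```python
from statistics import NormalDist

def bonferroni_critical_value(alpha, df):
    """Two-sided standard normal critical value after dividing alpha by the degrees of freedom."""
    adjusted_alpha = alpha / df
    # Assumption: the residual is compared two-sidedly, so half of the
    # adjusted alpha goes into each tail.
    return NormalDist().inv_cdf(1 - adjusted_alpha / 2)

# Example: alpha = 0.05 for a 2x2 table (df = 1) and a 3x3 table (df = 4)
print(round(bonferroni_critical_value(0.05, 1), 3))
print(round(bonferroni_critical_value(0.05, 4), 3))
```

A cell would then be flagged when the absolute value of its residual exceeds this critical value. The threshold grows with the degrees of freedom, so larger tables need larger residuals to be flagged.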
Limitations of the chi-square approximation
Pearson's chi-square test relies on the test statistic asymptotically following a chi-square distribution. When the sample size is small or many cells have low expected frequencies, the accuracy of this approximation decreases. Check the expected frequencies displayed in the contingency table to assess whether the approximation is appropriate.
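One way to carry out this check outside the application is to count the cells whose expected frequency falls below a chosen threshold. The default of 5 used here is a common rule of thumb rather than a rule stated by MIDAS, and the function name and sample counts are hypothetical.

```python
def low_expected_cells(observed, threshold=5.0):
    """Return (cells below threshold, total cells, smallest expected frequency)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [rt * ct / n for rt in row_totals for ct in col_totals]
    low = sum(1 for e in expected if e < threshold)
    return low, len(expected), min(expected)

# Hypothetical small-sample table
low, total, smallest = low_expected_cells([[3, 1], [2, 9]])
print(f"{low} of {total} cells have expected frequency < 5 (minimum {smallest:.2f})")
```

A large share of flagged cells, or a very small minimum, suggests treating the p-value with caution.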
For 2x2 tables, Yates' continuity correction and Fisher's exact test are known alternatives, but MIDAS currently computes only the uncorrected Pearson chi-square statistic.
Other test methods
For comparing means between two groups, use Two-Sample Test / Paired Test. For comparing means across three or more groups, use ANOVA. For frequency tabulation of categorical variables, use Crosstab.
References
- Pearson, K. (1900). On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine, 50(302), 157-175.
- Agresti, A. (2007). An Introduction to Categorical Data Analysis (2nd ed., pp. 38-40). Wiley.