Chi-Square Test of Independence

The Chi-Square Test tab tests whether two categorical variables are independent using Pearson's chi-square test of independence.

Getting started

Open the tab

Select Analysis > Chi-Square Test... from the menu bar.

Run the test

Configure the following in the settings panel:

  1. Select the dataset from Dataset
  2. Select a categorical variable for Row variable
  3. Select a categorical variable for Column variable
  4. Click Run

Both Row variable and Column variable must have at least 2 categories. Only columns with a nominal or ordinal measurement scale are available.

Hypotheses

  • H₀ (null hypothesis): The row variable and column variable are independent.
  • H₁ (alternative hypothesis): The row variable and column variable are not independent.

Reading the results

The result panel displays the hypotheses, a conclusion at significance level $\alpha = 0.05$, and the test statistics.

| Statistic | Description |
| --- | --- |
| $\chi^2$ | Pearson's chi-square statistic. Aggregates the deviation between observed and expected frequencies across all cells |
| df | Degrees of freedom $(r-1)(c-1)$, where $r$ is the number of row categories and $c$ the number of column categories |
| p | p-value. The probability of obtaining a chi-square statistic at least as extreme as the observed value, assuming the null hypothesis is true |
| Cramér's V | Effect size. Computed as $V = \sqrt{\chi^2 / (N \cdot (\min(r, c) - 1))}$, ranging from 0 to 1. 0 indicates complete independence and 1 indicates complete association. The interpretation of $V$ depends on the table dimension $\min(r, c) - 1$, so care is needed when comparing $V$ across tables of different sizes |
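
The four statistics can be reproduced by hand as a rough cross-check. The following is a minimal sketch in plain Python, not MIDAS's implementation: the function names `chi_square_sf` and `pearson_chi_square` and the example counts are hypothetical, and the statistic is assumed to be the usual sum of $(O - E)^2 / E$ over all cells with no continuity correction.

```python
import math

def chi_square_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite sum of (x/2)^(j-1/2) * exp(-x/2) / Gamma(j+1/2)
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) * math.exp(-half) / math.gamma(1.5)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= half / (j + 0.5)
    return total

def pearson_chi_square(observed):
    """Return (chi2, df, p, cramers_v) for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under independence
            chi2 += (o - e) ** 2 / e
    r, c = len(row_totals), len(col_totals)
    df = (r - 1) * (c - 1)
    p = chi_square_sf(chi2, df)
    cramers_v = math.sqrt(chi2 / (n * (min(r, c) - 1)))
    return chi2, df, p, cramers_v

# Hypothetical 2x2 table of counts (rows: groups, columns: outcomes)
table = [[30, 10], [20, 40]]
chi2, df, p, cramers_v = pearson_chi_square(table)
print(f"chi2={chi2:.4f}, df={df}, p={p:.6f}, V={cramers_v:.4f}")
```

The closed-form tail probability only covers integer degrees of freedom, which is all a contingency table needs.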

Contingency table

A contingency table is displayed below the result panel. Each cell shows both the observed frequency and the expected frequency. The expected frequency is the theoretical frequency under the null hypothesis of independence, calculated as $(\text{row total} \times \text{column total}) / \text{grand total}$.
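
As an illustration of the expected-frequency formula, here is a small sketch in plain Python. The function name `expected_frequencies` and the counts are made up for the example.

```python
def expected_frequencies(observed):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Hypothetical 2x3 table of observed counts
observed = [[12, 8, 5], [6, 14, 15]]
expected = expected_frequencies(observed)
for row in expected:
    print([round(e, 2) for e in row])
```

Each row and column of the expected table sums to the same total as the observed table, which is a quick way to sanity-check the calculation.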

Rows with missing values are excluded from the analysis. The number of excluded rows is shown above the table.
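
The exclusion step can be pictured as follows. This is only a sketch of the idea, assuming missing values are represented as `None`. How MIDAS stores missing values internally is not stated here, and `build_contingency_table` and the sample data are hypothetical.

```python
from collections import Counter

def build_contingency_table(row_values, col_values):
    """Cross-tabulate two categorical columns, dropping rows where either value is missing."""
    pairs = list(zip(row_values, col_values))
    complete = [(a, b) for a, b in pairs if a is not None and b is not None]
    excluded = len(pairs) - len(complete)  # number of rows left out of the analysis
    counts = Counter(complete)
    row_levels = sorted({a for a, _ in complete})
    col_levels = sorted({b for _, b in complete})
    table = [[counts[(a, b)] for b in col_levels] for a in row_levels]
    return row_levels, col_levels, table, excluded

# Hypothetical data with two incomplete rows
smoker = ["yes", "no", None, "yes", "no", "yes"]
outcome = ["a", "b", "a", None, "b", "a"]
row_levels, col_levels, table, excluded = build_contingency_table(smoker, outcome)
print(row_levels, col_levels, table, excluded)
```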

Adjusted standardized residuals

Enable the Adjusted standardized residuals checkbox to display the adjusted standardized residual $d_{ij}$ for each cell.

$$d_{ij} = \frac{O_{ij} - E_{ij}}{\sqrt{E_{ij}\,(1 - n_{i \cdot}/n)(1 - n_{\cdot j}/n)}}$$

$O_{ij}$ is the observed frequency, $E_{ij}$ is the expected frequency, $n_{i \cdot}$ is the row total, $n_{\cdot j}$ is the column total, and $n$ is the grand total.

Cells with large absolute residuals deviate strongly from independence. Positive residuals indicate more observations than expected; negative residuals indicate fewer.
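
The residual formula can be evaluated directly from the observed counts. The sketch below uses plain Python; the function name `adjusted_residuals` and the example table are illustrative and not part of MIDAS.

```python
import math

def adjusted_residuals(observed):
    """Adjusted standardized residual d_ij for every cell of a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    residuals = []
    for i, row in enumerate(observed):
        out = []
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency E_ij
            denom = math.sqrt(e * (1 - row_totals[i] / n) * (1 - col_totals[j] / n))
            out.append((o - e) / denom)
        residuals.append(out)
    return residuals

# Hypothetical 2x2 table of counts
d = adjusted_residuals([[30, 10], [20, 40]])
for row in d:
    print([round(x, 3) for x in row])
```

For a 2×2 table all four residuals have the same absolute value, and its square equals the uncorrected Pearson chi-square statistic.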

When residuals are enabled, cells are colored with a diverging color scale. Residuals exceeding the Bonferroni-corrected critical value are shown in bold. The Bonferroni correction divides the significance level $\alpha$ by the degrees of freedom $(r-1)(c-1)$ to adjust the threshold for each cell.
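
To get a feel for the size of the corrected threshold, the sketch below computes one plausible critical value. It assumes the adjusted significance level is applied as a two-sided cutoff on the standard normal distribution, which adjusted standardized residuals approximately follow under independence. The two-sided convention is an assumption, not something stated for MIDAS, and `bonferroni_critical_value` is a hypothetical name.

```python
from statistics import NormalDist

def bonferroni_critical_value(n_rows, n_cols, alpha=0.05):
    """Critical |residual| after dividing alpha by the degrees of freedom (r-1)(c-1)."""
    df = (n_rows - 1) * (n_cols - 1)
    adjusted_alpha = alpha / df
    # Assumption: two-sided test, so half of the adjusted alpha goes in each tail
    return NormalDist().inv_cdf(1 - adjusted_alpha / 2)

print(round(bonferroni_critical_value(2, 2), 3))  # 2x2 table, df = 1
print(round(bonferroni_critical_value(3, 3), 3))  # 3x3 table, df = 4
```

Under this assumption the threshold grows with the table size, so a residual that stands out in a 2×2 table may not be flagged in a larger one.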

Limitations of the chi-square approximation

Pearson's chi-square test relies on the test statistic asymptotically following a chi-square distribution. When the sample size is small or many cells have low expected frequencies, the accuracy of this approximation decreases. Check the expected frequencies $(E)$ displayed in the contingency table to assess whether the approximation is appropriate.
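
One way to summarize the expected frequencies is to look at the smallest value and the share of cells below a cutoff. The sketch below uses 5 as the cutoff, following a commonly cited rule of thumb (be cautious when more than about 20% of cells have an expected frequency below 5, or any cell is below 1). This cutoff is a convention from the statistics literature, not a check that MIDAS is documented to perform, and `expected_frequency_summary` is a hypothetical helper.

```python
def expected_frequency_summary(expected, low=5.0):
    """Smallest expected count and the share of cells with an expected count below `low`."""
    cells = [e for row in expected for e in row]
    n_low = sum(1 for e in cells if e < low)
    return {"min_expected": min(cells), "share_below": n_low / len(cells)}

# Hypothetical expected frequencies read off a contingency table
summary = expected_frequency_summary([[7.5, 9.0], [2.5, 3.0]])
print(summary)
```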

For 2x2 tables, Yates' continuity correction and Fisher's exact test are known alternatives, but MIDAS currently computes only the uncorrected Pearson chi-square statistic.

Other test methods

For comparing means between two groups, use Two-Sample Test / Paired Test. For comparing means across three or more groups, use ANOVA. For frequency tabulation of categorical variables, use Crosstab.

References

  • Pearson, K. (1900). On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine, 50(302), 157-175.
  • Agresti, A. (2007). An Introduction to Categorical Data Analysis (2nd ed., pp. 38-40). Wiley.