Examining contingency table analysis results to detect assumption violations


All the following results are provided as part of a contingency table analysis.

Results for contingency table analysis:

  • Observed values:
    The power of the test depends on the total sample size in the contingency table, and the expected cell frequencies depend on the overall total and on the row and column marginal totals. Observed values of 0 may be either sampling zeroes (cells that happened to receive no observations in this sample) or structural zeroes (cells in which an observation is impossible). If they are structural zeroes, the chi-square test and Fisher's exact test are not appropriate, although an alternative test may be available. If a set of marginal totals is fixed, then the more nearly equal those marginal totals are to each other, the more powerful the chi-square test will be for the same total sample size.
  • Expected values:
    The chi-square test uses the chi-square distribution to approximate the underlying exact distribution. The approximation improves as the expected cell frequencies grow larger, and it may not be reliable when expected cell frequencies are less than 5. A standard (and conservative) rule of thumb is to avoid the chi-square test when any expected cell frequency is less than 1, or when more than 20% of the cells have expected cell frequencies less than 5. The table of expected values will reveal whether either of these conditions is true, and Prophet will also generate an appropriate warning in the test results.
  • Standardized residuals:
    The table of standardized residuals gives, for each cell, the difference between the observed and expected values divided by the square root of the expected value: (observed - expected) / sqrt(expected). The expected values are those computed under the model of independence of the row and column variables. A standardized residual greater than 2 or less than -2 indicates a cell that is not fitted well by the model of independence. A large residual may mean that a particular cell is an outlier, or that there is an interaction between that particular row and column. When the categories have a natural order, a pattern in the residuals (e.g., large negative ones in one corner of the table and large positive ones in another corner) indicates the possibility of an interaction.
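
The checks described above can be sketched in a few lines of Python. This is only an illustrative sketch, not Prophet's implementation: the function names, the example table, and the cutoff parameters are assumptions introduced here, and the code assumes every row and column total is positive (so that no expected value is 0).

```python
from math import sqrt


def expected_counts(observed):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def check_expected(expected, min_allowed=1.0, small=5.0, max_small_fraction=0.20):
    """Apply the rule of thumb: no expected count below 1, and at most 20% below 5."""
    cells = [e for row in expected for e in row]
    any_below_min = any(e < min_allowed for e in cells)
    fraction_small = sum(e < small for e in cells) / len(cells)
    return {
        "any_below_1": any_below_min,
        "fraction_below_5": fraction_small,
        "approximation_doubtful": any_below_min or fraction_small > max_small_fraction,
    }


def standardized_residuals(observed, expected):
    """(observed - expected) / sqrt(expected) for each cell."""
    return [
        [(o - e) / sqrt(e) for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(observed, expected)
    ]


def poorly_fitted_cells(residuals, cutoff=2.0):
    """List (row, column, residual) for cells whose residual exceeds the cutoff in size."""
    return [
        (i, j, r)
        for i, row in enumerate(residuals)
        for j, r in enumerate(row)
        if abs(r) > cutoff
    ]


# Hypothetical 3 x 3 table of observed counts, for illustration only.
observed = [[12, 5, 3],
            [4, 9, 7],
            [2, 6, 14]]

expected = expected_counts(observed)
print(check_expected(expected))
for i, j, r in poorly_fitted_cells(standardized_residuals(observed, expected)):
    print(f"cell ({i}, {j}): standardized residual {r:.2f}")
```

Any cells reported by `poorly_fitted_cells` are candidates for closer inspection; the code only flags them and does not distinguish an outlying cell from a row-by-column interaction.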
