Does your data violate goodness of fit (chi-square) test assumptions?


If the population from which the data to be analyzed by a goodness of fit (chi-square) test were sampled violates one or more of the test's assumptions, the results of the analysis may be incorrect or misleading. For example, if the assumption of independence is violated, then the goodness of fit (chi-square) test is simply not appropriate.

If the total sample size is small, then the expected counts in some cells may be too small for the chi-square approximation to the distribution of the test statistic to be valid.
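As a rough illustration of this check, the sketch below computes the expected count for each cell and flags the small ones. The data are hypothetical (20 rolls of a die tested against a fair-die model), and the threshold of 5 is a commonly quoted rule of thumb assumed here for illustration, not a value stated in this guide.

```python
# Hypothetical data: 20 rolls of a die, tested against a fair-die model.
observed = [2, 5, 3, 4, 1, 5]
hypothesized_probs = [1 / 6] * 6


def expected_counts(n, probs):
    """Expected count in each cell under the hypothesized distribution."""
    return [n * p for p in probs]


def small_cells(expected, threshold=5.0):
    """Indices of cells whose expected count is below the threshold.

    The default threshold of 5 is an assumed rule of thumb.
    """
    return [i for i, e in enumerate(expected) if e < threshold]


n = sum(observed)
expected = expected_counts(n, hypothesized_probs)
flagged = small_cells(expected)

print(f"n = {n}, smallest expected count = {min(expected):.2f}")
print(f"cells below threshold: {flagged}")
```

With only 20 observations spread over six equally likely cells, every expected count is 20/6, so all six cells are flagged. A larger sample, or fewer categories, would raise the expected counts.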

If it is not possible to cleanly assign each observation to exactly one cell (category) of the table, or if an ad hoc scheme is used to divide a continuous variable into discrete categories, then the results of the goodness of fit chi-square test may vary greatly depending on the exact apportionment of observations into cells of the table.
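The sensitivity to cell boundaries can be seen in a small sketch. The 20 values below are hypothetical, the hypothesized distribution is assumed to be uniform on [0, 1), and the two sets of cut points are arbitrary choices made only for illustration. Both schemes use three cells, so the statistics are compared against the same reference value (5.991, the upper 5% point of the chi-square distribution with 2 degrees of freedom).

```python
from bisect import bisect_right

# Hypothetical measurements, all assumed to lie in [0, 1).
values = [0.08, 0.19, 0.27, 0.31, 0.34, 0.38, 0.41, 0.44, 0.46, 0.49,
          0.51, 0.53, 0.56, 0.58, 0.62, 0.66, 0.69, 0.73, 0.84, 0.93]


def chi_square_uniform(values, edges):
    """Bin values by `edges` and return (counts, Pearson chi-square statistic)
    against a uniform distribution on [edges[0], edges[-1]).

    Cells are closed on the left and open on the right.
    """
    n = len(values)
    span = edges[-1] - edges[0]
    counts = [0] * (len(edges) - 1)
    for v in values:
        counts[bisect_right(edges, v) - 1] += 1
    stat = 0.0
    for k, obs in enumerate(counts):
        exp = n * (edges[k + 1] - edges[k]) / span  # expected count in cell k
        stat += (obs - exp) ** 2 / exp
    return counts, stat


CRITICAL_DF2_05 = 5.991  # upper 5% point, chi-square with 2 degrees of freedom

counts_a, stat_a = chi_square_uniform(values, [0.0, 0.25, 0.75, 1.0])
counts_b, stat_b = chi_square_uniform(values, [0.0, 0.5, 0.75, 1.0])

for label, counts, stat in [("A", counts_a, stat_a), ("B", counts_b, stat_b)]:
    verdict = "reject" if stat > CRITICAL_DF2_05 else "do not reject"
    print(f"scheme {label}: counts={counts}, statistic={stat:.2f}, {verdict}")
```

The same 20 observations give a statistic of about 7.2 under the first set of cut points and about 3.6 under the second, so the conclusion at the 5% level changes with nothing but the choice of cell boundaries.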

If the categories are ordered instead of nominal, especially if the classification variable is actually continuous rather than discrete, then a chi-square goodness of fit test may not be the most powerful test available, and this could mean the difference between detecting a true difference and missing it. Generally speaking, if you are testing against a well-known distribution like the normal distribution, there is likely to be a more powerful test tailored to that specific distribution, one that may not require you to completely specify the distribution function beforehand.
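As one illustration of a statistic that uses the ordering of continuous data directly, without grouping it into cells, the sketch below computes the Kolmogorov-Smirnov distance between a sample and a hypothesized distribution. The choice of this statistic is an assumption made for illustration, not a recommendation from this guide, and unlike some distribution-specific tests it does require the distribution to be fully specified. The sample values and the standard normal hypothesis are hypothetical.

```python
import math


def normal_cdf(x, mean=0.0, sd=1.0):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def ks_statistic(values, cdf):
    """Largest vertical gap between the empirical CDF of `values`
    and the hypothesized CDF, checked just before and at each data point."""
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Hypothetical sample, compared against a standard normal distribution.
sample = [-1.2, -0.7, -0.3, 0.1, 0.2, 0.6, 0.9, 1.5]
d = ks_statistic(sample, normal_cdf)
print(f"Kolmogorov-Smirnov distance: {d:.3f}")
```

Turning this distance into a p-value requires the distribution of the statistic under the null hypothesis, which is omitted from this sketch.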

Often, the effect of an assumption violation on the test result depends on the extent of the violation.

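Potential assumption violations include lack of independence, small expected values, problems assigning observations to cells, and ordered categories, each described above.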
