Does your data violate contingency table analysis assumptions?
If the populations from which the data for a contingency
table analysis were sampled violate one or more of
the assumptions, the results of the analysis may be
incorrect or misleading.
For example, if the assumption of
independence
is violated, then neither the chi-square test nor Fisher's exact test
is appropriate, although another test (such as
Cochran's Q test or
McNemar's test)
may be appropriate.
If the total sample size
or particular row or column totals are too small,
then the expected values may be too small for
the approximation involved in the chi-square
test to be valid.
If it is not possible to
cleanly assign each observation to exactly one
cell
of the contingency table, or if an ad hoc
scheme is used to divide a continuous variable into
discrete categories, then the results of the
chi-square or Fisher's exact test may vary
greatly depending on the exact apportionment
of observations into cells of the contingency table.
If one or both of the classification variables are ordinal
rather than nominal, and especially
if one or both of them
are actually continuous rather than discrete,
then a chi-square or Fisher's exact test may not be the most
powerful
test available, and this could mean the difference
between detecting a true difference and missing it.
Often, the effect of an assumption violation on the
test result depends on the extent of the violation.
Potential assumption violations include:
Interaction(s) between row and column classifications:
Lack of independence
within a sample may be caused by
interactions between categories of the row and column variables.
For example, the probability of death in patient category 1
from disease A may be higher than the probability
of death in patient category 2 from disease A
because patients in category 1 tend to suffer a
worse form of the disease.
Such interactions may indicate
the existence of an implicit factor in the data.
For example, the probability of death in patient category 1
from disease A may be higher than the probability
of death in patient category 2 from disease A
because patients in category 1 tend to be older than those
in category 2. In this case, age would be an
implicit factor.
Outliers in the table may be
due to interactions.
Whether the observations are
independent
of each other is generally
determined by the structure of the experiment from which
they arise. Obviously correlated samples, such as a
set of pre- and post-test observations on the same subjects,
are not independent, and such data would be more appropriately
tested by a test such as Cochran's Q test or
McNemar's test.
If you are unsure whether your samples are independent, you may wish to consult
a statistician or someone who is knowledgeable
about the data collection scheme you are using.
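As an illustration of how a paired-data test differs from the chi-square test, the following is a minimal sketch of the McNemar statistic for pre- and post-test yes/no observations on the same subjects. It uses only the discordant pairs (subjects whose answer changed). The function name and the counts are hypothetical, and the version shown omits any continuity correction.

```python
def mcnemar_statistic(b, c):
    """McNemar chi-square statistic (no continuity correction).

    b: number of subjects who changed from 'yes' (pre) to 'no' (post)
    c: number of subjects who changed from 'no' (pre) to 'yes' (post)
    Subjects whose answer did not change do not enter the statistic.
    """
    if b + c == 0:
        raise ValueError("no discordant pairs; the statistic is undefined")
    return (b - c) ** 2 / (b + c)


# Hypothetical counts, for illustration only.
stat = mcnemar_statistic(b=5, c=15)
print(stat)  # 5.0
```

Under the null hypothesis of no change, the statistic is referred to a chi-square distribution with 1 degree of freedom.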
The model of independence may fit poorly due to the
presence of outliers.
Outliers are anomalous values in the
data.
They may be due to recording errors, which may be
correctable, or they may be due to the sample not being
entirely from the same population, or they may be
due to interactions
between row and column classifications.
If you find outliers in your data that
are not due to correctable errors, you may wish to consult
a statistician as to how to proceed.
As long as the probability of falling into row category i
and the probability of falling into column category j
are both non-zero, the expected probability of falling
into cell(i,j) is also non-zero under the
usual two-way contingency table model of independence.
If the total sample size is small, or if there are
many cells in the table, then
it may happen that no observations are recorded
for a particular cell. These zero values in
a table are sampling zeroes.
However, the actual process
that creates the observations may produce cells
in the contingency table in which observations
can never occur. The zero values that must
occur in these cells are structural zeroes.
A contingency table of cancer incidence by sex and
type of cancer must have the value 0 in the cell
for males and ovarian cancer, but the expected
number of males with ovarian cancer will not
be 0 as long as there is at least 1 male
and 1 ovarian cancer patient among the observations.
A contingency table containing one or more
structural zeroes is an incomplete table.
The chi-square test and Fisher's exact test
are not designed for contingency tables with structural zeroes.
If you find structural zeroes in your data,
you may wish to consult
a statistician as to how to proceed,
perhaps using an alternative test.
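The mismatch between a structural zero and the independence model can be seen by computing the expected counts directly. The sketch below assumes the usual formula (row total times column total, divided by the grand total); the helper name and the counts are made up for illustration.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical counts. Rows: female, male. Columns: ovarian cancer, lung cancer.
observed = [[20, 30],
            [0, 50]]  # (male, ovarian) is a structural zero
expected = expected_counts(observed)
print(expected[1][0])  # 10.0
```

The observed count in the (male, ovarian) cell must be 0, yet the independence model expects 10 observations there, so the model cannot describe such a table.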
The chi-square test involves using the chi-square
distribution
to approximate the underlying exact distribution. Although the
chi-square approximation can be used in all three sampling
schemes, the approximation becomes less accurate when marginal
totals are fixed. The best approximation will most likely
be in the multinomial sampling scheme.
The approximation becomes better as the
expected cell
frequencies grow larger, and may be inappropriate
for contingency tables with very small expected cell frequencies.
In the case of a 2x2 contingency table, an adjusted value of the
chi-square statistic (the Yates corrected chi-square)
is often used to correct for a continuous distribution (chi-square)
being used to approximate the very discrete distribution
of the values in the 2x2 table.
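For a 2x2 table with cells a, b (first row) and c, d (second row), the uncorrected and Yates-corrected statistics can be sketched as follows. The function name and the counts are hypothetical. Clamping the corrected difference at zero is a guard added here so that the correction never increases the statistic; it is not part of the textbook formula.

```python
def chi_square_2x2(a, b, c, d, yates=False):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]].

    With yates=True, n/2 is subtracted from |ad - bc| before squaring.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    diff = abs(a * d - b * c)
    if yates:
        # Guard (an assumption of this sketch): do not let the difference go negative.
        diff = max(diff - n / 2, 0.0)
    return n * diff ** 2 / denom


# Hypothetical counts, for illustration only.
uncorrected = chi_square_2x2(12, 5, 6, 14)
corrected = chi_square_2x2(12, 5, 6, 14, yates=True)
print(round(uncorrected, 3), round(corrected, 3))
```

The corrected value is always the smaller of the two, which makes the test more conservative.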
For tables with expected cell frequencies less than 5,
the chi-square approximation may not
be reliable. A standard (and conservative)
rule of thumb (due to Cochran) is to avoid using
the chi-square test for contingency tables in which any expected
cell frequency is less than 1, or in which more than 20% of
the contingency table cells have expected cell frequencies
less than 5.
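The rule of thumb above can be checked mechanically before running a chi-square test. This is a minimal sketch with a hypothetical function name and made-up tables; it assumes expected counts are computed under independence as row total times column total divided by the grand total.

```python
def cochran_rule_ok(table):
    """Return True if the table satisfies the rule of thumb:
    no expected count below 1, and at most 20% of cells below 5."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if n == 0:
        raise ValueError("table has no observations")
    expected = [r * c / n for r in row_totals for c in col_totals]
    any_below_one = any(e < 1 for e in expected)
    share_below_five = sum(e < 5 for e in expected) / len(expected)
    return (not any_below_one) and share_below_five <= 0.20


# Hypothetical tables, for illustration only.
print(cochran_rule_ok([[20, 30], [25, 25]]))  # True
print(cochran_rule_ok([[3, 2], [4, 10]]))     # False: 2 of 4 expected counts are below 5
```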
When no observations appear in a particular row category
(row total is 0) or a particular column category (column
total is 0), the chi-square statistic cannot be calculated.
To proceed, the category must be either eliminated completely,
or combined with another category.
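Eliminating empty categories is a simple table operation. The sketch below, with a hypothetical helper name and made-up counts, drops every row and column whose total is 0; combining categories instead would require a subject-matter decision about which ones to merge.

```python
def drop_empty_categories(table):
    """Remove rows and columns whose totals are 0."""
    kept_rows = [row for row in table if sum(row) > 0]
    kept_cols = [j for j, col in enumerate(zip(*kept_rows)) if sum(col) > 0]
    return [[row[j] for j in kept_cols] for row in kept_rows]


# Hypothetical counts: the second row and the second column are empty.
table = [[5, 0, 7],
         [0, 0, 0],
         [3, 0, 9]]
print(drop_empty_categories(table))  # [[5, 7], [3, 9]]
```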
When rows or columns are combined (collapsed together)
to fix problems of small expected cell frequencies or
zero-sum categories, care should be taken to do the
collapsing such that the new hypothesis being
tested is still of interest. If the null hypothesis
of independence of row and column variables is true
for all categories of each variable, then combining
categories will preserve that property.
However, collapsing can destroy evidence of
non-independence, so a failure to reject the
null hypothesis for the collapsed table does
not rule out the possibility of non-independence
in the original table.
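A small constructed example shows how collapsing can hide non-independence. The counts are invented so that the first two columns differ strongly between rows but sum to the same total in each row; the function name is hypothetical.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical 2x3 table with a clear association in the first two columns.
original = [[10, 30, 20],
            [30, 10, 20]]
# Collapse the first two columns into one.
collapsed = [[row[0] + row[1], row[2]] for row in original]

print(pearson_chi_square(original))   # 20.0
print(pearson_chi_square(collapsed))  # 0.0
```

The original table gives a statistic of 20 on 2 degrees of freedom, while the collapsed table fits the independence model perfectly.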
As with most statistical tests, the
power
of the chi-square test increases with a larger number
of observations. If there are too few observations,
it may be impossible to reject the null
hypothesis even if it is false.
The chi-square test and Fisher's exact test are designed for
observations cross-classified by two sets of nominal categories.
If either the row or column variable is actually continuous,
then the variable must be divided into intervals
to construct the contingency table. The interval
boundaries should be decided beforehand on the basis of
theory or custom. If the intervals are determined
by the particular data being analyzed, then the
test statistic and corresponding P value may not
be generalizable.
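One way to keep the interval boundaries independent of the data is to fix them in code before any observations are examined. In this sketch the cut points, labels and ages are all hypothetical; each interval includes its lower boundary and excludes its upper one.

```python
from bisect import bisect_right
from collections import Counter

# Boundaries chosen in advance (hypothetical values), not derived from the data.
CUT_POINTS = [18, 40, 65]
LABELS = ["<18", "18-39", "40-64", "65+"]


def age_category(age):
    """Assign an age to exactly one predefined interval."""
    return LABELS[bisect_right(CUT_POINTS, age)]


# Hypothetical observations, for illustration only.
ages = [12, 18, 39, 40, 64, 65, 80]
counts = Counter(age_category(a) for a in ages)
print(counts["<18"], counts["18-39"], counts["40-64"], counts["65+"])  # 1 2 2 2
```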
The chi-square test ignores any possible
ordering of either the row or column variables.
If either or both of the row and column variables
are ordinal or continuous, then an
alternative test to the
chi-square or Fisher's exact test may be preferable,
especially if one of the variables is
an outcome variable and the other an explanatory
variable. If the explanatory variable is
nominal and the outcome variable is continuous,
an analysis of variance (ANOVA)
is an alternative test. If the
explanatory variable is continuous
and the outcome variable is nominal,
then logistic regression
is an alternative test.
If both the explanatory and outcome
variables are continuous, then
simple linear regression
is an alternative test.
Fisher's exact test assumes that all the row marginal totals and
all the column marginal totals are fixed. However, work by
Tocher shows that
the test can be extended to the case where only one
set of marginal totals is fixed.
If a set of marginal totals is fixed, then the more nearly
those marginal totals equal each other, the more powerful the chi-square
test will be for the same total sample size.
For this reason, a study using a sampling scheme
in which the row or column
marginal totals are fixed
and can be set equal
(such as a retrospective or prospective study) will
tend to be more powerful than a
cross-sectional
study with the same total sample size.
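The effect of balanced marginal totals can be illustrated by computing the chi-square statistic on the table of counts that would be expected under a given alternative. This is only a rough proxy for power, not a power calculation; the function name, group sizes and probabilities are hypothetical.

```python
def chi_square_at_alternative(n1, n2, p1, p2):
    """Pearson chi-square computed on the idealized 2x2 table in which
    group 1 (size n1) has outcome probability p1 and group 2 (size n2) has p2."""
    table = [[n1 * p1, n1 * (1 - p1)],
             [n2 * p2, n2 * (1 - p2)]]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Same total sample size (100) and same true difference (0.5 versus 0.3).
balanced = chi_square_at_alternative(50, 50, 0.5, 0.3)
unbalanced = chi_square_at_alternative(90, 10, 0.5, 0.3)
print(round(balanced, 2), round(unbalanced, 2))
```

With equal group sizes the statistic is about 4.17; with a 90/10 split it drops to about 1.44, even though the underlying difference is the same.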