The exact assumptions and null hypothesis
for the Pearson chi-square test for independence
depend on the sampling scheme used, although the calculated statistic
is the same in each case. There are three possible sampling
schemes for the values in a contingency table with R rows
and C columns:
Sampling Scheme 1:
The total number of data values in the contingency table (N)
is fixed, but none of the row or column totals are fixed.
This sampling scheme is known as cross-sectional, naturalistic,
or multinomial sampling. In this case, the assumptions are:
The data observations are made on a
random sample
of N objects, cross-classified according to two attributes,
the row variable and the column variable.
The event of an observation being in a particular row is
independent
of that same observation being in a particular column.
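As a rough illustration of Sampling Scheme 1, the sketch below draws a single table in which only the grand total N is fixed and, under independence, each cell probability is the product of a row probability and a column probability. The function name, the probabilities, and the seed are hypothetical choices made for this example only.

```python
import random

def sample_multinomial_table(n_total, row_probs, col_probs, seed=None):
    """Draw one R x C table with only the grand total fixed.

    Under independence, cell (i, j) has probability
    row_probs[i] * col_probs[j].
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(row_probs), len(col_probs)
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    weights = [row_probs[i] * col_probs[j] for i, j in cells]
    table = [[0] * n_cols for _ in range(n_rows)]
    # Each of the N objects is cross-classified independently.
    for i, j in rng.choices(cells, weights=weights, k=n_total):
        table[i][j] += 1
    return table

# Illustrative values only: N = 200, 2 rows, 3 columns.
table = sample_multinomial_table(200, [0.3, 0.7], [0.2, 0.5, 0.3], seed=1)
for row in table:
    print(row)
```

Repeating the draw with different seeds changes both the row totals and the column totals, while the grand total stays at N.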
Sampling Scheme 2:
The total number of data values in the contingency table (N)
is fixed, and either the row marginal totals or the column marginal totals are fixed.
If one of the attributes is viewed as an outcome variable and the other as
an explanatory variable (e.g., if one variable is the occupation of
the parent and the other is the occupation of the child), then the study
is retrospective or a case-control study if
the marginal totals are fixed for the outcome variable,
and the study is prospective if the marginal
totals are fixed for the explanatory variable.
If the R row marginal totals are fixed such that row i
has n[i] observations in it, the assumptions are:
The data observations are made on R random samples,
with n[i] values in the ith sample.
Sample i is taken from objects that have the
ith value of the row attribute.
For any given column, the probability of an observation being
in that column is the same for all rows.
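A minimal sketch of Sampling Scheme 2 with fixed row totals: each row is its own random sample of size n[i], and under the null hypothesis every row shares the same column probabilities. The function name and the numbers are assumptions made for illustration.

```python
import random

def sample_fixed_rows_table(row_totals, col_probs, seed=None):
    """Draw one table whose row totals are fixed in advance.

    Row i is an independent sample of row_totals[i] observations,
    all rows using the same column probabilities.
    """
    rng = random.Random(seed)
    n_cols = len(col_probs)
    table = []
    for n_i in row_totals:
        row = [0] * n_cols
        for j in rng.choices(range(n_cols), weights=col_probs, k=n_i):
            row[j] += 1
        table.append(row)
    return table

# Illustrative values only: two samples of 40 and 60 observations.
table = sample_fixed_rows_table([40, 60], [0.2, 0.5, 0.3], seed=2)
print([sum(row) for row in table])        # row totals, fixed by design
print([sum(col) for col in zip(*table)])  # column totals, free to vary
```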
Sampling Scheme 3:
The total number of data values in the contingency table (N)
is fixed, and both the row marginal totals and the column marginal totals are fixed.
This is also the sampling scheme assumed by Fisher's exact test.
If the row marginal totals and the column marginal totals are fixed,
the assumptions are:
Each object is classified into one and only one category of
the row variable, and into one and only one category of the column variable.
The N observations come from a
random sample
such that
each observation has the same probability of being classified
into the ith row and the jth column as any other observation.
The event of an observation being in a particular row is
independent
of that same observation being in a particular column.
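One way to picture Sampling Scheme 3 is to hold both sets of marginal totals fixed and pair row labels with column labels at random. The sketch below does this with a shuffle; the function name and the totals are hypothetical. Repeating the shuffle many times is also the basic idea behind a randomization test.

```python
import random

def sample_fixed_margins_table(row_totals, col_totals, seed=None):
    """Draw one table with both row and column totals fixed.

    Row and column labels are paired at random, which corresponds
    to independence conditional on the margins.
    """
    if sum(row_totals) != sum(col_totals):
        raise ValueError("row and column totals must sum to the same N")
    rng = random.Random(seed)
    row_labels = [i for i, n in enumerate(row_totals) for _ in range(n)]
    col_labels = [j for j, n in enumerate(col_totals) for _ in range(n)]
    rng.shuffle(col_labels)
    table = [[0] * len(col_totals) for _ in row_totals]
    for i, j in zip(row_labels, col_labels):
        table[i][j] += 1
    return table

# Illustrative margins only.
table = sample_fixed_margins_table([17, 20], [18, 19], seed=3)
print(table)
```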
The Pearson chi-square test involves using the chi-square
distribution
to approximate the underlying exact distribution. Although the
chi-square approximation can be used in all three sampling
schemes, the approximation becomes poorer when marginal
totals are fixed. The best approximation will most likely
be obtained under the first (multinomial) sampling scheme.
The approximation becomes better as the
expected cell
frequencies grow larger, and may be inappropriate
for contingency tables with very small expected cell frequencies.
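The expected cell frequencies and the Pearson statistic can be computed directly from the observed table, as in the sketch below. The helper names are hypothetical, the code assumes every row and column total is positive, and the threshold of 5 is a common rule of thumb rather than something stated in this text.

```python
def expected_counts(table):
    """Expected frequencies under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def pearson_chi_square(table):
    """Return the Pearson chi-square statistic and its degrees of freedom."""
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def small_expected_cells(table, threshold=5.0):
    """List (row, column) positions whose expected frequency is below threshold."""
    expected = expected_counts(table)
    return [
        (i, j)
        for i, exp_row in enumerate(expected)
        for j, e in enumerate(exp_row)
        if e < threshold
    ]

# Illustrative counts only.
observed = [[10, 20], [30, 40]]
print(pearson_chi_square(observed))
print(small_expected_cells([[1, 2], [3, 40]]))
```

A non-empty list from small_expected_cells is a signal that the chi-square approximation may be unreliable for that table.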
In the case of a 2x2 contingency table, an adjusted value of the
chi-square statistic (the Yates corrected chi-square)
is often used to correct for a continuous distribution (chi-square)
being used to approximate the very discrete distribution
of the values in the 2x2 table. The purpose of the correction
is to produce P values that are closer to those that would
be calculated by the exact (Fisher) test.
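For a 2x2 table with cells a, b, c, d, the Yates correction subtracts N/2 from |ad - bc| before squaring. The sketch below computes the statistic with and without the correction; the function name and counts are hypothetical, all four marginal totals are assumed positive, and the P value uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable.

```python
import math

def chi_square_2x2(table, yates=True):
    """Pearson chi-square for a 2x2 table, optionally Yates corrected.

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Continuity correction; never let the adjusted value go negative.
        diff = max(0.0, diff - n / 2)
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    stat = n * diff ** 2 / denom
    # Upper tail of chi-square(1): P(|Z| > sqrt(stat)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Illustrative counts only.
observed = [[12, 5], [6, 14]]
print(chi_square_2x2(observed, yates=False))
print(chi_square_2x2(observed, yates=True))
```

The corrected statistic is never larger than the uncorrected one, so the corrected P value is never smaller.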
The Pearson, likelihood-ratio (deviance), and randomization chi-square
tests all approximate the same chi-square distribution
asymptotically (as the total sample size gets large).
The Pearson chi-square test is always more
conservative
than the randomization chi-square test, and tends to be
more conservative than the likelihood-ratio chi-square test.
For a 2x2 table, the Pearson chi-square test tends to be
more conservative than the exact (Fisher's) test, and
the likelihood-ratio chi-square tends to be
less conservative than the exact test (and thus more
likely to erroneously reject the null hypothesis).
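To compare the Pearson and likelihood-ratio statistics on the same table, a sketch along the following lines can be used. The function name is hypothetical, every marginal total is assumed positive, and cells with an observed count of zero are taken to contribute nothing to the likelihood-ratio sum.

```python
import math

def pearson_and_likelihood_ratio(table):
    """Return (Pearson X^2, likelihood-ratio G^2) for an R x C table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = 0.0
    g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            x2 += (observed - expected) ** 2 / expected
            if observed > 0:
                # A zero cell adds nothing to the likelihood-ratio sum.
                g2 += 2 * observed * math.log(observed / expected)
    return x2, g2

# Illustrative counts only.
x2, g2 = pearson_and_likelihood_ratio([[10, 20], [30, 40]])
print(x2, g2)
```

Both statistics are referred to the same chi-square distribution with (R-1)(C-1) degrees of freedom, so a difference between them translates directly into a difference in P values.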
Fisher's exact test assumes that
the total number of data values in the 2x2 contingency table (N)
is fixed, and both the row marginal totals and the column marginal totals are fixed.
If the 2 row marginal totals are fixed
and the 2 column marginal totals are fixed, the assumptions
for Fisher's exact test are:
Each object is classified into one and only one category of
the row variable, and into one and only one category of the column variable.
The N observations come from a
random sample
such that each observation has the same probability of being classified
into the ith row and the jth column as any other observation.
The event of an observation being in a particular row is
independent
of that same observation being in a particular column.
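With both sets of marginal totals fixed, the count in the upper-left cell of a 2x2 table follows a hypergeometric distribution under the null hypothesis, which is what makes an exact P value possible. The sketch below is one hedged way to compute it. The function name is hypothetical, and the two-sided P value uses one common convention (summing the probabilities of all tables no more probable than the observed one); other conventions exist.

```python
from math import comb

def fisher_exact_two_sided(table):
    """Two-sided Fisher exact P value for a 2x2 table of counts."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(k):
        # Hypergeometric probability that the upper-left cell equals k.
        return comb(r1, k) * comb(r2, c1 - k) / comb(n, c1)

    lowest = max(0, c1 - r2)
    highest = min(r1, c1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    total = sum(
        prob(k)
        for k in range(lowest, highest + 1)
        if prob(k) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, total)

# Illustrative counts only.
print(fisher_exact_two_sided([[3, 1], [1, 3]]))
```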
Among measures of association
for two-way contingency tables,
Kendall's Tau B, Tau C, Spearman's rho, and Gamma assume that both the row
and column variables have ordered categories (such as disease severity categories).
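As an illustration of why ordering matters, the sketch below computes Gamma from counts of concordant and discordant pairs, assuming that "Gamma" here refers to the Goodman and Kruskal gamma statistic. The function name is hypothetical, and the rows and columns of the table are assumed to be listed in increasing category order.

```python
def goodman_kruskal_gamma(table):
    """Gamma = (C - D) / (C + D) for a table with ordered rows and columns.

    C counts pairs ordered the same way on both variables;
    D counts pairs ordered in opposite ways. Tied pairs are ignored.
    """
    n_rows, n_cols = len(table), len(table[0])
    concordant = 0
    discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):
                for m in range(n_cols):
                    if m > j:
                        concordant += table[i][j] * table[k][m]
                    elif m < j:
                        discordant += table[i][j] * table[k][m]
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)

# Illustrative counts only: rows and columns in increasing severity order.
print(goodman_kruskal_gamma([[20, 10, 5], [10, 15, 10], [5, 10, 20]]))
```

Reordering the categories changes the concordant and discordant counts, so the value is only meaningful when the category order is.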
Cross-classification schemes for two-way contingency tables
work best when the categories for both variables are discrete
(e.g., gender). When a continuous variable such as age is divided into
intervals to form the categories of a variable, the interval
boundaries should be decided beforehand on the basis of
theory or custom. The intervals should not be determined
by the particular data being analyzed.
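In practice this means writing the interval boundaries down before looking at the data and then applying them mechanically. A minimal sketch, in which the boundaries, labels, and function names are hypothetical:

```python
from bisect import bisect_right
from collections import Counter

# Boundaries chosen in advance (hypothetical values), not from the data.
AGE_BOUNDARIES = [18, 40, 65]
AGE_LABELS = ["<18", "18-39", "40-64", "65+"]

def age_category(age):
    """Map an age to its predefined interval; each boundary starts a new interval."""
    return AGE_LABELS[bisect_right(AGE_BOUNDARIES, age)]

def count_by_category(ages):
    """Tally observations per predefined age category, including empty ones."""
    counts = Counter(age_category(age) for age in ages)
    return [counts.get(label, 0) for label in AGE_LABELS]

# Illustrative ages only.
print(count_by_category([12, 18, 25, 39.5, 40, 64, 65, 80]))
```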
Guidance:
Ways to detect before performing a
contingency table analysis whether your data violate any assumptions.
Ways to examine contingency table analysis results to detect
assumption violations.
Possible alternatives if your data or contingency table analysis
results indicate assumption violations.
To properly analyze and interpret
results of the contingency table analysis,
you should be familiar with the following terms and
concepts:
Failure to understand and properly apply
contingency table analysis may result in drawing erroneous conclusions from your data.
If you are not familiar with these terms and concepts, you may wish to
consult with a statistician.
You may also want to consult the following references:
Agresti, A. 1990.
Categorical Data Analysis. New York: John Wiley & Sons.
Agresti, A. 1996.
An Introduction to Categorical Data Analysis. New York: John Wiley & Sons.
Bishop, Y. M. M., Fienberg, S. E., and Holland, P. W. 1975.
Discrete Multivariate Analysis. Cambridge, MA: MIT Press.
Brownlee, K. A. 1965. Statistical Theory and Methodology
in Science and Engineering. New York: John Wiley & Sons.
Conover, W. J. 1980. Practical Nonparametric Statistics. 2nd ed.
New York: John Wiley & Sons.
Daniel, Wayne W. 1978. Applied Nonparametric Statistics.
Boston: Houghton Mifflin.
Daniel, Wayne W. 1995. Biostatistics. 6th ed.
New York: John Wiley & Sons.
Everitt, B. S. 1992. The Analysis of Contingency Tables. 2nd ed.
London: Chapman & Hall.
Koehler, K. and Larntz, K. 1980.
An empirical investigation of goodness-of-fit statistics for sparse multinomials.
Journal of the American Statistical Association 75: 336-344.
Lehmann, E. L. 1975. Nonparametrics: Statistical Methods Based on
Ranks. San Francisco: Holden-Day.