If the populations
from which data to be analyzed by a one-way analysis of variance (ANOVA) were
sampled violate one or more of the one-way ANOVA test assumptions, the results
of the analysis may be incorrect or misleading. For example, if the assumption
of independence
is violated, then the one-way ANOVA is simply not appropriate, although another
test (perhaps a blocked
one-way ANOVA) may be appropriate. If the assumption of normality
is violated, or outliers
are present, then the one-way ANOVA may not be the most powerful
test available, and this could mean the difference between detecting a true
difference among the population means or not. Using a nonparametric
test or applying a transformation
may yield a more powerful test. A potentially more damaging assumption
violation occurs when the population
variances are unequal, especially if the sample sizes are not approximately
equal (unbalanced).
Often, the effect of an assumption violation on the one-way ANOVA result depends
on the extent of the violation (such as how unequal the population variances
are, or how heavy-tailed
one or another population distribution
is). Some small violations may have little practical effect on the analysis,
while other violations may render the one-way ANOVA result uselessly incorrect
or uninterpretable. In particular, small
or unbalanced
sample sizes can increase vulnerability to assumption violations.
A lack of independence
within a sample is often caused by the existence of an implicit factor in the
data. For example, values collected over time may be serially correlated
(here time is the implicit factor). If the data are in a particular order,
consider the possibility of dependence. (If the row order of the data reflects
the order in which the data were collected, an index
plot of the data [data value plotted against row number] can reveal
patterns that suggest possible time effects.)
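As a numerical companion to the index plot, the lag-1 autocorrelation of the values in row order gives a rough indication of serial dependence. The following is a minimal sketch using only the Python standard library; the function name and the readings are hypothetical, chosen for illustration.

```python
from statistics import mean

def lag1_autocorrelation(values):
    """Lag-1 sample autocorrelation of values taken in collection (row) order."""
    m = mean(values)
    deviations = [v - m for v in values]
    denominator = sum(d * d for d in deviations)
    # Products of each deviation with the deviation of the next observation.
    numerator = sum(a * b for a, b in zip(deviations, deviations[1:]))
    return numerator / denominator

# Hypothetical readings, listed in the order they were collected.
drifting = [10.1, 10.3, 10.2, 10.6, 10.8, 10.7, 11.0, 11.3, 11.2, 11.5]

r1 = lag1_autocorrelation(drifting)
print(f"lag-1 autocorrelation: {r1:.2f}")
```

A value well away from zero, as for this upward-drifting series, suggests that neighboring observations are related and that time may be acting as an implicit factor. Values near zero are what independent observations would tend to produce, although a single summary number is no substitute for looking at the plot.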
Whether the samples are independent
of each other is generally determined by the structure of the experiment from
which they arise. Obviously correlated samples, such as a set of observations
over time on the same subjects, are not independent, and such data would be
more appropriately tested by a one-way blocked ANOVA or a repeated measures
ANOVA. If you are unsure whether your samples are independent, you may wish to
consult a statistician or someone who is knowledgeable about the data
collection scheme you are using.
Values may not be identically distributed because of the presence of outliers.
Outliers are anomalous values in the data. Outliers tend to increase the
estimate of sample variance, thus decreasing the calculated F statistic for
the ANOVA and lowering the chance of rejecting the null
hypothesis. They may be due to recording errors, which may be correctable,
or they may be due to the sample not being entirely from the same population.
Apparent outliers may also be due to the values being from the same, but nonnormal,
population. The boxplot
and normal
probability plot (normal Q-Q plot) may suggest the presence of outliers in
the data.
The F statistic is based on the sample means and the sample variances, each
of which is sensitive to outliers.
(In other words, neither the sample mean nor the sample variance is resistant
to outliers, and thus, neither is the F statistic.) In particular, a large
outlier can inflate the overall variance, decreasing the F statistic and thus
perhaps eliminating a significant difference. A nonparametric
test may be a more powerful test in such a situation. If you find outliers
in your data that are not due to correctable errors, you may wish to consult a
statistician as to how to proceed.
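The effect of a single large outlier on the F statistic can be illustrated with a small calculation. This is a sketch under stated assumptions: the data are made up, and f_statistic is a hypothetical helper that computes the usual ratio of the between-sample mean square to the within-sample mean square.

```python
from statistics import mean

def f_statistic(groups):
    """One-way ANOVA F: between-sample mean square over within-sample mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        m = mean(g)
        ss_between += len(g) * (m - grand_mean) ** 2
        ss_within += sum((x - m) ** 2 for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Made-up data: three samples with clearly separated means.
clean = [[10, 11, 12, 13, 14], [13, 14, 15, 16, 17], [16, 17, 18, 19, 20]]
# The same data with one value in the first sample replaced by a large outlier.
contaminated = [[10, 11, 12, 13, 40], [13, 14, 15, 16, 17], [16, 17, 18, 19, 20]]

f_clean = f_statistic(clean)
f_contaminated = f_statistic(contaminated)
print(f"F without outlier: {f_clean:.2f}")
print(f"F with outlier:    {f_contaminated:.2f}")
```

In this example the outlier both inflates the within-sample variance and pulls the first sample mean toward the others, so the F statistic collapses from a large value to less than 1.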
The values in a sample may indeed be from the same population, but not
from a normal one. Signs of nonnormality
are skewness
(lack of symmetry) or light-tailedness
or heavy-tailedness.
The boxplot,
histogram,
and normal
probability plot (normal Q-Q plot), along with the normality test, can
provide information on the normality of the population distribution. However,
if there are only a small number of data points, nonnormality can be hard to
detect. If there are a great many data points, the normality test may detect
statistically significant but trivial departures from normality that will have
no real effect on the F statistic.
For data sampled from a normal distribution, normal probability plots
should approximate straight lines, and boxplots should be symmetric (median
and mean together, in the middle of the box) with no outliers.
The one-way ANOVA's F test will not be much affected even if the population
distributions are skewed,
but the F test can be sensitive to population skewness if the sample sizes are
seriously unbalanced. If the sample sizes are not unbalanced, the F test will
not be seriously affected by light-tailedness
or heavy-tailedness,
unless the sample sizes are small (less than 5), or the departure from
normality is extreme (kurtosis less than -1 or greater than 2).
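The kurtosis thresholds above can only refer to excess kurtosis (zero for a normal distribution), since raw kurtosis cannot be negative. A minimal sketch of the moment-based estimates follows; the function names and sample values are hypothetical, and with small samples these estimates are themselves quite variable.

```python
def skewness_and_excess_kurtosis(sample):
    """Moment-based sample skewness and excess kurtosis (normal distribution = 0)."""
    n = len(sample)
    m = sum(sample) / n
    m2 = sum((x - m) ** 2 for x in sample) / n
    m3 = sum((x - m) ** 3 for x in sample) / n
    m4 = sum((x - m) ** 4 for x in sample) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

def kurtosis_is_extreme(excess_kurtosis):
    """Apply the rule of thumb quoted in the text: below -1 or above 2."""
    return excess_kurtosis < -1.0 or excess_kurtosis > 2.0

flat = [1, 2, 3, 4, 5]            # symmetric, light-tailed
right_skewed = [1, 1, 1, 2, 10]   # long right tail

flat_skew, flat_kurt = skewness_and_excess_kurtosis(flat)
skew_skew, skew_kurt = skewness_and_excess_kurtosis(right_skewed)
print(flat_skew, flat_kurt, kurtosis_is_extreme(flat_kurt))
print(skew_skew, skew_kurt, kurtosis_is_extreme(skew_kurt))
```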
Robust
statistical tests operate well across a wide variety of distributions.
A test can be robust for validity, meaning that it provides P values close to
the true ones in the presence of (slight) departures from its assumptions. It
may also be robust for efficiency, meaning that it maintains its statistical
power
(the probability that a true violation of the null
hypothesis will be detected by the test) in the presence of those
departures. The one-way ANOVA's F test is robust for validity against
nonnormality, but it may not be the most powerful test available for a given
nonnormal
distribution, although it is the most powerful
test available when its test assumptions are met. In the case of nonnormality,
using a nonparametric
test or applying a transformation
may result in a more powerful test.
The inequality of the population variances can be assessed by examination
of the relative size of the sample variances, either informally (including graphically),
or by a robust variance
test such as Levene's
test. (Bartlett's
test is even more sensitive to nonnormality than the one-way ANOVA's F
test, and thus should not be used for such testing.) The effect of inequality
of variances is mitigated when the sample sizes are equal: The F test is
fairly robust
against inequality of variances if the sample sizes are equal, although the
chance of incorrectly reporting a significant difference in the
means when none exists does increase. This chance of incorrectly rejecting the null
hypothesis is greater when the population variances are very different from
each other, particularly if there is one sample variance very much larger than
the others.
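The idea behind Levene's test can be sketched in a few lines: it is a one-way ANOVA F statistic computed on the absolute deviations of each value from the center of its own sample. The sketch below centers on the sample median (often called the Brown-Forsythe variant), which is an assumption made here for robustness rather than something the text specifies; the function name and data are hypothetical. Statistical packages provide this test with p values, for example scipy.stats.levene in SciPy.

```python
from statistics import mean, median

def levene_statistic(groups):
    """ANOVA F statistic on absolute deviations from each sample's median."""
    # Assumption: median centering (Brown-Forsythe variant of Levene's test).
    z_groups = [[abs(x - median(g)) for x in g] for g in groups]
    k = len(z_groups)
    n_total = sum(len(z) for z in z_groups)
    grand_mean = sum(sum(z) for z in z_groups) / n_total
    ss_between = 0.0
    ss_within = 0.0
    for z in z_groups:
        m = mean(z)
        ss_between += len(z) * (m - grand_mean) ** 2
        ss_within += sum((v - m) ** 2 for v in z)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Made-up data: the same spread in every sample.
equal_spread = [[10, 11, 12, 13, 14], [13, 14, 15, 16, 17], [16, 17, 18, 19, 20]]
# The third sample is replaced by one with a much larger spread.
unequal_spread = [[10, 11, 12, 13, 14], [13, 14, 15, 16, 17], [8, 13, 18, 23, 28]]

w_equal = levene_statistic(equal_spread)
w_unequal = levene_statistic(unequal_spread)
print(f"equal spread:   W = {w_equal:.3f}")
print(f"unequal spread: W = {w_unequal:.3f}")
```

The statistic is referred to an F distribution with k - 1 and N - k degrees of freedom, so larger values indicate stronger evidence that the population variances differ.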
The effect of inequality of the variances is most severe when the sample
sizes are unequal. If the larger samples are associated with the populations
with the larger variances, then the F statistic will tend to be smaller than
it should be, reducing the chance that the test will correctly identify a
significant difference between the means (i.e., making the test conservative).
On the other hand, if the smaller samples are associated with the populations
with the larger variances, then the F statistic will tend to be greater than
it should be, increasing the risk of incorrectly reporting a significant
difference in the means when none exists. This chance of incorrectly rejecting
the null hypothesis in the case of unbalanced sample sizes can be substantial
even when the population variances are not very different from each other.
Although the effect of unbalanced sample sizes and unequal population
variances increases for smaller sample sizes, it does not decrease
substantially if the sample sizes are increased without changing the lack of
balance in the sample sizes. For this reason, and because equal sample sizes
mitigate the effect of unequal population variances, the best course is to
keep the sample sizes as equal as possible.
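The direction of these effects can be checked with a small simulation in which the null hypothesis is true (all population means are equal) and only the variances differ. This is a rough sketch, not a definitive study: the sample sizes, standard deviations, seed, and function names are arbitrary choices for illustration, and the critical value 3.354 is the tabulated upper 5% point of the F distribution with 2 and 27 degrees of freedom.

```python
import random
from statistics import mean

F_CRIT_2_27 = 3.354  # tabulated upper 5% point of F(2, 27); total N must be 30

def f_statistic(groups):
    """One-way ANOVA F: between-sample mean square over within-sample mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        m = mean(g)
        ss_between += len(g) * (m - grand_mean) ** 2
        ss_within += sum((x - m) ** 2 for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def rejection_rate(sizes, sds, reps=2000, seed=1):
    """Fraction of simulated null data sets in which F exceeds the 5% critical value."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        groups = [[rng.gauss(0.0, sd) for _ in range(n)] for n, sd in zip(sizes, sds)]
        if f_statistic(groups) > F_CRIT_2_27:
            rejections += 1
    return rejections / reps

balanced_equal = rejection_rate((10, 10, 10), (1, 1, 1))  # assumptions met
large_with_large = rejection_rate((5, 5, 20), (1, 1, 3))  # largest sample, largest variance
small_with_large = rejection_rate((5, 5, 20), (3, 3, 1))  # smallest samples, largest variance

print(f"balanced, equal variances:         {balanced_equal:.3f}")
print(f"large sample has large variance:   {large_with_large:.3f}")
print(f"small samples have large variance: {small_with_large:.3f}")
```

With the assumptions met, the rejection rate should be near the nominal 0.05. It should fall well below 0.05 when the largest sample has the largest variance, and rise well above it when the smallest samples do.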
If both nonnormality and unequal variances are present, employing a transformation
may be preferable to a nonparametric test, because a nonparametric
test like the Kruskal-Wallis
test still assumes that the population variances are comparable.
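For reference, the Kruskal-Wallis statistic is computed from the ranks of the pooled data rather than from the values themselves. The sketch below is a simplified illustration with hypothetical function names and made-up data; it assigns average ranks to ties but omits the tie correction that statistical packages (for example scipy.stats.kruskal in SciPy) apply.

```python
def average_ranks(values):
    """Ranks 1..n of the values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction)."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)

h_plain = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Replacing the largest value by an extreme one leaves every rank unchanged.
h_extreme = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 900]])
print(h_plain, h_extreme)
```

Because only the ranks enter the calculation, an extreme value has no more influence than any other largest value. That resistance does not remove the requirement that the population distributions have comparable spread.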
The plot of each sample's values against its mean (or its sample ID) will
consist of vertical "stacks" of data points, one stack for each unique sample
mean value. If the assumptions for the samples' population distributions
are correct, the stacks should be about the same length. Outliers
may appear as anomalous points in the graph. A
fan pattern like the profile of a megaphone, with a noticeable flare either to
the right or to the left as shown in the picture (one or more of the "stacks"
of data points is much longer than the others), suggests that the variance in
the values increases in the direction the fan pattern widens (usually as the
sample mean increases), and this in turn suggests that a transformation
may be needed.
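When the spread grows with the sample mean, a logarithmic transformation is one common choice. The sketch below uses made-up positive data and a hypothetical helper to compare the ratio of the largest to the smallest sample variance before and after taking logs. Whether a log transformation is appropriate for a given data set is an assumption that should be checked rather than taken for granted.

```python
from math import log
from statistics import variance

def variance_ratio(groups):
    """Largest sample variance divided by the smallest sample variance."""
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)

# Made-up data in which the spread increases with the sample mean.
raw = [[4.1, 5.0, 6.2], [18.0, 25.0, 31.0], [70.0, 100.0, 135.0]]
logged = [[log(x) for x in g] for g in raw]

ratio_raw = variance_ratio(raw)
ratio_logged = variance_ratio(logged)
print(f"variance ratio, raw data:        {ratio_raw:.1f}")
print(f"variance ratio, log-transformed: {ratio_logged:.1f}")
```

For these values the ratio drops from several hundred to a small single-digit number, which is the numerical counterpart of the fan pattern disappearing from the plot.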
Side-by-side boxplots
of the samples can also reveal lack of homogeneity
of variances if some boxplots are much longer than others, and reveal
suspected outliers.
If one or more of the sample sizes are small, assumption violations such as
nonnormality or inequality
of variances are difficult to detect even when they are present. Also,
with small sample sizes the one-way ANOVA's F test offers less protection
against violation of assumptions.
Even if none of the test assumptions are violated, a one-way ANOVA with
small sample sizes may not have sufficient power
to detect any significant difference among the samples, even if the means are
in fact different. The power depends on the error variance, the selected
significance (alpha-) level of the test, and the sample size. Power decreases
as the variance increases, decreases as the significance level is decreased
(i.e., as the test is made more stringent), and increases as the sample size
increases. With very small samples, even samples from populations with very
different means may not produce a significant one-way ANOVA F test statistic
unless the sample variance is small. If a statistical significance test with
small sample sizes produces a surprisingly non-significant P
value, then a lack of power may be the reason. The best time to avoid such
problems is in the design stage of an experiment, when appropriate minimum
sample sizes can be determined, perhaps in consultation with a statistician,
before data collection begins.
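The dependence of power on sample size and error variance can be explored by simulation at the design stage. This is a rough sketch under stated assumptions: three normal populations with hypothetical means 0, 0.5, and 1, equal sample sizes, and tabulated 5% critical values for F(2, 12) and F(2, 27). The function names, seed, and number of repetitions are arbitrary choices.

```python
import random
from statistics import mean

# Tabulated upper 5% points of F(2, 3n - 3) for n observations per sample.
F_CRIT = {5: 3.885, 10: 3.354}

def f_statistic(groups):
    """One-way ANOVA F: between-sample mean square over within-sample mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        m = mean(g)
        ss_between += len(g) * (m - grand_mean) ** 2
        ss_within += sum((x - m) ** 2 for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def estimated_power(n_per_sample, sd, means=(0.0, 0.5, 1.0), reps=2000, seed=7):
    """Fraction of simulated data sets in which the F test rejects at the 5% level."""
    rng = random.Random(seed)
    critical = F_CRIT[n_per_sample]
    hits = 0
    for _ in range(reps):
        groups = [[rng.gauss(mu, sd) for _ in range(n_per_sample)] for mu in means]
        if f_statistic(groups) > critical:
            hits += 1
    return hits / reps

power_n5 = estimated_power(5, sd=1.0)
power_n10 = estimated_power(10, sd=1.0)
power_n10_noisy = estimated_power(10, sd=2.0)
print(f"n = 5,  sd = 1: {power_n5:.2f}")
print(f"n = 10, sd = 1: {power_n10:.2f}")
print(f"n = 10, sd = 2: {power_n10_noisy:.2f}")
```

Doubling the sample size raises the estimated power, and doubling the standard deviation lowers it, in line with the description above.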
The one-way ANOVA test is not too sensitive to inequality
of variances if the sample sizes are equal. If the sample sizes are not
approximately equal, the calculated F statistic may be dominated by the
sample variances for the larger samples. The test is then less likely to
correctly identify significant differences in the means if the larger samples
are associated with the larger population variances, and more likely to report
nonexistent differences in the means if the smaller samples are associated
with the larger population variances. Unbalanced sample
sizes also increase any effect due to nonnormality, and require adjustments to
be made in calculating multiple
comparisons tests.
In general, the multiple comparisons tests will be robust in those
situations when the one-way ANOVA's F test is robust, and will be subject to
the same potential problems with unequal variances, particularly when the
sample sizes are unequal. As with the one-way ANOVA itself, the best
protection against the effects of possible assumption violations is to employ
equal sample sizes. Unequal variances may make individual comparisons of means
inaccurate, because the multiple comparison techniques rely on a pooled
estimate for the variance, based on the assumption that the sample variances
are equal.
Ideally, the sample sizes will be equal for all-pairwise multiple
comparison tests. When they are not, an adjustment must be made to the
calculations. The Tukey-Kramer adjustment (based on the harmonic mean of each
pair's sample sizes) may be conservative (that is, it may
be less likely to flag means as different than the nominal significance level
would suggest), but in general performs well. An alternative procedure is to
use the harmonic mean of all the sample sizes for all the pairwise
comparisons. This has the disadvantage that the actual significance level of
the test is more often different from the nominal significance level than is
the case with the Tukey-Kramer adjustment; worse, the actual significance
level of the test may be greater than the nominal significance level, meaning
that the test is more likely to incorrectly flag a mean difference as
significant.
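The two adjustments differ only in the sample size used in the standard error of a pairwise comparison. The sketch below uses hypothetical function names, sample sizes, and a made-up error mean square. The Tukey-Kramer standard error uses the harmonic mean of the two sample sizes being compared, while the alternative uses one harmonic mean of all the sample sizes for every pair.

```python
from math import sqrt
from statistics import harmonic_mean

def tukey_kramer_se(mse, n_i, n_j):
    """Standard error for comparing samples i and j under the Tukey-Kramer adjustment."""
    return sqrt(mse / 2 * (1 / n_i + 1 / n_j))

def overall_harmonic_se(mse, sizes):
    """Standard error using the harmonic mean of all sample sizes for every pair."""
    return sqrt(mse / harmonic_mean(sizes))

sizes = [5, 10, 20]  # hypothetical unbalanced sample sizes
mse = 4.0            # made-up error mean square from the ANOVA table

se_all = overall_harmonic_se(mse, sizes)
se_5_20 = tukey_kramer_se(mse, 5, 20)
se_10_20 = tukey_kramer_se(mse, 10, 20)
print(f"overall harmonic mean SE:  {se_all:.4f}")
print(f"Tukey-Kramer SE, 5 vs 20:  {se_5_20:.4f}")
print(f"Tukey-Kramer SE, 10 vs 20: {se_10_20:.4f}")
```

For the pair that includes the smallest sample, the overall harmonic mean gives a smaller standard error than Tukey-Kramer, which is how that procedure can end up flagging differences too readily. For the pair of larger samples the opposite holds.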