If the X or Y
populations
from which
data to be analyzed by analysis of covariance (ANCOVA) were
sampled
violate one or more of
the ANCOVA assumptions, the results of the analysis may be
incorrect or misleading. For example, if the assumption of
independence
is violated, then analysis of covariance is
not appropriate.
If the assumption
of normality is violated,
or outliers are present,
then the analysis of covariance
may not be the most
informative analysis available, and this could mean the difference
between finding a significant difference between the
treatment (group) means or not.
A transformation
may result in a better fit.
Often, the impact of
an assumption violation on the ANCOVA result depends
on the extent of the violation (such as how
inconstant the residual variance is, or how
skewed
the Y population
distribution
is).
Some small violations may have little practical effect
on the analysis, while other violations may render
the ANCOVA result uselessly incorrect or uninterpretable.
Apparent lack of independence
in the residuals may be caused by the existence of an implicit
variable in the data, a continuous X variable or a grouping variable
that was not explicitly used
in the ANCOVA model. In this case, the best model may be
more complicated than the one-way ANCOVA model. The
best model may not include the original X variable.
If there is a linear trend in the plot of the ANCOVA residuals
against the fitted values, then an implicit X variable may be
the cause.
A plot of the residuals against the prospective new X variable
should reveal whether there is a systematic variation; if
there is, you may consider adding the new X variable to
the ANCOVA model.
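As a minimal sketch of this check, the following fits a single regression line and then measures how strongly the residuals track a candidate variable that was left out of the fit. The data, the variable name z, and the helper functions are hypothetical and only illustrate the idea; in a real ANCOVA the residuals would come from the full fitted model.

```python
from statistics import mean

def fit_line(x, y):
    """Least squares fit of y = a + b*x for one regression line; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def correlation(u, v):
    """Pearson correlation between two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5

# Hypothetical data: y depends on x and on z, but z is left out of the fit.
x = [1, 2, 3, 4, 5, 6, 7, 8]
z = [3, 1, 4, 1, 5, 9, 2, 6]
y = [2.0 * xi + 1.5 * zi for xi, zi in zip(x, z)]

a, b = fit_line(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# A correlation far from 0 suggests systematic variation with z.
r_resid_z = correlation(residuals, z)
print("fitted slope:", round(b, 2), "residual-z correlation:", round(r_resid_z, 2))
```

In this made-up data set the true coefficient of x is 2.0, so the fitted slope also shows how omitting z can bias the slope estimate.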
If an implicit X variable is not included in the fitted model,
the fitted estimates for the individual and common slopes may be
biased,
and not very meaningful, and the fitted Y values
may not be accurate.
Another possible cause of apparent dependence between the Y observations
is the presence of another implicit block
or treatment effect. (Such an effect can be considered another type of
implicit X variable, albeit a discrete one.)
If such a
variable is suspected, a
different model
may provide a better fit.
If multiple values of Y are collected at the same values of X
for a single group regression line,
this can act as another type of blocking, with the
unique values of X acting as blocks. These multiple
Y measurements may be less variable than the overall
variation in Y, and, given their common value of X,
they are not truly independent
of each other. If there
are many replicated X values, and if the variation
between Y at replicated values is much smaller than
the overall residual variance, then the variance
of the estimate of the slope may be too small.
This may make a test comparing slopes
anticonservative (more likely
than the stated significance level
to reject the null hypothesis,
even when it is true).
In this case, an alternative method is to replace
each such replicated X value (within the same regression line)
by a single data point with
the average Y value, and then perform the ANCOVA
analysis with the new data set. A possible drawback
to this method is that by reducing the number of
data points, the degrees of freedom associated with
the residual error are reduced, thus potentially
reducing the power of the test.
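The averaging step described above might be sketched as follows. The row layout (group, x, y) and the function name are assumptions made for this illustration.

```python
from collections import defaultdict
from statistics import mean

def average_replicates(rows):
    """Collapse rows (group, x, y) that share the same group and x into a
    single row whose y is the mean of the replicated y values."""
    cells = defaultdict(list)
    for group, x, y in rows:
        cells[(group, x)].append(y)
    return [(g, x, mean(ys)) for (g, x), ys in sorted(cells.items())]

# Hypothetical data with replicated x values inside each group.
rows = [("A", 1.0, 2.0), ("A", 1.0, 2.4), ("A", 2.0, 3.1),
        ("B", 1.0, 4.0), ("B", 2.0, 5.0), ("B", 2.0, 5.4)]

collapsed = average_replicates(rows)
for row in collapsed:
    print(row)
```

The six original rows collapse to four, which is the loss of residual degrees of freedom mentioned above.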
Whether the Y values are
independent
of each other is generally
determined by the structure of the experiment from which
they arise.
Y values collected over time may be serially
correlated
(here time is the implicit factor). If the data are in a
particular order, consider the possibility of dependence.
(If the row order of the data reflects the order in which
the data were collected, an index plot
of the data [data
value plotted against row number] can reveal patterns in
the plot that could suggest possible time effects.)
For serially correlated Y values, the estimates
of the slope and intercept will be
unbiased,
but the estimates of their variances will not
be reliable.
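A numerical companion to the index plot is the Durbin-Watson statistic, a standard diagnostic that this text does not itself name, computed on the residuals in collection order. The sketch below assumes the residuals are already available as a list; the two example sequences are made up.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences of the
    residuals divided by their sum of squares. Values near 2 are consistent
    with no lag-1 correlation; values well below 2 suggest positive serial
    correlation, and values well above 2 suggest negative serial correlation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

# Hypothetical residuals in collection order.
drifting = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]   # slow drift over time
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]     # sign flips every step

dw_drifting = durbin_watson(drifting)
dw_alternating = durbin_watson(alternating)
print(round(dw_drifting, 3), round(dw_alternating, 3))
```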
If you are unsure whether your Y values are independent,
you may wish to consult
a statistician or someone who is knowledgeable
about the data collection scheme you are using.
Values may not be identically distributed because of the
presence of outliers.
Outliers are anomalous values in the
data. Outliers may have a strong influence over
the fitted slope and intercept, giving a poor
fit to the bulk of the data points.
Outliers tend to increase the estimate of residual
variance,
lowering the chance of rejecting the
null hypothesis.
They may be due to recording errors, which may be
correctable, or they may be due to the Y values not all being
sampled from the same population. Apparent outliers
may also be due to the Y values being from the same, but
nonnormal,
population.
Outliers may show up clearly in an X-Y scatterplot of the
data for one of the regression lines, as points that do not lie
near the general linear trend of the
data for that regression line. A point may be
an unusual value in either X or Y without
necessarily being an outlier in the scatterplot.
Once the analysis of covariance model has been fitted,
the boxplot
and normal probability plot
(normal Q-Q plot) for residuals
may suggest the presence of outliers in the data.
After the fit, outliers are usually detected by
examining the residuals.
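One simple way to examine the residuals is to scale them by the residual standard deviation and flag the large ones. The sketch below does this for a single regression line; the cutoff of 2 and the data are assumptions chosen for illustration, not a rule taken from this text.

```python
from statistics import mean

def standardized_residuals(x, y):
    """Fit y = a + b*x by least squares and return the residuals divided by
    the residual standard deviation (n - 2 degrees of freedom)."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s = (sum(e * e for e in resid) / (n - 2)) ** 0.5
    return [e / s for e in resid]

# Hypothetical data on an exact line, with one corrupted value.
x = list(range(1, 11))
y = [2.0 * xi for xi in x]
y[4] += 10.0  # simulated recording error at x = 5

flagged = [i for i, r in enumerate(standardized_residuals(x, y)) if abs(r) > 2.0]
print("indices flagged as possible outliers:", flagged)
```

Because the outlier itself inflates the residual standard deviation, a single cutoff can miss outliers in small samples or when several are present.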
The method of least squares used in fitting the
analysis of covariance model involves minimizing the
sum of the squared vertical distances between each
data point and the fitted line. Because of this,
fitted lines can be highly sensitive to
outliers.
(In other words, least squares fitting is not
resistant
to outliers, and thus, neither is a fitted slope estimate.)
Outliers may affect the estimates for
the individual slopes and intercepts for
the regression lines, and could lead
to an incorrect conclusion about whether
the slopes are equal, or whether the
intercepts are equal.
If you find outliers in your data that
are not due to correctable errors, you may wish to consult
a statistician as to how to proceed.
The Y values for an analysis of covariance
will not necessarily come from the same
normal population, although they should
all have the same variance.
For this reason, it may be difficult
to assess nonnormality of Y.
After the ANCOVA is performed, the residuals
can be examined for signs of nonnormality.
The residuals should all come from the
same normal distribution, with mean 0
and variance the same as the variance of the Y.
It may be the case that
the residuals are indeed from the same
population, but not from a normal one. Signs of
nonnormality
are
skewness
(lack of symmetry) or
light-tailedness or
heavy-tailedness.
The
boxplot,
histogram,
and normal probability plot
(normal Q-Q plot), along with the normality test,
can provide information on the normality of the
population distribution for the residuals. However, if there are only a small number
of data points, nonnormality can be hard to detect.
If there are a great many data points, the
normality test may detect statistically significant
but trivial departures from normality that will
have no real effect on the analysis of covariance.
If the residuals come from a normal distribution, normal
probability plots should approximate straight lines,
and boxplots should be symmetric (median and mean together,
in the middle of the box) with no
outliers.
If the number of data points is not too small,
the ANCOVA should not be much affected
by small departures from normality.
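Skewness and tail weight can also be summarized numerically. The sketch below computes the moment-based sample skewness and excess kurtosis of a list of residuals; the function name and the example values are hypothetical, and these summaries supplement the plots rather than replace them.

```python
from statistics import mean

def skewness_and_excess_kurtosis(values):
    """Moment-based sample skewness and excess kurtosis. Both are close to 0
    for a large sample from a normal distribution; positive excess kurtosis
    indicates heavy tails and negative indicates light tails."""
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

# Hypothetical residuals.
symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]
right_skewed = [0.0, 0.0, 0.0, 0.0, 10.0]

skew_sym, kurt_sym = skewness_and_excess_kurtosis(symmetric)
skew_right, kurt_right = skewness_and_excess_kurtosis(right_skewed)
print(skew_sym, skew_right)
```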
Just as the blocks and treatments
should not interact in a
one-way blocked ANOVA,
the concomitant (X) variable in the analysis of covariance
should not be affected by the treatments (levels of the grouping
variable). If both X and Y depend on the
group (treatment), then the analysis of covariance
can be misleading.
If the variance of the Y is not constant, then
the error variance will not be constant.
The most common form of such
heteroscedasticity in Y is that
the variance of Y may increase as
the mean of Y increases, for data with positive X and Y.
Heteroscedasticity of Y is usually detected informally by
examining the X-Y scatterplot of the data
before performing the ANCOVA.
If both nonlinearity and unequal variances are present,
employing a transformation
of Y might
have the effect of simultaneously improving the linearity
and promoting equality of the variances.
The analysis of covariance requires that
the treatment regression lines have the
same slope. The test for equality of
treatment (group) means is the test for equality
of intercepts, and assumes that slopes
are equal.
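In symbols, the equal-slopes model can be written as follows. The notation is an assumption introduced here: i indexes the k treatment groups, j indexes observations within a group, the alpha_i are the group intercepts, and beta is the common slope.

```latex
Y_{ij} = \alpha_i + \beta X_{ij} + \varepsilon_{ij},
\qquad \varepsilon_{ij} \sim N(0, \sigma^2) \ \text{independently},
\qquad i = 1, \dots, k .

% Test for equality of treatment (group) means, given a common slope:
H_0 : \alpha_1 = \alpha_2 = \cdots = \alpha_k .
```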
Inequality of slopes can be ascertained informally by
examining the X-Y scatterplot of the data
before performing the analysis of covariance.
Otherwise, the test of equality of slopes provides
a formal test of whether the assumption of
parallel treatment regression lines has been violated.
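The test of equality of slopes can be sketched as a comparison of two fits: one with a separate slope for each group and one with a single common slope. The function name and the data below are hypothetical. Only the F statistic and its degrees of freedom are computed; the result would still need to be compared with the appropriate F distribution.

```python
from statistics import mean

def within_sums(x, y):
    """Within-group sums of squares and cross-products: (Sxx, Sxy, Syy)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    return sxx, sxy, syy

def slope_equality_f(groups):
    """groups maps a group name to (x values, y values).
    Returns (F, numerator df, denominator df) for H0: all slopes are equal."""
    sums = [within_sums(x, y) for x, y in groups.values()]
    k = len(sums)
    n = sum(len(x) for x, _ in groups.values())
    # Residual sum of squares when each group has its own slope.
    sse_separate = sum(syy - sxy ** 2 / sxx for sxx, sxy, syy in sums)
    # Residual sum of squares with one pooled slope and separate intercepts.
    sxx_t = sum(s[0] for s in sums)
    sxy_t = sum(s[1] for s in sums)
    syy_t = sum(s[2] for s in sums)
    sse_common = syy_t - sxy_t ** 2 / sxx_t
    df1, df2 = k - 1, n - 2 * k
    f = ((sse_common - sse_separate) / df1) / (sse_separate / df2)
    return f, df1, df2

# Hypothetical data: A and B are roughly parallel, C is much steeper.
xs = [1.0, 2.0, 3.0, 4.0]
a = (xs, [1.1, 1.9, 3.2, 3.8])
b = (xs, [3.0, 4.1, 4.9, 6.2])
c = (xs, [1.0, 3.1, 4.9, 7.2])

f_parallel, df1, df2 = slope_equality_f({"A": a, "B": b})
f_unequal, _, _ = slope_equality_f({"A": a, "C": c})
print(round(f_parallel, 2), round(f_unequal, 2), (df1, df2))
```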
If the linear model assumed by analysis of covariance
is not the correct one for the data,
then the slope estimates and the fitted values
from the ANCOVA will be
biased,
and not very meaningful. Over a restricted range of X or Y,
nonlinear models may be well approximated by linear models
(this is in fact the basis of linear interpolation),
but for accurate prediction a model appropriate to the
data should be selected. An
examination of the X-Y scatterplot
may reveal whether the linear model is appropriate.
If there is a great deal of variation in Y, it may
be difficult to decide what the appropriate model
is; in this case, the linear model may do as well
as any other, and has the virtue of simplicity.
The purpose of using the X variable in the analysis of covariance
is to use the information about X to reduce the variation
in Y and thus increase the chance of detecting differences
between the treatments. (This is similar to the purpose
of using a blocking variable
in an analysis of variance, except
that a blocking variable is discrete instead of continuous.)
Choosing an X variable that has no linear relation to Y
is pointless: no reduction in variance will be achieved,
and the power of the test
will be reduced. However, the effect will not generally be serious
unless the number of data points is small.
If there is no linear relation between X and Y, then
the analysis of covariance offers no improvement over
the one-way analysis of variance
in detecting differences between the group means.
The lack of a linear relation between X and Y can be detected informally by
examining the X-Y scatterplot of the data
before performing the ANCOVA.
Otherwise, the test of whether all the slopes are equal to 0
provides a formal test of whether there is a linear relation
between X and Y. Since this test assumes that all the
slopes are equal, it makes little sense if the test
for equality of slopes indicates that the slopes
are significantly different.
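Assuming equal slopes, the test for a linear relation can be sketched as a t statistic for the pooled within-group slope. The function name and data are hypothetical, and the resulting statistic would be referred to a t distribution with the returned degrees of freedom.

```python
from statistics import mean

def common_slope_t(groups):
    """groups maps a group name to (x values, y values).
    Returns (pooled slope, t statistic for H0: slope = 0, residual df)."""
    sxx = sxy = syy = 0.0
    n = 0
    for x, y in groups.values():
        mx, my = mean(x), mean(y)
        sxx += sum((a - mx) ** 2 for a in x)
        sxy += sum((a - mx) * (b - my) for a, b in zip(x, y))
        syy += sum((b - my) ** 2 for b in y)
        n += len(x)
    k = len(groups)
    slope = sxy / sxx
    df = n - k - 1                      # k intercepts plus one common slope
    mse = (syy - sxy ** 2 / sxx) / df   # residual mean square
    return slope, slope / (mse / sxx) ** 0.5, df

xs = [1.0, 2.0, 3.0, 4.0]
# Hypothetical groups with a clear linear relation between X and Y.
linear = {"A": (xs, [1.1, 1.9, 3.2, 3.8]), "B": (xs, [3.0, 4.1, 4.9, 6.2])}
# Hypothetical groups in which Y does not change linearly with X.
flat = {"A": (xs, [2.0, 1.0, 1.0, 2.0]), "B": (xs, [5.0, 4.0, 4.0, 5.0])}

slope_lin, t_lin, df_lin = common_slope_t(linear)
slope_flat, t_flat, df_flat = common_slope_t(flat)
print(round(slope_lin, 2), round(t_lin, 1), df_lin, t_flat)
```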
The analysis of covariance model assumes that the observed
X variables are fixed, not random. If the X values are
not under the control of the experimenter (i.e., are
observed but not set), they may not be fixed.
If the X values are in fact random and measured with error, the
estimates of the slope and intercept may be
biased.
If the assumption of parallel straight lines
for the different treatment groups is correct, the
plot of the observed Y values against X,
using different symbols for each group,
should suggest parallel linear bands across the graph with
no obvious departures from linearity.
Outliers
may appear as anomalous points in the graph,
often in the upper right-hand or lower left-hand corner
of the graph. (A point may be an outlier
in either X or Y without necessarily
being far from the general trend of the data.)
If there is no linear relation between X and Y,
then the plot of Y vs X for each group will
have 0 slope: The bands will all be parallel
to the X axis.
If there is no relationship between X and treatment,
then a plot of Y vs X for each individual treatment
should look like the plot of Y vs X for
all treatments combined, except for random variation.
In particular, the range of X for each treatment
group should be similar.
A plot of
the X-Y data that uses a different symbol for each
treatment group can help you detect differences
in the distribution of Y along the X scale
for different groups.
If most of the X values for one treatment
tend to be larger than the X values for
another treatment, for example, then
you should investigate the possibility
that the value of X depends on the
treatment group.
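A rough numerical companion to this plot is a one-way analysis of variance of X itself across the treatment groups. This is a sketch under the assumption that a large F statistic for X is taken as a warning sign; the function name and data are hypothetical, and no P value is computed.

```python
from statistics import mean

def oneway_f(samples):
    """One-way ANOVA F statistic for a list of samples, one list per group."""
    k = len(samples)
    n = sum(len(s) for s in samples)
    grand = mean([v for s in samples for v in s])
    between = sum(len(s) * (mean(s) - grand) ** 2 for s in samples)
    within = sum((v - mean(s)) ** 2 for s in samples for v in s)
    return (between / (k - 1)) / (within / (n - k))

# Hypothetical X values for two treatment groups.
x_similar = [[1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5]]   # overlapping ranges
x_shifted = [[1.0, 2.0, 3.0, 4.0], [6.0, 7.0, 8.0, 9.0]]   # one group much larger

f_similar = oneway_f(x_similar)
f_shifted = oneway_f(x_shifted)
print(round(f_similar, 2), round(f_shifted, 2))
```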
If the ANCOVA model is not correct, the shape of the
general trend of the X-Y plot might suggest parallel
nonlinear curves. In this case, the shape of
the curves might suggest a
function to use (e.g., a polynomial, exponential, or
logistic function)
in a different model.
Alternatively, the plot might
suggest a reasonable transformation
to apply.
For example, if the X-Y plot arcs from lower left to upper right
so that data points either very low or very high in X lie
below the straight line suggested by the data,
while the data points with middling X values
lie on or above that straight line, taking square roots or logarithms
of the X values may promote linearity.
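The effect of such a transformation can be checked by comparing how well a straight line fits before and after. The sketch below uses made-up data that follow a logarithmic curve exactly, so the improvement is much cleaner than it would be with real data.

```python
import math
from statistics import mean

def r_squared(x, y):
    """Squared Pearson correlation, i.e. R^2 of the straight-line fit of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

# Hypothetical data that arc from lower left to upper right.
x = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
y = [3.0 + 2.0 * math.log(xi) for xi in x]

r2_raw = r_squared(x, y)
r2_log = r_squared([math.log(xi) for xi in x], y)
print(round(r2_raw, 3), round(r2_log, 3))
```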
If the plot suggests that the different regression curves
are neither parallel nor linear, then the analysis of
covariance is not likely to be informative.
If the assumption of equal variances for the Y is correct, the
plot of the observed Y values against X for each group
should suggest a band across the graph with roughly
equal vertical width for all values of X.
(That is, the shape of the graph should suggest
a tilted cigar and not a wedge or a megaphone.)
A fan pattern like the profile of a megaphone, with a
noticeable flare either to the right or to the left,
suggests that
the variance in the values increases in the direction
the fan pattern widens (usually as the sample mean increases), and this in
turn suggests that a transformation
of the Y values might be useful.
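When Y is replicated at several X values, the megaphone pattern can be quantified by comparing the standard deviation of Y across those X values, before and after a transformation. The data below are hypothetical and constructed so that the spread of Y is proportional to its mean, the case in which a log transformation is a natural candidate.

```python
import math
from statistics import stdev

# Hypothetical replicated data: at each x the Y values are 10*x times a
# multiplicative factor, so the spread grows with the mean.
levels = [1, 2, 4, 8]
factors = [0.8, 1.0, 1.25]
y_by_level = {x: [10.0 * x * f for f in factors] for x in levels}

def sd_ratio(groups):
    """Ratio of the largest to the smallest within-level standard deviation."""
    sds = [stdev(ys) for ys in groups.values()]
    return max(sds) / min(sds)

raw_ratio = sd_ratio(y_by_level)
log_ratio = sd_ratio({x: [math.log(v) for v in ys]
                      for x, ys in y_by_level.items()})
print(round(raw_ratio, 3), round(log_ratio, 3))
```

A ratio near 1 after transformation is consistent with roughly constant variance; with real data the ratio will rarely be this clean.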
If the number of data points is small, assumption violations
such as nonnormality
or heteroscedasticity
are difficult to detect even when they are present. With
a small number of data points, analysis of covariance offers less protection
against violation of assumptions. With few data points, it
may be hard to determine how well the fitted ANCOVA model matches the
data, or whether a different model would be more appropriate.
Even if none of the test assumptions are violated, an analysis
of covariance on a small number of data points may not have sufficient
power to detect a significant
difference between the slope and 0, even if the slope
is non-zero. The power depends on the residual
error, the observed variation in X, the selected
significance (alpha-) level of the test,
and the number of data points. Power decreases as the residual
variance increases, decreases as the significance
level is decreased (i.e., as the test is made
more stringent), increases as the variation in observed X increases,
and increases as the number of data points
increases. If a statistical significance test with
a small number of data values produces a surprisingly non-significant
P value, then lack of power may be the reason.
The best time to avoid such problems is in the
design stage of an experiment, when appropriate
minimum sample sizes can be determined, perhaps in consultation
with a statistician, before data collection begins.
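The dependence of power on these quantities can be read from the standard error of the common slope estimate. The notation is an assumption introduced here: k groups, n data points in total, SSE the residual sum of squares from the equal-slopes model, and X-bar_i the mean of X in group i.

```latex
t = \frac{\hat{\beta}}{\operatorname{SE}(\hat{\beta})},
\qquad
\operatorname{SE}(\hat{\beta})
  = \sqrt{\frac{s^2}{\sum_{i=1}^{k} \sum_{j} (X_{ij} - \bar{X}_i)^2}},
\qquad
s^2 = \frac{\mathrm{SSE}}{n - k - 1}.
```

A larger residual variance s^2 inflates the standard error and lowers power, while more data points or more spread in X within the groups enlarge the denominator sum and raise it.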