Do your data violate normality test assumptions?
If the
population
from which
data to be analyzed by a normality test were sampled
violates one or more of
the normality test assumptions, the results of the analysis may be
incorrect or misleading. For example, if the assumption of mutual
independence of the sampled values
is violated, then the normality test results will not be reliable.
If outliers are present,
then the normality test may reject the
null hypothesis
even when the remainder of the data do in fact
come from a normal distribution.
Often, the effect of
an assumption violation on the normality test result depends
on the extent of the violation.
Some small violations may have little practical effect
on the analysis, while other violations may render
the normality test result incorrect or uninterpretable, and therefore useless.
A lack of independence
within a sample is often caused by
the existence of an implicit factor in the data. For example,
values collected over time may be serially
correlated
(here time is the implicit factor). If the data are in a
particular order, consider the possibility of dependence.
(If the row order of the data reflects the order in which
the data were collected, an
index plot of the data [data
value plotted against row number] can reveal patterns in
the plot that could suggest possible time effects.)
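As a numerical companion to the index plot, the lag-1 autocorrelation of the values in row order gives a rough indication of serial dependence. The following is a minimal sketch using only the Python standard library; the simulated series, the seed, and the function name lag1_autocorrelation are assumptions made for illustration, not part of any particular package.

```python
import random

def lag1_autocorrelation(values):
    """Sample lag-1 autocorrelation of values taken in row (collection) order."""
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    numer = sum((values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1))
    return numer / denom

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical independent data: neighbouring rows are unrelated.
independent = [rng.gauss(0, 1) for _ in range(500)]

# Hypothetical serially correlated data: each value carries over
# 80% of the previous one, plus fresh noise.
drifting = [0.0]
for _ in range(499):
    drifting.append(0.8 * drifting[-1] + rng.gauss(0, 1))

r_independent = lag1_autocorrelation(independent)
r_drifting = lag1_autocorrelation(drifting)
print(f"independent series: r1 = {r_independent:.2f}")
print(f"drifting series:    r1 = {r_drifting:.2f}")
```

For independent data, the lag-1 autocorrelation should usually fall within roughly plus or minus 2 divided by the square root of the sample size. A value well outside that range suggests that row order matters.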
An implicit factor may also separate the data into different
distributions, each of which may be normal, but which produce
a nonnormal composite distribution. For example, measurements
for females may follow a normal distribution, and measurements
for males may also follow a normal distribution, but the measurements
for the entire population of both males and females may not
follow a normal distribution. Depending on the relative
proportions of sampled data from each underlying normal distribution,
and on the means and variances of each distribution, the
composite mixture
distribution may appear to be
skewed,
or to have nonnormal kurtosis,
or both. Separating the data into different subsamples
based on the value of the implicit factor may reveal
that, conditional on the value of the implicit factor
(e.g., gender), the data are sampled from a normal distribution,
even if it is a different distribution for each value of
the implicit factor.
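The effect of an implicit factor can be sketched by simulation. The example below pools two hypothetical normal subgroups and compares sample skewness and excess kurtosis (both near 0 for normal data) for each subgroup and for the pooled data. The group means, standard deviations, sample sizes, and seed are assumptions chosen only to make the effect visible.

```python
import random

def skewness_and_excess_kurtosis(values):
    """Moment-based sample skewness and excess kurtosis (both near 0 for normal data)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Two hypothetical subgroups, each normal, with different means.
group_a = [rng.gauss(160, 5) for _ in range(5000)]
group_b = [rng.gauss(180, 5) for _ in range(5000)]
pooled = group_a + group_b  # the implicit factor (group) is ignored here

skew_a, kurt_a = skewness_and_excess_kurtosis(group_a)
skew_b, kurt_b = skewness_and_excess_kurtosis(group_b)
skew_pooled, kurt_pooled = skewness_and_excess_kurtosis(pooled)

for label, skew, kurt in [("group A", skew_a, kurt_a),
                          ("group B", skew_b, kurt_b),
                          ("pooled", skew_pooled, kurt_pooled)]:
    print(f"{label:8s} skewness = {skew:+.2f}, excess kurtosis = {kurt:+.2f}")
```

With equal group sizes, as here, the pooled data are roughly symmetric but have strongly negative excess kurtosis because the distribution is bimodal. Unequal group sizes would add skewness as well.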
Of course, an implicit factor may also separate the data
into different nonnormal distributions. And
if one or more of the subsamples has a
small sample size,
the test may fail to detect nonnormality due to a lack
of power.
Values may not be identically distributed because of the
presence of outliers.
Outliers are anomalous values in the data.
They may be due to recording errors, which may be
correctable, or they may be due to the sample not being
entirely from the same population. Apparent outliers
may also be due to the values being from the same, but
nonnormal, population.
The boxplot
and normal probability plot
(normal Q-Q plot) may suggest the presence of outliers in the data.
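The rule that a boxplot conventionally uses to mark individual points can also be applied directly. The sketch below flags values lying more than 1.5 times the interquartile range beyond the quartiles. The sample values and the function name boxplot_outliers are hypothetical, and quartile conventions differ slightly between software packages. Flagged values are candidates for closer inspection, not proven errors.

```python
import statistics

def boxplot_outliers(values, whisker=1.5):
    """Return values beyond the conventional boxplot fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles (default method)
    iqr = q3 - q1
    low = q1 - whisker * iqr
    high = q3 + whisker * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical measurements; the last value looks anomalous.
sample = [9.8, 10.1, 10.4, 9.9, 10.0, 10.2, 9.7, 10.3, 10.1, 9.9, 14.6]

flagged = boxplot_outliers(sample)
print("possible outliers:", flagged)
```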
The
boxplot,
histogram,
and normal probability plot
(normal Q-Q plot), along with the normality test,
can provide information on the normality of the
population distribution. However, if there are only a small number
of data points, nonnormality can be hard to detect with any
of these methods.
If there are a great many data points, the
normality test may detect statistically significant
but trivial departures from normality that may
be of no practical importance.
For data sampled from a normal distribution, normal
probability plots should approximate straight lines,
and boxplots should be symmetric (median and mean together,
in the middle of the box) with few if any
outliers.
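How closely a normal probability plot follows a straight line can be summarized by the correlation between the sorted data and the corresponding theoretical normal quantiles. The following is a minimal sketch of that idea. The plotting positions (i - 0.5)/n are one common convention among several, and the simulated samples and seed are assumptions for illustration.

```python
import math
import random
from statistics import NormalDist

def qq_correlation(values):
    """Correlation between sorted data and theoretical normal quantiles."""
    n = len(values)
    ordered = sorted(values)
    # Theoretical quantiles at plotting positions (i - 0.5) / n.
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mx = sum(theoretical) / n
    my = sum(ordered) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(theoretical, ordered))
    sxx = sum((x - mx) ** 2 for x in theoretical)
    syy = sum((y - my) ** 2 for y in ordered)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(2)  # fixed seed so the sketch is repeatable
normal_sample = [rng.gauss(50, 10) for _ in range(200)]      # hypothetical normal data
skewed_sample = [rng.expovariate(1.0) for _ in range(200)]   # hypothetical skewed data

r_normal = qq_correlation(normal_sample)
r_skewed = qq_correlation(skewed_sample)
print(f"normal sample: Q-Q correlation = {r_normal:.3f}")
print(f"skewed sample: Q-Q correlation = {r_skewed:.3f}")
```

A correlation very close to 1 corresponds to a nearly straight plot. The summary does not replace looking at the plot itself, which shows where the departure occurs (in the tails, in the center, or at a few isolated points).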
If the sample size is small, it may be difficult
to detect assumption violations.
Even if none of the test
assumptions is violated, a normality test on a small
sample may not have sufficient
power
to detect a
departure from normality, even when one is present.
Power decreases as the significance
level is decreased (i.e., as the test is made
more stringent), and increases as the sample size
increases.
If a statistical significance test with small sample sizes
produces a surprisingly non-significant
P value, then a lack of power may be the reason.
The best time to avoid such problems is in the
design stage of an experiment, perhaps in consultation
with a statistician, when appropriate
minimum sample sizes can be determined before data collection begins.
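The dependence of power on sample size and significance level can be explored by simulation before data are collected. In the sketch below, the Jarque-Bera statistic stands in for whichever normality test will actually be used, and an exponential distribution stands in for the nonnormal population; both are assumptions for illustration. The chi-square approximation used for the P value is asymptotic and only rough for small samples, so the estimates show the trend rather than exact power.

```python
import math
import random

def jarque_bera_p_value(values):
    """Approximate P value of the Jarque-Bera normality test."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    # A chi-square variable with 2 degrees of freedom exceeds x
    # with probability exp(-x / 2).
    return math.exp(-jb / 2.0)

def estimated_power(sample_size, alpha, trials=500, seed=3):
    """Fraction of simulated nonnormal samples for which normality is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Hypothetical nonnormal population: exponential with mean 1.
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        if jarque_bera_p_value(sample) < alpha:
            rejections += 1
    return rejections / trials

for n in (10, 30, 100):
    for alpha in (0.05, 0.01):
        power = estimated_power(n, alpha)
        print(f"n = {n:3d}, alpha = {alpha:.2f}: estimated power = {power:.2f}")
```

Because the same seed is reused, each sample size is evaluated on the same simulated samples at both significance levels, so the estimated power at the 0.01 level can never exceed that at the 0.05 level.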
For very large sample sizes, a hypothesis test may become so
powerful that it detects
departures from normality that are statistically significant
but not of practical importance. With large sample sizes, small
departures from normality will not compromise some statistical
tests that assume normality (such as the t test, but not the F test
for variances). If a normality test on a very large sample rejects normality,
but the boxplot,
histogram,
and normal probability plot
do not point to any clear signs of nonnormality (such as
outliers
or skewness),
then the normality test may be detecting
a departure from normality that has no practical importance.
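The following sketch illustrates how a very large sample can lead to rejection of normality for a departure that is small in practical terms. The data are hypothetical: normal noise plus a small skewed component. The Jarque-Bera statistic again stands in for the normality test actually used. The sample size, the mixing weight 0.3, and the seed are assumptions for illustration.

```python
import math
import random

rng = random.Random(4)  # fixed seed so the sketch is repeatable
n = 200_000

# Hypothetical nearly normal data: standard normal noise plus a
# small exponential (skewed) component.
values = [rng.gauss(0, 1) + 0.3 * rng.expovariate(1.0) for _ in range(n)]

mean = sum(values) / n
m2 = sum((v - mean) ** 2 for v in values) / n
m3 = sum((v - mean) ** 3 for v in values) / n
m4 = sum((v - mean) ** 4 for v in values) / n
skew = m3 / m2 ** 1.5
excess_kurt = m4 / m2 ** 2 - 3.0

# Jarque-Bera statistic and its asymptotic chi-square (2 df) P value.
jb = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
p_value = math.exp(-jb / 2.0)

print(f"skewness        = {skew:+.3f}")
print(f"excess kurtosis = {excess_kurt:+.3f}")
print(f"P value         = {p_value:.2e}")
```

Here the skewness and excess kurtosis are both close to zero, yet the P value is far below conventional significance levels. Whether a departure of this size matters depends on the analysis that will follow, which is why the plots should be examined alongside the test.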