Does your data violate normality test assumptions?


If the population from which the data were sampled violates one or more of the normality test assumptions, the results of the analysis may be incorrect or misleading. For example, if the sampled values are not mutually independent, the normality test results will not be reliable. If outliers are present, the normality test may reject the null hypothesis even when the remainder of the data do in fact come from a normal distribution. Often, the effect of an assumption violation on the normality test result depends on the extent of the violation. Small violations may have little practical effect on the analysis, while larger violations may render the normality test result incorrect or uninterpretable.

Potential assumption violations include:


Implicit factors:
A lack of independence within a sample is often caused by the existence of an implicit factor in the data. For example, values collected over time may be serially correlated (here time is the implicit factor). If the data are in a particular order, consider the possibility of dependence. If the row order of the data reflects the order in which the data were collected, an index plot of the data (data value plotted against row number) can reveal patterns that suggest possible time effects.
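As a numerical companion to the index plot, the lag-1 autocorrelation of the values in collection order gives a rough indication of serial dependence. The sketch below is only an illustration: the function name and both series are made up, and the commonly quoted band of about plus or minus 2/sqrt(n) for independent data is a rule of thumb, not a formal test.

```python
def lag1_autocorrelation(values):
    """Lag-1 sample autocorrelation of values taken in collection order."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    denom = sum(c * c for c in centered)
    numer = sum(centered[i] * centered[i + 1] for i in range(n - 1))
    return numer / denom

# Two made-up series of 50 readings each (illustrative data only).
drifting = [0.1 * i for i in range(50)]        # steady upward drift over time
alternating = [(-1) ** i for i in range(50)]   # value flips sign at every reading

r_drift = lag1_autocorrelation(drifting)
r_alt = lag1_autocorrelation(alternating)
print(f"drifting:    r1 = {r_drift:.2f}")
print(f"alternating: r1 = {r_alt:.2f}")
```

Values far from zero in either direction, as in both of these series, suggest that the row order carries information and that the independence assumption deserves a closer look.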

An implicit factor may also separate the data into different distributions, each of which may be normal, but which produce a nonnormal composite distribution. For example, measurements for females may follow a normal distribution, and measurements for males may also follow a normal distribution, but the measurements for the entire population of both males and females may not follow a normal distribution. Depending on the relative proportions of sampled data from each underlying normal distribution, and on the means and variances of each distribution, the composite mixture distribution may appear to be skewed, or to have nonnormal kurtosis, or both. Separating the data into different subsamples based on the value of the implicit factor may reveal that, conditional on the value of the implicit factor (e.g., gender), the data are sampled from a normal distribution, even if it is a different distribution for each value of the implicit factor.

Of course, an implicit factor may also separate the data into different nonnormal distributions. And if one or more of the subsamples has a small sample size, the test may fail to detect nonnormality due to a lack of power.
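The mixture effect can be sketched with two idealised subsamples. Everything in the example is an assumption made for illustration: the group means and standard deviations are invented, the "samples" are evenly spaced normal quantiles rather than real measurements, and the Jarque-Bera statistic (which combines skewness and kurtosis) stands in for whichever normality test is actually in use, because its large-sample p-value can be computed with the standard library alone.

```python
import math
from statistics import NormalDist

def jarque_bera_p(values):
    """Asymptotic p-value of the Jarque-Bera statistic (chi-square, 2 df)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    return math.exp(-jb / 2.0)  # chi-square survival function for 2 df

def ideal_normal_sample(mu, sigma, n):
    """Evenly spaced normal quantiles: an idealised, noise-free 'sample'."""
    dist = NormalDist(mu, sigma)
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]

# Hypothetical measurements for two groups defined by an implicit factor.
group_a = ideal_normal_sample(160.0, 5.0, 200)
group_b = ideal_normal_sample(180.0, 5.0, 200)
pooled = group_a + group_b

for name, data in [("group A", group_a), ("group B", group_b), ("pooled", pooled)]:
    print(f"{name}: p = {jarque_bera_p(data):.4f}")
```

Each group on its own gives no evidence against normality, while the pooled data are strongly rejected because the two-humped mixture has much lighter tails than a single normal distribution with the same variance.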

Outliers:
Values may not be identically distributed because of the presence of outliers. Outliers are anomalous values in the data. They may be due to recording errors, which may be correctable, or they may be due to the sample not being entirely from the same population. Apparent outliers may also be due to the values being from the same, but nonnormal, population. The boxplot and normal probability plot (normal Q-Q plot) may suggest the presence of outliers in the data.
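The usual boxplot convention flags points lying more than 1.5 interquartile ranges beyond the quartiles. A minimal sketch of that rule follows; the function name, the multiplier default, and the data (an idealised normal sample plus one invented recording error) are assumptions for illustration, and different software may compute quartiles slightly differently.

```python
from statistics import NormalDist, quantiles

def tukey_outliers(values, k=1.5):
    """Values outside the boxplot fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Idealised normal sample: 50 evenly spaced quantiles of N(10, 1).
dist = NormalDist(10.0, 1.0)
clean = [dist.inv_cdf((i + 0.5) / 50) for i in range(50)]

# Same data with one hypothetical recording error appended.
contaminated = clean + [18.0]

print("clean:       ", tukey_outliers(clean))
print("contaminated:", tukey_outliers(contaminated))
```

A flagged value is only a candidate for investigation. Whether it is a recording error, a member of a different population, or a legitimate value from a nonnormal population has to be decided from knowledge of how the data were collected.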

Patterns in plots of data:
The values in a sample may indeed be from the same population, but not from a normal one. Signs of nonnormality include skewness (lack of symmetry) and tails that are lighter or heavier than those of a normal distribution.

The boxplot, histogram, and normal probability plot (normal Q-Q plot), along with the normality test, can provide information on the normality of the population distribution. However, if there are only a small number of data points, nonnormality can be hard to detect with any of these methods. If there are a great many data points, the normality test may detect statistically significant but trivial departures from normality that may be of no practical importance.

For data sampled from a normal distribution, normal probability plots should approximate straight lines, and boxplots should be symmetric (median and mean together, in the middle of the box) with few if any outliers.
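The straightness of a normal probability plot can be summarised by the correlation between the sorted data and the corresponding normal scores, and the symmetry of a boxplot by the gap between mean and median. The sketch below assumes Blom plotting positions (one common choice among several) and uses simulated data with an arbitrary seed; the function name and sample parameters are illustrative, not part of any particular package.

```python
import random
from statistics import NormalDist, mean, median

def qq_correlation(values):
    """Correlation between sorted data and normal scores (Q-Q straightness)."""
    n = len(values)
    ys = sorted(values)
    std = NormalDist()
    # Blom plotting positions: (i - 0.375) / (n + 0.25) for i = 1..n.
    xs = [std.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(1)  # arbitrary seed, for repeatability only
normal_like = [rng.gauss(50.0, 4.0) for _ in range(200)]
skewed = [rng.expovariate(1.0) for _ in range(200)]

for name, data in [("normal-like", normal_like), ("skewed", skewed)]:
    print(f"{name}: Q-Q r = {qq_correlation(data):.3f}, "
          f"mean - median = {mean(data) - median(data):.3f}")
```

A correlation very close to 1 corresponds to a nearly straight plot. The skewed sample should show both a visibly lower correlation and a mean pulled away from the median toward the long tail.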

Special problems with small sample sizes:
If the sample size is small, it may be difficult to detect assumption violations. Moreover, with small samples, nonnormality is difficult to detect even when it is present.

Even if none of the test assumptions are violated, a normality test on a small sample may not have sufficient power to detect a departure from normality, even a substantial one. Power decreases as the significance level is decreased (i.e., as the test is made more stringent), and increases as the sample size increases. If a statistical significance test with small sample sizes produces a surprisingly non-significant P value, then a lack of power may be the reason. The best time to avoid such problems is in the design stage of an experiment, perhaps in consultation with a statistician, when appropriate minimum sample sizes can be determined before data collection begins.
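A small simulation can make the dependence of power on sample size and significance level concrete. This is a sketch under stated assumptions: the data are drawn from an exponential distribution (clearly nonnormal), the Jarque-Bera statistic stands in for the normality test, and its chi-square p-value is only a rough approximation at small n, so the printed rates are illustrative rather than exact power figures. The function names, trial count, and seed are arbitrary.

```python
import math
import random

def jarque_bera_p(values):
    """Asymptotic p-value of the Jarque-Bera statistic (chi-square, 2 df)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    jb = n / 6.0 * ((m3 / m2 ** 1.5) ** 2 + (m4 / m2 ** 2 - 3.0) ** 2 / 4.0)
    return math.exp(-jb / 2.0)

def rejection_rate(n, alpha, trials=400, seed=0):
    """Fraction of simulated exponential samples of size n rejected at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        if jarque_bera_p(sample) < alpha:
            rejections += 1
    return rejections / trials

for n in (10, 30, 100):
    print(f"n = {n:3d}: rejected at 0.05: {rejection_rate(n, 0.05):.2f}, "
          f"at 0.01: {rejection_rate(n, 0.01):.2f}")
```

Because the same seed is reused, both significance levels are applied to identical samples, so the rate at 0.01 can never exceed the rate at 0.05. The rejection rate should also climb toward 1 as n grows, even though the underlying distribution is equally nonnormal at every sample size.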

Special problems with very large sample sizes:
For very large sample sizes, a hypothesis test may become so powerful that it detects departures from normality that are statistically significant but not of practical importance. With large sample sizes, small departures from normality will not compromise some statistical tests that assume normality (such as the t test, but not the F test for variances). If a normality test on a very large sample rejects normality, but the boxplot, histogram, and normal probability plot do not point to any clear signs of nonnormality (such as outliers or skewness), then the normality test may be detecting a departure from normality that has no practical importance.
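The large-sample effect can be sketched by applying the same test to the same mildly skewed shape at two sample sizes. The assumptions here are all illustrative: the data are idealised quantiles of a standard normal variable z nudged to z + 0.02*z**2 (a skewness of roughly 0.12), the Jarque-Bera statistic stands in for the normality test, and the function names and sample sizes are invented for the example.

```python
import math
from statistics import NormalDist

def skew_and_jb_p(values):
    """Sample skewness and asymptotic Jarque-Bera p-value (chi-square, 2 df)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    return skew, math.exp(-jb / 2.0)

def mildly_skewed_sample(n, a=0.02):
    """Idealised sample: normal quantiles z nudged to z + a*z**2."""
    std = NormalDist()
    zs = (std.inv_cdf((i + 0.5) / n) for i in range(n))
    return [z + a * z * z for z in zs]

skew_small, p_small = skew_and_jb_p(mildly_skewed_sample(100))
skew_large, p_large = skew_and_jb_p(mildly_skewed_sample(20000))

print(f"n =   100: skewness = {skew_small:.3f}, p = {p_small:.3g}")
print(f"n = 20000: skewness = {skew_large:.3f}, p = {p_large:.3g}")
```

The estimated skewness is small and nearly the same in both cases, yet only the large sample leads to rejection. The size of the departure, not the P value alone, indicates whether it matters for the analysis that follows.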


Copyright © 1998-2005 JnF Specialties, LLC. All rights reserved.