Does your data violate F test assumptions?


If the populations from which data to be analyzed by an F test were sampled violate one or more of the F test assumptions, the results of the analysis may be incorrect or misleading. For example, if the assumption of independence is violated, then the F test is simply not appropriate, although another test (perhaps a chi-square test for variance) on the paired differences might be appropriate. If the assumption of normality is violated, or outliers are present, then the F test may not be the most powerful test available, and this could mean the difference between detecting a true difference or not. A nonparametric or more robust test may be more powerful in that situation.
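For reference, the test under discussion can be sketched in Python as follows. This is a minimal illustration, assuming NumPy and SciPy are available; the function name f_test_two_variances is hypothetical, and the two-sided P value is obtained by doubling the smaller tail probability, which is one common convention rather than the only one.

```python
import numpy as np
from scipy import stats


def f_test_two_variances(x, y):
    """Two-sided F test of H0: the two population variances are equal.

    Assumes two independent samples, each drawn from a normal population.
    Returns (F statistic, two-sided P value).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # The F statistic is the ratio of the two sample variances.
    f_stat = np.var(x, ddof=1) / np.var(y, ddof=1)
    df1, df2 = x.size - 1, y.size - 1
    # Double the smaller tail probability and cap the result at 1.
    tail = min(stats.f.cdf(f_stat, df1, df2), stats.f.sf(f_stat, df1, df2))
    return float(f_stat), float(min(1.0, 2.0 * tail))


if __name__ == "__main__":
    # Made-up illustrative data, not from any real study.
    f_stat, p_value = f_test_two_variances([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
    print(f"F = {f_stat:.3f}, two-sided P = {p_value:.3f}")
```

The P value returned here is only trustworthy when the assumptions discussed on this page hold.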

Often, the effect of an assumption violation on the F test result depends on the extent of the violation (such as how skewed or heavy-tailed one or the other population distribution is). Some very small violations may have little practical effect on the analysis, while other violations may render the F test result uselessly incorrect or uninterpretable. In particular, small sample sizes can increase vulnerability to assumption violations.

The bad news is that the F test is strongly affected, and often rendered invalid, by violation of the normality assumption. In fact, if your reason for performing an F test is to judge whether the assumption of equal variances is valid for a two-sample unpaired t test, keep in mind that the t test is usually much less affected by nonnormality than the F test is. If you have reason to suspect that the population variances are not equal, you may be best off simply using a Welch-Satterthwaite t test, or transforming the data to be analyzed by the t test. The good news is that the "other" F tests, the ones calculated for analysis of variance, are F tests for location instead of F tests for dispersion and, like the t test, are reasonably robust to nonnormality if the sample sizes are not too small.
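As a rough sketch of the Welch-Satterthwaite alternative mentioned above, SciPy's ttest_ind can be asked not to assume equal variances. The wrapper name welch_t_test and the sample values are hypothetical and only for illustration.

```python
from scipy import stats


def welch_t_test(x, y):
    """Two-sample t test that does not assume equal population variances."""
    # equal_var=False selects the Welch-Satterthwaite version of the test.
    result = stats.ttest_ind(x, y, equal_var=False)
    return float(result.statistic), float(result.pvalue)


if __name__ == "__main__":
    # Made-up samples with visibly different spreads.
    group_a = [1, 2, 3, 4, 5]
    group_b = [6, 7, 8, 9, 20]
    t_stat, p_value = welch_t_test(group_a, group_b)
    print(f"Welch t = {t_stat:.3f}, P = {p_value:.3f}")
```

Going straight to this test avoids using a fragile preliminary F test as a gatekeeper for the t test.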

Potential assumption violations include:


Implicit factors:
A lack of independence within a sample is often caused by the existence of an implicit factor in the data. For example, values collected over time may be serially correlated (here time is the implicit factor). If the data are in a particular order, consider the possibility of dependence. (If the row order of the data reflects the order in which the data were collected, an index plot of the data [data value plotted against row number] can reveal patterns that suggest possible time effects.)
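Alongside an index plot, a quick numeric check for serial correlation is the lag-1 autocorrelation of the values in collection order. The sketch below uses only the Python standard library; the function name lag1_autocorrelation is hypothetical, and the statistic is offered as a simple screening aid, not as a formal test of independence.

```python
def lag1_autocorrelation(values):
    """Lag-1 autocorrelation of values taken in their collection order.

    Values near 0 are consistent with no serial correlation; values near
    +1 suggest a trend or drift, and values near -1 suggest alternation.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    denominator = sum(d * d for d in deviations)
    if denominator == 0:
        raise ValueError("all values are identical")
    # Sum of products of each deviation with the next one in sequence.
    numerator = sum(deviations[i] * deviations[i + 1] for i in range(n - 1))
    return numerator / denominator


if __name__ == "__main__":
    drifting = list(range(20))   # steady upward drift over time
    alternating = [1, -1] * 10   # strict alternation
    print(lag1_autocorrelation(drifting))
    print(lag1_autocorrelation(alternating))
```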

Lack of independence:
Whether the two samples are independent of each other is generally determined by the structure of the experiment from which they arise. Obviously correlated samples, such as a set of pre- and post-test observations on the same subjects, are not independent, and such data would be more appropriately tested by a one-sample test on the paired differences. If you are unsure whether your samples are independent, you may wish to consult a statistician or someone who is knowledgeable about the data collection scheme you are using.

Outliers:
Values may not be identically distributed because of the presence of outliers. Outliers are anomalous values in the data. Outliers tend to increase the estimate of a sample variance. This can make the F statistic, which is a ratio of the two sample variances, very different from what it would be without the outlier(s), and thus render the F test meaningless. In particular, one or more outliers in a single sample will tend to push the F statistic away from 1 (too large when that sample's variance is in the numerator, too small when it is in the denominator), thus increasing the chance of incorrectly concluding that the population variances differ.
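The effect of a single outlier on the variance ratio is easy to see with a small made-up example. The sketch below uses only the standard library; the data values and the helper name variance_ratio are hypothetical.

```python
from statistics import variance


def variance_ratio(x, y):
    """F statistic: sample variance of x divided by sample variance of y."""
    return variance(x) / variance(y)


# Two made-up samples with the same spread.
x_clean = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0]
y = [9.9, 10.0, 10.1, 10.2, 9.8, 10.0]

# The same first sample with its last value replaced by an outlier.
x_outlier = x_clean[:-1] + [13.0]

if __name__ == "__main__":
    print("F without outlier:", variance_ratio(x_clean, y))
    print("F with outlier in numerator sample:", variance_ratio(x_outlier, y))
    print("F with outlier in denominator sample:", variance_ratio(y, x_outlier))
```

One anomalous value is enough to move the ratio far from 1, in either direction depending on which sample contains it.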

Outliers may be due to recording errors, which may be correctable, or they may be due to the sample not being entirely from the same population. Apparent outliers may also be due to the values being from the same, but nonnormal, population. The boxplot and normal probability plot (normal Q-Q plot) may suggest the presence of outliers in the data.
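A boxplot usually flags points lying more than 1.5 times the interquartile range beyond the quartiles. That convention is an assumption here (the text above does not specify a rule), and the sketch below applies it with the standard library; flag_boxplot_outliers is a hypothetical name, and quartile definitions vary slightly between software packages.

```python
from statistics import quantiles


def flag_boxplot_outliers(values, k=1.5):
    """Return the values lying outside the usual boxplot fences.

    The fences are Q1 - k*IQR and Q3 + k*IQR; k=1.5 is the common default.
    """
    q1, _median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


if __name__ == "__main__":
    # Made-up data with one suspicious value.
    data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
    print(flag_boxplot_outliers(data))
```

A flagged value is only a candidate outlier; whether it is a recording error, a value from a different population, or a legitimate value from a nonnormal population still has to be judged from the data collection context.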

The F statistic is based on the sample variances, both of which are sensitive to outliers. (In other words, the sample variance is not resistant to outliers, and thus, neither is the F statistic.) A nonparametric test may be a more powerful test in such a situation. If you find outliers in your data that are not due to correctable errors, you may wish to consult a statistician as to how to proceed.

Nonnormality:
The values in a sample may indeed be from the same population, but not from a normal one. Signs of nonnormality are skewness (lack of symmetry) or light-tailedness or heavy-tailedness. The boxplot, histogram, and normal probability plot (normal Q-Q plot), along with a formal normality test, can provide information on the normality of the population distribution. However, if there are only a small number of data points, nonnormality can be hard to detect. If there are a great many data points, the normality test may detect statistically significant but trivial departures from normality that will have no real effect on the F statistic. Bear in mind, though, that the F test is more sensitive to even small departures from normality than, say, the t test.

For data sampled from a normal distribution, normal probability plots should approximate straight lines, and boxplots should be symmetric (median and mean together, in the middle of the box) with no outliers.
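The graphical checks described above can be supplemented with a few numbers. The sketch below, assuming NumPy and SciPy are available, reports sample skewness, excess kurtosis, a Shapiro-Wilk P value, and the correlation of the normal probability plot. The choice of the Shapiro-Wilk test and the function name normality_summary are assumptions made for illustration; the text above does not name a particular normality test.

```python
import numpy as np
from scipy import stats


def normality_summary(sample):
    """Numeric companions to the boxplot and normal probability plot."""
    sample = np.asarray(sample, dtype=float)
    _w_stat, shapiro_p = stats.shapiro(sample)
    # probplot returns the plot coordinates and a least-squares fit;
    # r is the correlation between ordered data and normal quantiles.
    _coords, (_slope, _intercept, r) = stats.probplot(sample, dist="norm")
    return {
        "skewness": float(stats.skew(sample)),            # 0 for a normal population
        "excess_kurtosis": float(stats.kurtosis(sample)),  # 0 for a normal population
        "shapiro_p": float(shapiro_p),
        "qq_correlation": float(r),                        # near 1 if the plot is straight
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)  # fixed seed for repeatability
    print(normality_summary(rng.normal(size=200)))
    print(normality_summary(rng.exponential(size=200)))
```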

Any departures from normality can render the results of the F test invalid, although the worst effects come when the distributions are either heavy-tailed or light-tailed, rather than when the distributions are simply skewed. For data from distributions that are heavy-tailed, the reported P value tends to be much smaller than it should be, so the actual Type I error rate exceeds the nominal significance level and the F test is much more likely to incorrectly reject the null hypothesis of equal variances even if it is true. Conversely, for data from distributions that are light-tailed, such as the uniform distribution, the reported P value tends to be much larger than it should be, meaning that the F test is much less likely to detect a real difference between the population variances.
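This behavior can be checked by simulation. The sketch below draws both samples from the same distribution, so the null hypothesis of equal variances is true, and counts how often a nominal 5% two-sided F test rejects. The sample size, number of trials, seed, and choice of distributions (a t distribution with 5 degrees of freedom as the heavy-tailed case) are all assumptions made for illustration, and f_test_rejection_rate is a hypothetical name.

```python
import numpy as np
from scipy import stats


def f_test_rejection_rate(sampler, n=20, trials=4000, alpha=0.05, seed=0):
    """Fraction of simulated F tests that reject when H0 is actually true."""
    rng = np.random.default_rng(seed)
    # Each row is one simulated sample of size n.
    x = sampler(rng, (trials, n))
    y = sampler(rng, (trials, n))
    f_stat = x.var(axis=1, ddof=1) / y.var(axis=1, ddof=1)
    cdf = stats.f.cdf(f_stat, n - 1, n - 1)
    p_values = 2.0 * np.minimum(cdf, 1.0 - cdf)  # two-sided P values
    return float(np.mean(p_values < alpha))


SAMPLERS = {
    "normal": lambda rng, size: rng.normal(size=size),
    "heavy-tailed (t, 5 df)": lambda rng, size: rng.standard_t(5, size=size),
    "light-tailed (uniform)": lambda rng, size: rng.uniform(size=size),
}

if __name__ == "__main__":
    for name, sampler in SAMPLERS.items():
        rate = f_test_rejection_rate(sampler)
        print(f"{name}: rejection rate at nominal 5% = {rate:.3f}")
```

Under these assumptions the normal case should land near 5%, the heavy-tailed case well above it, and the uniform case well below it.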

Robust statistical tests operate well across a wide variety of distributions. The F test for comparing two variances is not a robust test against nonnormality, although it is the most powerful test available when its test assumptions are met. In the case of nonnormality, a nonparametric test may be more powerful.
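One commonly used, more robust alternative is the Brown-Forsythe variant of Levene's test, which compares absolute deviations from each sample's median. Choosing this particular test is an assumption made for illustration; the text above does not name a specific alternative. A minimal sketch using SciPy, with the hypothetical wrapper name brown_forsythe_test:

```python
import numpy as np
from scipy import stats


def brown_forsythe_test(x, y):
    """Levene's test centered at the median (Brown-Forsythe variant).

    Tests H0: equal population variances, and is less sensitive to
    nonnormality than the two-sample F test.
    """
    result = stats.levene(x, y, center="median")
    return float(result.statistic), float(result.pvalue)


if __name__ == "__main__":
    rng = np.random.default_rng(0)  # fixed seed for repeatability
    # Made-up samples: same center, standard deviations 1 and 5.
    narrow = rng.normal(scale=1.0, size=100)
    wide = rng.normal(scale=5.0, size=100)
    stat, p_value = brown_forsythe_test(narrow, wide)
    print(f"Brown-Forsythe statistic = {stat:.2f}, P = {p_value:.4g}")
```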

Special problems with small sample sizes:
If one or both of the sample sizes are small, it may be difficult to detect assumption violations; nonnormality in particular is hard to detect with small samples. Also, with small sample size(s) the individual sample variances that make up the F statistic are themselves less reliable.

Even if none of the test assumptions are violated, an F test with small sample sizes may not have sufficient power to detect a difference between the two population variances, even if the variances are in fact different.

If a statistical significance test with small sample sizes produces a surprisingly non-significant P value, then lack of power may be the reason. The best time to avoid such problems is in the design stage of an experiment, when appropriate minimum sample sizes can be determined, perhaps in consultation with a statistician, before data collection begins.
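A simulation at the design stage can give a rough idea of how much power a given sample size provides. The sketch below assumes normal populations, equal sample sizes, and a true standard deviation ratio chosen by the user; all parameter values and the function name f_test_power are hypothetical.

```python
import numpy as np
from scipy import stats


def f_test_power(n, sd_ratio, trials=4000, alpha=0.05, seed=0):
    """Estimated power of the two-sided F test with n observations per sample.

    sd_ratio is the assumed true ratio of population standard deviations.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(scale=sd_ratio, size=(trials, n))
    y = rng.normal(scale=1.0, size=(trials, n))
    f_stat = x.var(axis=1, ddof=1) / y.var(axis=1, ddof=1)
    cdf = stats.f.cdf(f_stat, n - 1, n - 1)
    p_values = 2.0 * np.minimum(cdf, 1.0 - cdf)  # two-sided P values
    # Power is the fraction of simulated experiments that reject H0.
    return float(np.mean(p_values < alpha))


if __name__ == "__main__":
    for n in (5, 10, 20, 50):
        print(f"n = {n:2d} per sample: estimated power = {f_test_power(n, 2.0):.2f}")
```

With very small samples, even a standard deviation that is twice as large is usually missed, which is why a non-significant result from a small study says little about whether the variances are equal.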


Copyright © 1998-2005 JnF Specialties, LLC. All rights reserved.