Does your data violate one-way blocked ANOVA assumptions?

If the populations from which data to be analyzed by a one-way blocked analysis of variance (ANOVA) were sampled violate one or more of the one-way blocked ANOVA test assumptions, the results of the analysis may be incorrect or misleading. For example, if the assumption of lack of interaction between blocks and treatments is violated, then the one-way blocked ANOVA may simply not be appropriate, although another test with more data (perhaps a two-way ANOVA with replications, to allow for the use of an interaction term in the model) may be appropriate. Alternatively, there may be a transformation that will eliminate the interaction between blocks and treatments.

If the assumption of normality is violated, or outliers are present, then the one-way blocked ANOVA may not be the most powerful test available, and this could mean the difference between detecting a true difference among the population (treatment) means or not. A nonparametric test or employing a transformation may result in a more powerful test.

Note that the values that make up each block need not be independent, and in fact are expected to be correlated, such as measurements on the same subject over time (subject is a block), or measurements on littermates (litter is a block). If you treat blocked data as coming from independent samples, such as doing one-way ANOVA without using the blocking factor, then you may sacrifice power.
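The power cost of ignoring the blocking factor can be sketched numerically. The following pure-Python example uses made-up data for a two-treatment design with four blocks (all numbers are invented for illustration): the block-to-block variation dominates, so a one-way ANOVA that lumps it into the error term produces a much smaller F statistic than the blocked analysis.

```python
# Sketch: why ignoring the blocking factor sacrifices power.
# Hypothetical data: 4 blocks (e.g., subjects), 2 treatments.
data = [
    [10, 12],   # block 1: treatment A, treatment B
    [20, 23],
    [30, 31],
    [40, 44],
]

b = len(data)          # number of blocks
t = len(data[0])       # number of treatments

grand = sum(sum(row) for row in data) / (b * t)
treat_means = [sum(row[j] for row in data) / b for j in range(t)]
block_means = [sum(row) / t for row in data]

ss_total = sum((x - grand) ** 2 for row in data for x in row)
ss_treat = b * sum((m - grand) ** 2 for m in treat_means)
ss_block = t * sum((m - grand) ** 2 for m in block_means)
ss_error = ss_total - ss_treat - ss_block

# Blocked ANOVA: error df = (b-1)(t-1)
f_blocked = (ss_treat / (t - 1)) / (ss_error / ((b - 1) * (t - 1)))

# One-way ANOVA ignoring blocks: block variation is lumped into error
ss_within = ss_total - ss_treat
f_unblocked = (ss_treat / (t - 1)) / (ss_within / (b * t - t))

print(f_blocked, f_unblocked)
```

With two treatments, the blocked F statistic equals the square of the paired t statistic, which is why this situation is often described as the ANOVA generalization of the paired t test.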

A potentially more damaging assumption violation occurs when the population (treatment) variances are unequal. Often, the effect of an assumption violation on the one-way blocked ANOVA result depends on the extent of the violation (such as how unequal the population variances are, or how heavy-tailed one or another population distribution is). Some small violations may have little practical effect on the analysis, while other violations may render the one-way blocked ANOVA result uselessly incorrect or uninterpretable. In particular, small or unbalanced sample sizes can increase vulnerability to assumption violations.

Potential assumption violations include:

• Implicit factors:
• A lack of independence within a sample (treatment) is often caused by the existence of an implicit factor in the data (other than the blocking factor itself). For example, values collected over time may be serially correlated (here time is the implicit factor). If the data are in a particular order, consider the possibility of dependence. For a one-way blocked ANOVA, the analysis is not seriously affected if there is serial correlation for measurements within a block (across treatments), as long as there is no interest in testing differences between the blocks. In fact, if the blocks are random, then there generally is a correlation for measurements within a block, although it is assumed to be constant. However, if there is serial correlation for measurements within a treatment (across blocks), then the reported F test is misleading. If the serial correlation is positive, then the actual P value is much greater than the reported P value, making it possible to falsely conclude that there are significant differences when none exist. Conversely, if the serial correlation is negative, then the actual P value is much smaller than the reported P value, making it possible to falsely conclude that there are no significant differences when the treatment means are in fact significantly different. If the row order of the data reflects the order in which the data were collected, an index plot of the data [data value plotted against row number] can reveal patterns in the plot that could suggest possible time effects. Note that because data for a one-way blocked ANOVA are often arranged systematically with respect to the blocking factor as well as the treatment factor, it may be difficult to detect an implicit factor in the original data this way. Examining the residuals, from which the linear effects of both the treatment factor and the blocking factor have been removed, may be an easier way to uncover an implicit factor.
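The residual approach can be sketched directly: remove the additive block and treatment effects, then examine the residuals in collection order. The pure-Python example below (the data values are invented for illustration) computes the residuals and a lag-1 autocorrelation down the blocks for each treatment; autocorrelations far from zero would suggest serial correlation.

```python
# Sketch: screening residuals for serial correlation across blocks.
# Hypothetical data: rows are blocks in collection order, columns are treatments.
data = [
    [12.1, 14.0, 15.9],
    [12.8, 14.9, 16.8],
    [13.5, 15.4, 17.6],
    [12.9, 15.1, 17.0],
]
b, t = len(data), len(data[0])

grand = sum(sum(r) for r in data) / (b * t)
tmean = [sum(r[j] for r in data) / b for j in range(t)]
bmean = [sum(r) / t for r in data]

# Residuals with the additive block and treatment effects removed:
# r_ij = y_ij - block_mean_i - treat_mean_j + grand_mean
resid = [[data[i][j] - bmean[i] - tmean[j] + grand for j in range(t)]
         for i in range(b)]

def lag1_autocorr(xs):
    """Lag-1 autocorrelation of a sequence (population form)."""
    m = sum(xs) / len(xs)
    denom = sum((x - m) ** 2 for x in xs)
    num = sum((xs[k] - m) * (xs[k + 1] - m) for k in range(len(xs) - 1))
    return num / denom if denom > 0 else 0.0

# One autocorrelation per treatment, computed down the blocks (time order)
r1 = [lag1_autocorr([resid[i][j] for i in range(b)]) for j in range(t)]
print(r1)
```

An index plot of these residuals (residual against row number) would show the same information graphically.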
• Outliers:
• Values may not be identically distributed because of the presence of outliers. Outliers are anomalous values in the data. Outliers tend to increase the estimate of sample variance, thus decreasing the calculated F statistic for the ANOVA and lowering the chance of rejecting the null hypothesis. They may be due to recording errors, which may be correctable, or they may be due to the sample not being entirely from the same population. Apparent outliers may also be due to the values being from the same, but nonnormal, population. The boxplot and normal probability plot (normal Q-Q plot) may suggest the presence of outliers in the data. The F statistic is based on the sample means and the sample variances, each of which is sensitive to outliers. (In other words, neither the sample mean nor the sample variance is resistant to outliers, and thus, neither is the F statistic.) In particular, a large outlier can inflate the overall variance, decreasing the F statistic and thus perhaps eliminating a significant difference. A nonparametric test may be a more powerful test in such a situation. If you find outliers in your data that are not due to correctable errors, you may wish to consult a statistician as to how to proceed.
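A boxplot-style screen for outliers can be applied to the residuals, since an anomalous value in blocked data also distorts its row and column means. The sketch below uses invented data with one planted outlier and the common 1.5 × IQR fence rule; both the data and the fence multiplier are illustrative assumptions, not prescriptions.

```python
import statistics

# Sketch: boxplot-style outlier screening on residuals (hypothetical data).
# One value (block 3, treatment 2) has been made anomalously large.
data = [
    [10.0, 12.0, 14.1],
    [10.4, 12.5, 14.3],
    [10.1, 19.0, 14.0],   # 19.0 is the planted outlier
    [10.3, 12.2, 14.4],
]
b, t = len(data), len(data[0])
grand = sum(sum(r) for r in data) / (b * t)
tmean = [sum(r[j] for r in data) / b for j in range(t)]
bmean = [sum(r) / t for r in data]
resid = [data[i][j] - bmean[i] - tmean[j] + grand
         for i in range(b) for j in range(t)]

q1, _, q3 = statistics.quantiles(resid, n=4)   # quartiles
iqr = q3 - q1
fences = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
flagged = [(i, j) for i in range(b) for j in range(t)
           if not fences[0] <= resid[i * t + j] <= fences[1]]
print(flagged)
```

Note that a single outlier shifts the means it contributes to, so its effect is partly spread across the other residuals in its row and column; values flagged this way should be investigated, not automatically removed.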
• Nonnormality:
• The values in a sample may indeed be from the same population, but not from a normal one. Signs of nonnormality include skewness (lack of symmetry), light-tailedness, or heavy-tailedness. The boxplot, histogram, and normal probability plot (normal Q-Q plot), along with the normality test, can provide information on the normality of the population distribution. However, if there are only a small number of data points, nonnormality can be hard to detect. If there are a great many data points, the normality test may detect statistically significant but trivial departures from normality that will have no real effect on the F statistic. For data sampled from a normal distribution, normal probability plots should approximate straight lines, and boxplots should be symmetric (median and mean together, in the middle of the box) with no outliers. Because the sample sizes for a one-way blocked ANOVA are forced to be balanced, the ANOVA's F test will not be much affected even if the population distributions are skewed. For the same reason, the F test will not be seriously affected by light-tailedness or heavy-tailedness, unless the sample sizes are small (less than 5), or the departure from normality is extreme (kurtosis less than -1 or greater than 2), especially if the samples are extremely nonnormal in different ways (e.g., one very skewed to the right while another is very skewed to the left). Robust statistical tests operate well across a wide variety of distributions. A test can be robust for validity, meaning that it provides P values close to the true ones in the presence of (slight) departures from its assumptions. It may also be robust for efficiency, meaning that it maintains its statistical power (the probability that a true violation of the null hypothesis will be detected by the test) in the presence of those departures.
The one-way blocked ANOVA's F test is robust for validity against nonnormality, but it may not be the most powerful test available for a given nonnormal distribution, although it is the most powerful test available when its test assumptions are met. In the case of nonnormality, a nonparametric test or employing a transformation may result in a more powerful test.
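The usual nonparametric alternative for blocked data is the Friedman test, which replaces the values in each block with their within-block ranks. A minimal pure-Python sketch with invented data follows; it omits the tie-correction factor that a library implementation would apply.

```python
# Sketch: Friedman test statistic for a blocked layout (illustrative data).
# Rows are blocks, columns are treatments; values are ranked within each block.
data = [
    [10, 12, 14],
    [ 9, 11, 13],
    [ 8, 10, 15],
    [11, 12, 16],
]
b, t = len(data), len(data[0])

def ranks(row):
    """Average ranks (1-based), with ties sharing the mean rank."""
    order = sorted(range(len(row)), key=lambda j: row[j])
    r = [0.0] * len(row)
    k = 0
    while k < len(order):
        m = k
        while m + 1 < len(order) and row[order[m + 1]] == row[order[k]]:
            m += 1
        avg = (k + m) / 2 + 1          # mean of ranks k+1 .. m+1
        for idx in order[k:m + 1]:
            r[idx] = avg
        k = m + 1
    return r

rank_sums = [0.0] * t
for row in data:
    for j, rk in enumerate(ranks(row)):
        rank_sums[j] += rk

# Friedman chi-square statistic (no tie correction in this sketch)
chi2 = 12.0 / (b * t * (t + 1)) * sum(s * s for s in rank_sums) - 3 * b * (t + 1)
print(chi2)   # compare against a chi-square with t-1 degrees of freedom
```

In practice a library routine such as scipy.stats.friedmanchisquare would be used; the sketch just shows what the test computes.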
• Unequal population (treatment) or block variances:
• The inequality of the population (treatment) variances can be assessed by examination of the relative size of the sample variances, either informally (including graphically), or by a robust variance test such as Levene's test. (Bartlett's test is even more sensitive to nonnormality than the ANOVA's F test, and thus should not be used for such testing.) The effect of inequality of variances is mitigated when the sample sizes are equal, so the F test for the one-way blocked ANOVA (where all the sample sizes are forced to be the same) is fairly robust against inequality of either treatment or block variances. If the treatment variances are unequal, then the chance increases slightly of incorrectly reporting a significant difference in the means when none exists. This chance of incorrectly rejecting the null hypothesis is greater when the population variances are very different from each other, particularly if there is one sample variance very much larger than the others. Conversely, if the block variances are unequal, the chance increases slightly of failing to find a significant difference between the treatment means when they are in fact different, meaning that the ANOVA F test becomes less powerful. If both nonnormality and unequal variances are present, employing a transformation may be appropriate. A nonparametric test like the Friedman test still assumes that the population variances are comparable. The inequality of block variances can also be judged by examining the relative size of the variances of the data, categorized by blocks instead of by populations (treatments).
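Levene's test itself is simple enough to sketch: replace each value by its absolute deviation from its group mean, then run an ordinary one-way ANOVA on those deviations. The groups and numbers below are invented to make the arithmetic clean; a real analysis would typically call a library routine such as scipy.stats.levene instead.

```python
# Sketch: Levene's test = one-way ANOVA on absolute deviations from group means.
groups = [
    [10.0, 10.0, 10.0],   # hypothetical treatment 1: no spread
    [0.0, 10.0, 20.0],    # hypothetical treatment 2: large spread
]

# Absolute deviations from each group's mean
z = [[abs(x - sum(g) / len(g)) for x in g] for g in groups]

k = len(z)                                  # number of groups
n = sum(len(g) for g in z)                  # total observations
zbar = sum(x for g in z for x in g) / n
gmeans = [sum(g) / len(g) for g in z]

ss_between = sum(len(g) * (m - zbar) ** 2 for g, m in zip(z, gmeans))
ss_within = sum((x - m) ** 2 for g, m in zip(z, gmeans) for x in g)

W = ((n - k) / (k - 1)) * ss_between / ss_within
print(W)   # compare against an F distribution with (k-1, n-k) df
```

A common variant (the Brown-Forsythe modification) uses deviations from the group medians instead of the means, which makes the test more robust to nonnormality.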
• Interaction between blocks and treatments:
• The one-way blocked ANOVA F test assumes that the size of the change from one treatment to another is not dependent on the identity of the block. In particular, the size of the change is assumed to be independent of the size of the measurement value in the block. For example, if a one-way blocked test is performed comparing blood pressure in patients at various times after a drug treatment, then the average change produced by the drug between time 1 and time 2 should be the same for those who start with high blood pressure as for those who start with normal blood pressure. If this is not the case, then the simple additive model assumed by the one-way blocked ANOVA test is incorrect, and there is interaction between the blocks (patients) and the treatment groups (times). The plot of residuals against fitted values may help detect such interaction. The plot of observed values against sample (treatment) number may be even more useful in detecting interaction. If there is no interaction, the line segments (one for each block) should be parallel or nearly so. If there is interaction between blocks and treatments, the F statistic for the test of the treatment factor will tend to decrease, thus making it less likely that a significant difference will be detected. Thus, if the F test produces a significant result even when interaction is present, it would also have been significant without the interaction. This means that if the test for treatment differences is significant, you need not worry too much about the effect of interactions on the test results. However, if the test is not significant, there is the chance that the lack of significance was due to interaction between blocks and treatments, despite a genuine difference between the treatments.
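The parallel-profile idea can be checked numerically as well as graphically: under the additive model, the difference between any two treatments should be roughly the same in every block. A small sketch with made-up, perfectly additive data (so the profiles are exactly parallel):

```python
import statistics

# Sketch: checking additivity (no block-by-treatment interaction).
# Hypothetical additive data: y_ij = block effect i + treatment effect j.
data = [
    [0, 5, 10],
    [1, 6, 11],
    [2, 7, 12],
]
b, t = len(data), len(data[0])

# Per-block differences between consecutive treatments; under additivity,
# each column of differences is constant across blocks (parallel profiles).
diffs = [[row[j + 1] - row[j] for j in range(t - 1)] for row in data]

spread = [statistics.pvariance([diffs[i][j] for i in range(b)])
          for j in range(t - 1)]
print(spread)   # near-zero spreads suggest no interaction
```

With real (noisy) data the spreads would not be exactly zero; large spreads relative to the error variance would suggest interaction, and a formal check such as Tukey's one-degree-of-freedom test for nonadditivity could follow.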
• Patterns in plot of data:
• The plot of each sample's values against its mean (or its sample ID) will consist of vertical "stacks" of data points, one stack for each unique sample mean value. If the assumptions for the samples' population distributions are correct, the stacks should be about the same length. Outliers may appear as anomalous points in the graph.
A fan pattern like the profile of a megaphone, with a noticeable flare either to the right or to the left (one or more of the "stacks" of data points is much longer than the others), suggests that the variance in the values increases in the direction the fan pattern widens (usually as the sample mean increases), and this in turn suggests that a transformation may be needed. Side-by-side boxplots of the samples can also reveal lack of homogeneity of variances if some boxplots are much longer than others, and reveal suspected outliers.
The plot of each sample's values against its block number will also consist of vertical "stacks" of data points, one stack for each block. If the assumptions for the samples' population distributions are correct, the stacks should also be about the same length. Again, outliers may appear as anomalous points in the graph. Interactions between blocks and treatments may create the appearance of unequal variances between the treatments.
• Special problems with small sample sizes:
• If one or more of the sample sizes is small, it may be difficult to detect assumption violations. With small samples, violations of assumptions such as nonnormality or inequality of variances are difficult to detect even when they are present. Also, with small sample size(s) the one-way blocked ANOVA's F test offers less protection against violation of assumptions. Even if none of the test assumptions are violated, a one-way blocked ANOVA with small sample sizes may not have sufficient power to detect a significant difference among the samples, even if the means are in fact different. The power depends on the error variance, the selected significance (alpha-) level of the test, and the sample size. Power decreases as the variance increases, decreases as the significance level is decreased (i.e., as the test is made more stringent), and increases as the sample size increases. With very small samples, even samples from populations with very different means may not produce a significant one-way blocked ANOVA F test statistic unless the sample variance is small. If a statistical significance test with small sample sizes produces a surprisingly non-significant P value, then a lack of power may be the reason. The best time to avoid such problems is in the design stage of an experiment, when appropriate minimum sample sizes can be determined, perhaps in consultation with a statistician, before data collection begins.
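A design-stage power calculation for the blocked F test can be sketched from the noncentral F distribution. This assumes SciPy is available, and the treatment effects and error variance below are invented for illustration; a real calculation would use effect sizes and a variance estimate from prior data or a pilot study.

```python
from scipy import stats

def blocked_anova_power(treat_effects, sigma2, n_blocks, alpha=0.05):
    """Approximate power of the one-way blocked ANOVA F test.

    treat_effects: treatment deviations from the grand mean
    sigma2: error (residual) variance
    n_blocks: number of blocks (one observation per cell)
    """
    t = len(treat_effects)
    df1, df2 = t - 1, (t - 1) * (n_blocks - 1)
    # Noncentrality parameter: n_blocks * sum(tau_j^2) / sigma^2
    nc = n_blocks * sum(e * e for e in treat_effects) / sigma2
    f_crit = stats.f.ppf(1 - alpha, df1, df2)
    return stats.ncf.sf(f_crit, df1, df2, nc)

# Hypothetical effects and error variance:
p5 = blocked_anova_power([-1.0, 0.0, 1.0], sigma2=2.0, n_blocks=5)
p10 = blocked_anova_power([-1.0, 0.0, 1.0], sigma2=2.0, n_blocks=10)
print(p5, p10)   # power increases with the number of blocks
```

Solving for the smallest number of blocks that achieves a target power (say 0.8) is then a matter of increasing n_blocks until the returned value crosses the target.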
• Special problems with unbalanced sample sizes:
• Because the one-way blocked ANOVA test has only one observation per cell (unique combination of treatment factor level and block factor level), unequal sample sizes produce empty cells. To allow the usual calculations to continue, either only blocks with a full set of values for each treatment can be included in the analysis, or values must be imputed to fill in the empty cells. Neither method is as desirable as having complete data, so every effort should be made to avoid missing values.

For one-way blocked ANOVA, missing values are imputed using the iterative imputation method described by Glen and Kramer (1958). Any imputation method can only deal successfully with data that have only a relatively small percentage of missing values, and preferably with only one missing value. The power of the F test is reduced because there are fewer actual data points than for a complete design, and the degrees of freedom must be adjusted accordingly, which tends to decrease the F statistic.
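For context on what such imputation computes (this is a generic sketch, not necessarily the Glen and Kramer method): for a single missing value in a randomized complete block design, the least-squares estimate has a well-known closed form, often attributed to Yates, and iterative schemes cycle a formula like this over multiple missing cells until the values stabilize.

```python
# Sketch: least-squares imputation of a single missing cell in a
# blocked design (Yates's closed-form estimate).
def impute_single(data, i, j):
    """data: rows = blocks, cols = treatments; data[i][j] is missing (None)."""
    b, t = len(data), len(data[0])
    B = sum(x for x in data[i] if x is not None)           # block i total
    T = sum(row[j] for row in data if row[j] is not None)  # treatment j total
    G = sum(x for row in data for x in row if x is not None)
    return (b * B + t * T - G) / ((b - 1) * (t - 1))

# Hypothetical perfectly additive data with one value removed;
# the imputed value recovers the additive cell exactly.
data = [
    [10, 20, 30],
    [11, None, 31],   # the additive value here would be 21
    [12, 22, 32],
]
print(impute_single(data, 1, 1))
```

Because the imputed cell contributes no information of its own, the error degrees of freedom are reduced by one for each imputed value, consistent with the adjustment described above.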

• Multiple comparisons:
• In general, the multiple comparisons tests will be robust in those situations when the one-way blocked ANOVA's F test is robust, and will be subject to the same potential problems with unequal variances. Unequal variances may make individual comparisons of means inaccurate, because the multiple comparison techniques rely on a pooled estimate for the variance, based on the assumption that the sample variances are equal.
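The pooled-variance point can be made concrete: multiple comparison procedures for a blocked design typically take their variance estimate from the ANOVA's error mean square. The sketch below (with invented data) computes that pooled estimate and the standard error of a difference between two treatment means; a real procedure would then apply a critical value, for example from the studentized range distribution for Tukey's HSD.

```python
import math

# Sketch: pooled error variance and pairwise comparisons in a blocked design.
# Hypothetical data: rows = blocks, columns = treatments.
data = [
    [10, 20, 31],
    [11, 22, 30],
    [12, 21, 32],
]
b, t = len(data), len(data[0])

grand = sum(sum(r) for r in data) / (b * t)
tmean = [sum(r[j] for r in data) / b for j in range(t)]
bmean = [sum(r) / t for r in data]

ss_total = sum((x - grand) ** 2 for r in data for x in r)
ss_treat = b * sum((m - grand) ** 2 for m in tmean)
ss_block = t * sum((m - grand) ** 2 for m in bmean)
ss_error = ss_total - ss_treat - ss_block
mse = ss_error / ((b - 1) * (t - 1))     # pooled variance estimate

# Standard error of a difference between two treatment means, using the
# pooled MSE; unequal true variances would make this misleading.
se_diff = math.sqrt(2 * mse / b)
pairs = [(j, k, tmean[k] - tmean[j]) for j in range(t) for k in range(j + 1, t)]
print(mse, se_diff, pairs)
```

Because every pairwise comparison shares the single mse value, one treatment with a much larger true variance distorts all of the intervals, not just its own, which is the inaccuracy the paragraph above describes.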