# Does your data violate Wilcoxon paired signed rank test assumptions?

If the population from which the paired differences analyzed by a Wilcoxon signed rank test were sampled violates one or more of the signed rank test assumptions, the results of the analysis may be incorrect or misleading. For example, if the assumption of independence for the paired differences is violated, then the Wilcoxon signed rank test is simply not appropriate.

Note that the two values that make up each paired difference need not be independent, and in fact are expected to be correlated, such as before and after measurements. If you treat paired data as coming from two independent samples, such as doing an inappropriate Mann-Whitney rank-sum test instead of a paired signed rank test, then you may sacrifice power.
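The power cost of ignoring pairing can be seen in a small simulation. This is an illustrative sketch, not part of the original text: the data are simulated, and the effect size and variances are assumptions chosen to make the contrast visible.

```python
# Sketch: the cost of treating paired data as two independent samples.
# All values here are simulated; the effect size (+3) and the large
# between-subject variability (sd 15) are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 20
subject = rng.normal(100, 15, n)           # between-subject variability
before = subject + rng.normal(0, 2, n)
after = subject + 3 + rng.normal(0, 2, n)  # true shift of +3 within each pair

# Appropriate: Wilcoxon signed rank test on the paired differences
paired = stats.wilcoxon(after, before)

# Inappropriate here: Mann-Whitney treats the samples as independent,
# so the between-subject variability swamps the within-pair shift
unpaired = stats.mannwhitneyu(after, before)

print(f"paired p = {paired.pvalue:.4f}, unpaired p = {unpaired.pvalue:.4f}")
```

Because the two measurements on each subject are strongly correlated, differencing removes the subject-to-subject variability, and the paired test detects the shift that the rank-sum test misses.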

If outliers are present, or if the data in fact come from a normal distribution, then the signed rank test may not be the most powerful test available, and this could mean the difference between detecting a true difference or not. Another nonparametric test, the paired t test, or employing a transformation may result in a more powerful analysis. If the distribution of the paired differences is not symmetric, a transformation may produce symmetry.
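As a sketch of the transformation idea, the example below simulates positively skewed before/after measurements with a multiplicative effect; a log transform makes the paired differences symmetric (in fact normal), after which a paired t test is a reasonable choice. The simulated data and parameter values are assumptions for illustration only.

```python
# Sketch: a log transform symmetrizes paired differences arising from a
# multiplicative effect on positively skewed data. Simulated/illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 30
before = rng.lognormal(mean=2.0, sigma=0.5, size=n)
after = before * rng.lognormal(mean=0.3, sigma=0.3, size=n)  # multiplicative effect

raw_diff = after - before                   # skewed differences
log_diff = np.log(after) - np.log(before)   # exactly normal by construction

print("skewness of raw differences:", stats.skew(raw_diff))
print("skewness of log differences:", stats.skew(log_diff))

# On the transformed scale both tests apply; the paired t test is
# slightly more powerful when normality actually holds
t_res = stats.ttest_rel(np.log(after), np.log(before))
w_res = stats.wilcoxon(np.log(after), np.log(before))
print(f"paired t p = {t_res.pvalue:.2e}, signed rank p = {w_res.pvalue:.2e}")
```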

Often, the effect of an assumption violation on the signed rank test result depends on the extent of the violation (such as how skewed the distribution of the paired differences is). Some small violations may have little practical effect on the analysis, while other violations may render the signed rank test result uselessly incorrect or uninterpretable. In particular, small sample sizes can increase vulnerability to assumption violations.

#### Potential assumption violations include:

• Implicit factors:
• A lack of independence within a sample is often caused by the existence of an implicit factor in the data. For example, values collected over time may be serially correlated (here time is the implicit factor). If the data are in a particular order, consider the possibility of dependence. (If the row order of the data reflects the order in which the data were collected, an index plot of the data [data value plotted against row number] can reveal patterns in the plot that suggest possible time effects.)
• Outliers:
• Values may not be identically distributed because of the presence of outliers. Outliers are anomalous values in the data. They may be due to recording errors, which may be correctable, or they may be due to the sample not being entirely from the same population. Apparent outliers may also be due to the values being from the same, but skewed or heavy-tailed, population. Outliers may lead to an incorrect conclusion that the distribution of the paired differences is skewed. Because the statistic for the signed rank test is resistant, it will not be substantially affected by the presence of outliers unless the number of outliers becomes large relative to the sample size. The signed rank test generally does well with paired differences with outliers, or when the paired differences come from heavy-tailed (but symmetric) distributions. The boxplot and normal probability plot (normal Q-Q plot) may suggest the presence of outliers in the data. If you find outliers in your data that are not due to correctable errors, you may wish to consult a statistician as to how to proceed.
• Skewness:
• If the population from which the paired differences were sampled is skewed, then the signed rank test may incorrectly reject the null hypothesis that the median of the paired differences is 0 even when it is true. The paired sign test does not rely on symmetry, and may be an appropriate alternative test. Paired differences are often symmetric even when the two populations producing the values that make up the paired differences are both asymmetric, provided that those two populations have similar skewness. For example, two very positively skewed distributions that differ only by location will produce a set of paired differences that are symmetric about 0, and perfectly suitable for the signed rank test. This is often the case with before and after measurements. Whether or not the population of the paired differences is skewed can be assessed either informally (including graphically), or by examining the sample skewness statistic or conducting a test for skewness. If outliers or skewness are present, employing a transformation may resolve both problems at once, and also promote normality. In this case, it may be preferable to perform a paired t test on the transformed data, as the t test has slightly more power than the signed rank test if the assumption of normality holds. (The signed rank test has about 95% efficiency compared to the paired t test if the assumption of normality is in fact correct.) The usual measure of skewness is not resistant to outliers, so one should consider the possibility that apparent skewness is in fact due to one or more outliers. A lack of power due to small sample sizes may also make it hard to detect skewness.
• Patterns in plot of data:
• Outliers may appear as anomalous points in a graph of the paired differences against their median. A boxplot or normal probability plot of the paired differences can also reveal lack of symmetry and suspected outliers.
• Special problems with small sample sizes:
• If the number of non-zero paired differences is small, it may be difficult to detect assumption violations. With small samples, violations of assumptions such as skewness are difficult to detect even when they are present. Also, with small sample sizes there is less resistance to outliers, and less protection against violation of assumptions. Even if none of the test assumptions are violated, a signed rank test with a small sample size may not have sufficient power to detect a significant difference between the median of the paired differences and 0, even if that median is in fact non-zero. Power decreases as the significance level is decreased (i.e., as the test is made more stringent), and increases as the sample size increases. With very small samples, even paired differences sampled from a population whose median is far from 0 may not produce a significant signed rank test statistic. If a statistical significance test with small sample sizes produces a surprisingly non-significant P value, then a lack of power may be the reason. The best time to avoid such problems is in the design stage of an experiment, when appropriate minimum sample sizes can be determined, perhaps in consultation with a statistician, before data collection begins.
• Special problems with zeroes:
• Because paired differences equal to 0 are ignored (omitted from the analysis), having a relatively large number of paired differences equal to 0 can drastically reduce the effective sample size.
• Many tied values:
• If there are many tied values in the data, the assumption of continuity for the distribution of the paired differences may be called into question. A correction for tied values is made in performing the signed rank test; however, the number of ties must be quite large relative to the total sample size before the correction makes a substantial difference in the test results. The effect of ties depends not only on the number of ties, but on how many observed paired differences are tied at a single value. A cluster of paired differences tied at the same value will lead to a bigger correction than the same number of ties scattered across different values. Such a situation also raises questions about the assumption of independence for the paired differences as well as whether they come from a continuous distribution.
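The skewness checks mentioned above (the sample skewness statistic and a formal test for skewness) can be sketched with scipy. The data below are simulated and deliberately right-skewed; the choice of an exponential distribution is an illustrative assumption.

```python
# Sketch: assessing skewness of the paired differences with the sample
# skewness statistic and D'Agostino's skewness test. Simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
diffs = rng.exponential(scale=2.0, size=60) - 1.0   # clearly right-skewed

g1 = stats.skew(diffs)            # sample skewness; 0 under symmetry
stat, p = stats.skewtest(diffs)   # H0: population skewness is 0
print(f"sample skewness = {g1:.2f}, skewtest p = {p:.4f}")
```

Keep in mind the caveat above: the sample skewness statistic is not resistant to outliers, so a graphical check (boxplot or normal Q-Q plot of the differences) is a useful companion to the formal test.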
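The zeroes and ties items can also be illustrated with scipy, whose `wilcoxon` function exposes the zero-handling policy through its `zero_method` parameter: the default `"wilcox"` discards zero differences (shrinking the effective sample size, as noted above), while `"pratt"` keeps them in the ranking. The difference vector below is invented for illustration.

```python
# Sketch: zeros shrink the effective sample size of the signed rank test.
# The data are an illustrative assumption: 6 of 16 differences are zero.
import numpy as np
from scipy import stats

diffs = np.array([0, 0, 0, 0, 0, 0, 1, 2, 2, 3, -1, 4, 5, 2, 3, -2])

# Default: zeros dropped, so n falls from 16 to 10
res_wilcox = stats.wilcoxon(diffs, zero_method="wilcox")

# Pratt's method: zeros ranked, then their ranks dropped from the statistic
res_pratt = stats.wilcoxon(diffs, zero_method="pratt")

print(f"wilcox p = {res_wilcox.pvalue:.4f}, pratt p = {res_pratt.pvalue:.4f}")
```

The many tied values (the 2s and 3s) also force scipy to apply a tie correction and fall back to a normal approximation, echoing the continuity concern above.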