# Does your data violate paired sign test assumptions?

If the population from which the paired differences were sampled violates one or more of the sign test's assumptions, the results of the analysis may be incorrect or misleading. For example, if the assumption of independence of the paired differences is violated, then the sign test is simply not appropriate.

Note that the two values that make up each paired difference need not be independent; in fact, they are expected to be correlated, as with before and after measurements. If you treat paired data as coming from two independent samples, for example by performing an inappropriate Mann-Whitney rank-sum test instead of a paired sign test, you may sacrifice power.
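The cost of ignoring the pairing can be sketched with a small, invented before/after data set: every subject improves slightly, so a paired sign test detects the shift, while a Mann-Whitney test on the same numbers treated as independent samples does not (the data values below are hypothetical, chosen only for illustration).

```python
# Paired analysis vs. an inappropriate unpaired analysis (hypothetical data).
from scipy.stats import binomtest, mannwhitneyu

before = [10, 12, 14, 16, 18, 20]
after = [11, 13, 15, 17, 19, 21]   # every pair improves by 1

# Paired sign test: count positive differences, compare to Binomial(n, 0.5).
diffs = [a - b for a, b in zip(after, before)]
n_pos = sum(d > 0 for d in diffs)
p_sign = binomtest(n_pos, len(diffs), 0.5).pvalue  # 6/6 positive -> 0.03125

# Unpaired Mann-Whitney test ignores the pairing; the groups overlap heavily.
p_mw = mannwhitneyu(after, before, alternative="two-sided").pvalue

print(p_sign, p_mw)  # the sign test detects the shift; Mann-Whitney does not
```

Here the consistent within-pair improvement carries all the information, which only the paired analysis uses.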

Because it requires only the sign of each paired difference, the sign test is quite resistant to outliers. However, it is often not the most powerful test available, and this can mean the difference between detecting a true difference and missing it. This is particularly true if the underlying distribution of the paired differences is symmetric, or if the data in fact come from a normal distribution. Another nonparametric test such as the Wilcoxon signed-rank test, the paired t test, or employing a transformation may result in a more powerful test.
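The outlier resistance described above can be shown with a minimal sign-test sketch (the helper function and the data are hypothetical; SciPy's exact binomial test supplies the P value). Replacing one difference with a gross recording error of the same sign leaves the result unchanged.

```python
# The sign test statistic depends only on the signs of the differences,
# so an extreme outlier with the same sign cannot change the P value.
from scipy.stats import binomtest

def sign_test(diffs):
    """Two-sided sign test on paired differences (zero differences dropped)."""
    nonzero = [d for d in diffs if d != 0]
    n_pos = sum(d > 0 for d in nonzero)
    return binomtest(n_pos, len(nonzero), 0.5).pvalue

clean = [1.2, 0.8, 2.1, 1.5, 0.3, 1.9, 0.7, 1.1]
with_outlier = clean[:-1] + [950.0]   # gross recording error, same sign

print(sign_test(clean) == sign_test(with_outlier))  # True
```

A mean-based test such as the paired t test would be strongly affected by the 950.0 value, which is exactly the trade-off the text describes: resistance in exchange for power.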

Often, the effect of an assumption violation on the sign test result depends on the extent of the violation. Some small violations may have little practical effect on the analysis, while other violations may render the sign test result uselessly incorrect or uninterpretable. In particular, small sample sizes can increase vulnerability to assumption violations.
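The small-sample vulnerability mentioned above can be made concrete by computing the exact power of the sign test from the Binomial null distribution. The function below is an illustrative sketch; the 0.8 probability of a positive difference is an assumed alternative, not a value from the text.

```python
# Exact power of the two-sided sign test, computed from Binomial(n, 0.5)
# under the null and Binomial(n, p_alt) under an assumed alternative.
from scipy.stats import binom

def sign_test_power(n, p_alt, alpha=0.05):
    """Power of the two-sided sign test with n non-zero paired differences."""
    # Largest lower critical value whose two-sided size stays <= alpha.
    k_lo = -1
    while 2 * binom.cdf(k_lo + 1, n, 0.5) <= alpha:
        k_lo += 1
    if k_lo < 0:
        return 0.0  # the test can never reject at this alpha
    # Probability of landing in either rejection tail under the alternative.
    return binom.cdf(k_lo, n, p_alt) + binom.sf(n - k_lo - 1, n, p_alt)

print(round(sign_test_power(10, 0.8), 3))  # modest power with n = 10
print(round(sign_test_power(50, 0.8), 3))  # high power with n = 50
```

With very few non-zero differences (n = 4, say) no outcome can reach significance at all, which is the design-stage problem the text warns about.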

#### Potential assumption violations include:

• Implicit factors:
• A lack of independence within a sample is often caused by the existence of an implicit factor in the data. For example, values collected over time may be serially correlated (here time is the implicit factor). If the data are in a particular order, consider the possibility of dependence. (If the row order of the data reflects the order in which the data were collected, an index plot of the data [data value plotted against row number] can reveal patterns in the plot that could suggest possible time effects.)
• Outliers:
• Values may not be identically distributed because of the presence of outliers, which are anomalous values in the data. They may be due to recording errors, which may be correctable, or to the sample not being entirely from the same population. Apparent outliers may also arise when the values come from the same, but skewed or heavy-tailed, population. Because the sign test statistic is resistant, it will not be substantially affected by the presence of outliers. However, you should remain alert to the possibility that outliers represent recording errors in the data. A boxplot or normal probability plot (normal Q-Q plot) may suggest the presence of outliers. If you find outliers in your data that are not due to correctable errors, you may wish to consult a statistician as to how to proceed.
• Patterns in plot of data:
• Outliers may appear as anomalous points in a graph of the paired differences against their median. A boxplot or normal probability plot of the paired differences can also reveal suspected outliers.
• Special problems with small sample sizes:
• If the number of non-zero paired differences is small, it may be difficult to detect assumption violations. With small sample sizes there is less resistance to outliers and less protection against violation of assumptions. Even if none of the test assumptions are violated, a sign test with a small sample size may not have sufficient power to detect a significant difference between the median of the paired differences and 0, even if that median is in fact different from 0. Power decreases as the significance level is decreased (i.e., as the test is made more stringent), and increases as the sample size increases. With very small samples, even samples from populations with very different medians may not produce a significant sign test statistic. If a statistical significance test with small sample sizes produces a surprisingly non-significant P value, then a lack of power may be the reason. The best time to avoid such problems is in the design stage of an experiment, when appropriate minimum sample sizes can be determined, perhaps in consultation with a statistician, before data collection begins.
• Special problems with zeroes:
• Because paired differences equal to 0 are ignored (omitted from the analysis), having a relatively large number of paired differences equal to 0 can drastically reduce the effective sample size.
• Many tied values:
• If there are many tied values in the data, the assumption of a continuous distribution for the paired differences may be called into question; no correction for tied values is made in performing the sign test. Many ties can also raise questions about the assumption of independence for the paired differences.
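The zero-difference problem described above can be sketched numerically. The data below are invented so that 15 of 20 paired differences are exactly 0: the effective sample size drops to 5, and even the most extreme possible split of signs cannot reach significance at the 0.05 level.

```python
# Zero paired differences are dropped, shrinking the effective sample size.
from scipy.stats import binomtest

diffs = [1, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0, 0, 0, 1, 0, 0, 2, 0, 0, 0]
nonzero = [d for d in diffs if d != 0]
print(len(diffs), len(nonzero))  # 20 nominal pairs, only 5 effective

# Best possible case: all 5 non-zero differences are positive.
n_pos = sum(d > 0 for d in nonzero)
p = binomtest(n_pos, len(nonzero), 0.5).pvalue
print(p)  # 0.0625: significance at 0.05 is unreachable with 5 differences
```

This is why a large nominal sample offers little protection when most of the paired differences are exactly zero.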