If the data for one or both of the samples to be analyzed by a rank sum test come from a population whose distribution violates the assumption that both populations have the same distributional shape, or if outliers are present, then the rank sum test on the original data may provide misleading results, or may not be the most powerful test available. Transforming the data to promote normality and then performing a two-sample t test, or using another nonparametric test, may provide a better analysis.
- Transformations (a single function applied to each data value) can be applied to correct problems of nonnormality or unequal dispersions. Transforming the two samples to remedy nonnormality often corrects heteroscedasticity (unequal dispersions) as well. If such a transformation can be found, the transformed data may be suitable for use with a two-sample t test, and the resulting test may be more powerful than the original rank sum test. The same transformation should be applied to both samples. Unless scientific theory suggests a specific transformation a priori, transformations are usually chosen from the "power family" of transformations, where each value x is replaced by x**p (with p = 0 standing for the log transformation), where p is an integer or half-integer, usually one of:
- -2 (reciprocal square)
- -1 (reciprocal)
- -0.5 (reciprocal square root)
- 0 (log transformation)
- 0.5 (square root)
- 1 (leaving the data untransformed)
- 2 (square)
For p = -0.5 (reciprocal square root), 0 (log), or 0.5 (square root), the data values must all be positive. To use these transformations when there are both negative and positive values, a constant can be added to all the data values such that the smallest is greater than 0 (say, such that the smallest value is 1). (If all the data values are negative, the data can instead be multiplied by -1, but note that in this situation, data suggesting skewness to the right would now become data suggesting skewness to the left.) To preserve the order of the original data in the transformed data, if the value of p is negative, the transformed data are multiplied by -1.0; e.g., for p = -1, the data are transformed as x --> -1.0/x. Taking logs or square roots tends to "pull in" values greater than 1 relative to values less than 1, which is useful in correcting skewness to the right. Transformation changes the metric in which the data are analyzed, which may make the results difficult to interpret if the transformation is complicated. If you are unfamiliar with transformations, you may wish to consult a statistician before proceeding.
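The power-family rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `power_transform` and `shift_for_minimum_one` are hypothetical, and the choice to shift so that the smallest value becomes exactly 1 follows the "say, such that the smallest value is 1" suggestion rather than any fixed rule.

```python
import math


def shift_for_minimum_one(values):
    """Return the constant to add so that the smallest value becomes 1.

    Hypothetical helper: any constant that makes all values positive would do.
    """
    return 1.0 - min(values)


def power_transform(values, p, shift=0.0):
    """Apply the power-family transformation with exponent p.

    p == 0 means the natural log. For negative p the result is multiplied
    by -1.0 so that the transformed values keep the original ordering
    (for positive data).
    """
    shifted = [x + shift for x in values]
    if p in (-0.5, 0, 0.5) and min(shifted) <= 0:
        raise ValueError("p = -0.5, 0, or 0.5 requires strictly positive values")
    if p < 0 and any(x == 0 for x in shifted):
        raise ValueError("negative p is undefined for a value of 0")
    if p == 0:
        return [math.log(x) for x in shifted]
    sign = -1.0 if p < 0 else 1.0
    return [sign * x ** p for x in shifted]


# The same transformation (and the same shift) is applied to both samples.
sample_a = [-3.0, 0.0, 5.0]
sample_b = [-1.0, 2.0, 40.0]
shift = shift_for_minimum_one(sample_a + sample_b)
log_a = power_transform(sample_a, 0, shift)
log_b = power_transform(sample_b, 0, shift)
```

Computing the shift from the pooled data, as in the usage lines, is one way to make sure that both samples receive an identical transformation.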
- Other nonparametric tests:
- Although the rank sum test is the most commonly used nonparametric alternative to the unpaired two-sample t test, it is not the only one. However, all of these tests assume that the two samples are independent of each other, and that there is independence within each sample. A median test can be calculated by creating a 2x2 contingency table of counts of the values in each sample that are greater than, or not greater than, the median of both samples combined. This contingency table can then be tested by a chi-square test or Fisher's exact test. The median test does not assume equality of dispersions, but is likely to be less powerful than the rank sum test when the dispersions are in fact comparable.
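As a rough illustration of the median test described above, the sketch below builds the 2x2 table of counts and evaluates it with a two-sided Fisher's exact test computed directly from the hypergeometric distribution. The function names are hypothetical, and the two-sided rule used here (summing the probabilities of all tables no more likely than the observed one) is one common convention, assumed for the example.

```python
from math import comb
from statistics import median


def median_test_table(sample_a, sample_b):
    """Count values greater / not greater than the combined median."""
    grand_median = median(list(sample_a) + list(sample_b))

    def counts(sample):
        above = sum(1 for x in sample if x > grand_median)
        return [above, len(sample) - above]

    return [counts(sample_a), counts(sample_b)]


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact p-value for a 2x2 table of counts."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(k):
        # Hypergeometric probability of k in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, k) * comb(row2, col1 - k) / comb(total, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_value = sum(
        prob(k)
        for k in range(lowest, highest + 1)
        if prob(k) <= p_observed * (1 + 1e-9)  # tolerance for float ties
    )
    return min(p_value, 1.0)


table = median_test_table([1, 2, 3, 4], [5, 6, 7, 8])
p_value = fisher_exact_two_sided(table)
```

For larger samples a chi-square test on the same table is the usual alternative; the table-building step is unchanged.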
- Unpaired two-sample t test
- If the sampled values do indeed come from populations with normal distributions, then the unpaired two-sample t test is the most powerful test of the equality of the two means, meaning that no other test is more likely to detect an actual difference between the two means. (If a distribution is symmetric, its mean and median are both equal to the center of symmetry. Since the normal distribution is symmetric, the t test can also be viewed as testing whether the difference between the two population medians is 0, if the normality assumption holds.) If the population distributions are not normal, however, the rank sum test may be more powerful at detecting differences between the population medians. If applying a transformation promotes normality, the unpaired two-sample t test applied to the transformed data may be more powerful than the rank sum test.
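To make the last point concrete, here is a minimal sketch of an unpaired two-sample t statistic applied to log-transformed data. It assumes the pooled (equal-variance) form of the test, which the text does not specify, and the function name `pooled_t_statistic` and the sample values are hypothetical. Only the statistic and its degrees of freedom are computed; a p-value would require the t distribution from a statistics library.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(sample_a, sample_b):
    """Return (t, degrees of freedom) for the pooled two-sample t test."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error, df


# Illustrative right-skewed, strictly positive data (made up for the example).
raw_a = [1.2, 1.9, 2.4, 3.1, 9.8]
raw_b = [2.5, 4.0, 5.2, 7.7, 30.1]

# Apply the same log transformation to both samples, then compute t.
log_a = [math.log(x) for x in raw_a]
log_b = [math.log(x) for x in raw_b]
t_log, df_log = pooled_t_statistic(log_a, log_b)
```

Note that a test on the log scale compares means of the logged values, so any conclusion is about the transformed metric rather than the original one.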