If the populations from which the data for a Mann-Whitney rank sum test were
sampled violate one or more of the test's assumptions, the results of the
analysis may be incorrect or misleading. For example, if the assumption of
independence is violated, then the Mann-Whitney rank sum test is simply not
appropriate, although another test (perhaps the Wilcoxon signed rank test for
paired data) may be appropriate.
Often, the effect of an assumption violation on the rank sum test result
depends on the extent of the violation (such as how unequal the population
dispersions are, or how skewed one or the other population distribution is).
Some small violations may have little practical effect on the analysis, while
other violations may render the rank sum test result uselessly incorrect or
uninterpretable. In particular, small sample sizes can increase vulnerability
to assumption violations.
A lack of independence within a sample is often caused by the existence of an
implicit factor in the data. For example, values collected over time may be
serially correlated (here time is the implicit factor). If the data are in a
particular order, consider the possibility of dependence. (If the row order of
the data reflects the order in which the data were collected, an index plot of
the data [data value plotted against row number] can reveal patterns that
suggest possible time effects.)
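
As a numerical companion to the index plot, the lag-1 autocorrelation of the
values in row order gives a rough indication of serial dependence. The sketch
below is only an illustration: the function name lag1_autocorrelation and both
data sets are hypothetical, and a value well away from zero is a prompt to look
more closely, not a formal test.

```python
from statistics import mean

def lag1_autocorrelation(values):
    """Lag-1 sample autocorrelation of values taken in collection (row) order."""
    m = mean(values)
    deviations = [v - m for v in values]
    denominator = sum(d * d for d in deviations)
    if denominator == 0:
        raise ValueError("all values are identical; autocorrelation is undefined")
    # Sum of products of each deviation with the one that follows it.
    numerator = sum(a * b for a, b in zip(deviations, deviations[1:]))
    return numerator / denominator

# Hypothetical measurements listed in the order they were collected (upward drift).
drifting = [10.1, 10.3, 10.2, 10.6, 10.8, 10.7, 11.0, 11.3, 11.2, 11.5]
# The same ten values in an order that is unrelated to time.
scrambled = [10.1, 10.3, 11.3, 11.5, 10.6, 10.2, 11.2, 11.0, 10.8, 10.7]

r_drifting = lag1_autocorrelation(drifting)
r_scrambled = lag1_autocorrelation(scrambled)
print(f"lag-1 autocorrelation, collection order: {r_drifting:.2f}")
print(f"lag-1 autocorrelation, scrambled order:  {r_scrambled:.2f}")
```

A clearly positive value for the data in collection order, compared with a
value near zero after reordering, is the kind of pattern an index plot would
show as a trend.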
Whether the two samples are independent of each other is generally determined
by the structure of the experiment from which they arise. Obviously correlated
samples, such as a set of pre- and post-test observations on the same
subjects, are not independent, and such data would be more appropriately
tested by a two-sample paired test. If you are unsure whether your samples are
independent, you may wish to consult a statistician or someone who is
knowledgeable about the data collection scheme you are using.
Values may not be identically distributed because of the presence of outliers.
Outliers are anomalous values in the data. They may be due to recording
errors, which may be correctable, or they may be due to the sample not being
entirely from the same population. Apparent outliers may also be due to the
values being from the same, but skewed or heavy-tailed, population.
Outliers tend to increase the estimate of sample variation, and might lead
to an incorrect conclusion that the dispersions of the two samples are not
equal if the outlier is the result of a recording or measurement error.
Because the statistic for the rank sum test is resistant (it depends only on
the ranks of the values, not on their magnitudes), it will not be
substantially affected by the presence of outliers unless the number of
outliers becomes large relative to the sample size.
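
The resistance of the rank sum statistic can be seen in a small example. The
sketch below computes the Mann-Whitney U statistic by direct pair counting
(equivalent to the rank sum form) for two hypothetical samples, then repeats
the calculation after one value is mis-recorded by a factor of ten. The data
and the function name mann_whitney_u are assumptions made for illustration.

```python
from statistics import mean

def mann_whitney_u(x, y):
    """U statistic for x: number of (x, y) pairs with x > y, ties counted as 1/2."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u

# Hypothetical samples.
x = [12.1, 13.4, 11.8, 14.0, 12.9, 13.7]
y = [10.2, 11.5, 10.9, 12.3, 11.1, 10.6]

# Same x, but the largest value is mis-recorded as 140.0 instead of 14.0.
x_with_error = [140.0 if v == 14.0 else v for v in x]

u_clean = mann_whitney_u(x, y)
u_error = mann_whitney_u(x_with_error, y)
print("U statistic:", u_clean, "->", u_error)
print("mean of x:  ", round(mean(x), 2), "->", round(mean(x_with_error), 2))
```

The outlier was already the largest value, so its rank and therefore U are
unchanged, while the sample mean shifts substantially.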
The inequality of the population dispersions can be assessed by examination of
the relative size of the sample variations, either informally (including
graphically), or by a test for equality of dispersion such as the
Ansari-Bradley test.
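
To make the Ansari-Bradley idea concrete, the following is a minimal sketch
using only the Python standard library. Each pooled value is scored by its
rank counted from the nearer end of the ordered data, and the statistic is the
sum of scores in the first sample. The p-value here comes from random
permutations, which is an assumption of this sketch; statistical packages
typically use exact tables or a normal approximation. The function names and
data are hypothetical.

```python
import random

def ab_scores(values):
    """Ansari-Bradley scores: rank counted from the nearer end, averaged over ties."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Average the end-based scores over the tie group occupying ranks i+1..j+1.
        avg = sum(min(r, n + 1 - r) for r in range(i + 1, j + 2)) / (j - i + 1)
        for k in range(i, j + 1):
            scores[order[k]] = avg
        i = j + 1
    return scores

def ansari_bradley_permutation(x, y, n_perm=5000, seed=0):
    """Return (AB statistic for x, two-sided permutation p-value)."""
    scores = ab_scores(list(x) + list(y))
    m = len(x)
    expected = m * sum(scores) / len(scores)   # null mean of the statistic
    observed = sum(scores[:m])
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(scores)                    # random relabelling of the pooled data
        if abs(sum(scores[:m]) - expected) >= abs(observed - expected) - 1e-12:
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)

# Hypothetical samples with similar centers but very different spreads.
tight = [9.8, 10.1, 9.9, 10.2, 10.0, 9.7, 10.3, 10.05]
wide = [6.0, 14.0, 8.0, 12.5, 5.0, 15.0, 7.5, 13.0]

ab_stat, ab_p = ansari_bradley_permutation(tight, wide)
print("AB statistic:", ab_stat, " permutation p-value:", round(ab_p, 4))
```

A large statistic means the first sample sits in the middle of the pooled
data (less dispersed); a small one means it occupies the extremes.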
If both outliers and unequal dispersions are present, employing a
transformation may resolve both problems at once, and also promote normality.
In this case, it may be preferable to perform an unpaired two-sample t test on
the transformed data, as the t test has slightly more power than the rank sum
test if the assumption of normality holds. (The rank sum test has about 95%
efficiency compared to the unpaired t test if the normality assumption is in
fact correct.)
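
The figure of about 95% matches the standard asymptotic (Pitman) relative
efficiency of the rank sum test relative to the t test under a location-shift
model. The notation below (density f, standard deviation sigma, W for the rank
sum test) is introduced here for illustration and is not part of the original
text.

```latex
\[
\mathrm{ARE}(W, t) \;=\; 12\,\sigma^{2}\left(\int_{-\infty}^{\infty} f(x)^{2}\,dx\right)^{2}
\]
% For a normal density with standard deviation \sigma:
\[
\int_{-\infty}^{\infty} f(x)^{2}\,dx \;=\; \frac{1}{2\sigma\sqrt{\pi}}
\quad\Longrightarrow\quad
\mathrm{ARE}(W, t) \;=\; 12\,\sigma^{2}\cdot\frac{1}{4\pi\sigma^{2}}
\;=\; \frac{3}{\pi} \;\approx\; 0.955 .
\]
```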
The usual estimate of sample variance is not resistant to outliers, while the
Ansari-Bradley test is less subject to influence by outliers. For this reason,
the Ansari-Bradley test may not reject equality of dispersions even when the
sample variances seem to be substantially different. A lack of power due to
small sample sizes may also lead to this situation.
If the assumptions for the samples' population distributions are correct, the
skewnesses of the two samples should be comparable, and if either sample
suggests heavy tails or light tails (or neither), the other sample should
suggest the same.
Differences in distributional shapes can be assessed by examination of the
data, as with boxplots, histograms, and normal probability plots. Differing
normality test results for the two samples also suggest the possibility of
differing distributional shapes.
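
Alongside the plots, simple moment-based summaries of skewness and tail weight
can be compared between the two samples. The sketch below is one informal way
to do this; the function name shape_summary and the data are hypothetical, and
with small samples these estimates are noisy, so they supplement the graphs
and do not replace them.

```python
from statistics import mean

def shape_summary(values):
    """Return (skewness, excess kurtosis) from central moments of the sample."""
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("all values are identical; shape is undefined")
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5           # 0 for a symmetric sample
    excess_kurtosis = m4 / m2 ** 2 - 3  # about 0 for normal-like tails
    return skewness, excess_kurtosis

# Hypothetical samples: one roughly symmetric, one with a long right tail.
sample_a = [4.1, 4.8, 5.0, 5.2, 5.5, 5.9, 6.3]
sample_b = [1.0, 1.2, 1.3, 1.5, 1.9, 2.8, 7.5]

skew_a, kurt_a = shape_summary(sample_a)
skew_b, kurt_b = shape_summary(sample_b)
print(f"sample A: skewness {skew_a:.2f}, excess kurtosis {kurt_a:.2f}")
print(f"sample B: skewness {skew_b:.2f}, excess kurtosis {kurt_b:.2f}")
```

Markedly different skewness or kurtosis values in the two samples point to
differing distributional shapes.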
If the assumptions for the samples' population distributions are correct, a
plot of the values in each sample against that sample's mean or median (or its
sample ID) should suggest a horizontal band across the graph. Because there
are only two unique sample means/medians or sample ID values, this type of
graph will consist of two vertical "stacks" of data points; the stacks should
be about the same length. Outliers may appear as anomalous points in the
graph.
A fan pattern like the profile of a megaphone, with a noticeable flare either
to the right or to the left (one of the "stacks" of data points is much longer
than the other), suggests that the variation in the values increases in the
direction the fan pattern widens (usually as the sample mean increases), and
this in turn suggests that a transformation may be needed.
Side-by-side boxplots of the two samples can also reveal lack of homogeneity
of dispersion if one boxplot is much longer than the other, and can reveal
suspected outliers.
If one or both of the sample sizes is small, assumption violations such as
inequality of dispersions are difficult to detect even when they are present.
Also, with small sample size(s) there is less resistance to outliers, and less
protection against violation of assumptions.
Even if none of the test assumptions are violated, a rank sum test with small
sample sizes may not have sufficient power to detect a significant difference
between the two samples, even if the medians are in fact different. Power
decreases as the significance level is decreased (i.e., as the test is made
more stringent), and increases as the sample size increases. With very small
samples, even samples from populations with very different medians may not
produce a significant rank sum test statistic. If a statistical significance
test with small sample sizes produces a surprisingly non-significant P value,
then a lack of power may be the reason. The best time to avoid such problems
is in the design stage of an experiment, when appropriate minimum sample sizes
can be determined, perhaps in consultation with a statistician, before data
collection begins.
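
The dependence of power on sample size can be explored by simulation at the
design stage. The sketch below is a rough illustration under stated
assumptions: both populations are normal with unit standard deviation and
differ by a hypothetical shift of one standard deviation, and the p-value uses
the normal approximation to the U statistic (continuity corrected, no tie
correction), which is crude for very small samples where exact tables are
preferable. The function names are hypothetical.

```python
import math
import random

def rank_sum_p_value(x, y):
    """Two-sided p-value from the normal approximation to the Mann-Whitney U."""
    n1, n2 = len(x), len(y)
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = max(abs(u - mu) - 0.5, 0.0) / sigma      # continuity correction
    return math.erfc(z / math.sqrt(2))           # two-sided standard normal tail area

def estimated_power(n, shift, alpha=0.05, n_sim=2000, seed=1):
    """Fraction of simulated experiments (n per group) that reject at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [rng.gauss(shift, 1.0) for _ in range(n)]
        if rank_sum_p_value(x, y) < alpha:
            rejections += 1
    return rejections / n_sim

for n in (5, 10, 30):
    print(f"n = {n:2d} per group: estimated power = {estimated_power(n, shift=1.0):.2f}")
```

Replacing the assumed shift, distribution, and significance level with values
plausible for the planned experiment gives a rough guide to the minimum sample
size worth collecting.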