Transformations (a single function applied to each X or each Y data value)
are applied to correct problems of nonnormality or unequal variances. For
example, taking logarithms of sample values can reduce skewness
to the right. Transforming the Y values to remedy nonnormality often results
in correcting heteroscedasticity (unequal variances). Occasionally, both the X
and Y variables are transformed.
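As a small illustration of how taking logarithms can reduce skewness to the
right, the sketch below computes a moment-based skewness coefficient before
and after a log transformation. The data values and the helper name
sample_skewness are made up for this example; they are not taken from any
real study or library.

```python
import math

def sample_skewness(values):
    """Moment-based skewness coefficient: m3 / m2**1.5 (hypothetical helper)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Made-up, strongly right-skewed sample (all values positive).
data = [1, 2, 2, 3, 4, 6, 10, 25, 80]
logged = [math.log(v) for v in data]

skew_before = sample_skewness(data)
skew_after = sample_skewness(logged)
print(f"skewness before log: {skew_before:.2f}")
print(f"skewness after log:  {skew_after:.2f}")
```

For this sample the coefficient stays positive after the transformation but
is much closer to 0, which is the kind of improvement a log transformation is
meant to produce.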
Unless scientific theory suggests a specific transformation a priori,
transformations are usually chosen from the "power family" of
transformations, in which each value x is replaced by x**p, where p
is an integer or half-integer, usually one of:
- -2 (reciprocal square)
- -1 (reciprocal)
- -0.5 (reciprocal square root)
- 0 (log transformation)
- 0.5 (square root)
- 1 (leaving the data untransformed)
- 2 (square)
For p = -0.5 (reciprocal square root), 0, or 0.5 (square root), the data
values must all be positive. To use these transformations when there are
negative and positive values, a constant can be added to all the data values
such that the smallest is greater than 0 (say, such that the smallest value is
1). (If all the data values are negative, the data can instead be multiplied
by -1, but note that in this situation, data suggesting skewness
to the right would now become data suggesting skewness to the left.) To
preserve the order of the original data in the transformed data, if the value
of p is negative, the transformed data are multiplied by -1.0; e.g., for p =
-1, the data are transformed as x --> -1.0/x. Taking logs or square roots
tends to "pull in" values greater than 1 relative to values less than 1, which
is useful in correcting skewness to the right.
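The rules above can be sketched in a few lines of code. This is only an
illustration under stated assumptions: the function names power_transform and
shift_to_min_one are hypothetical, p = 0 is treated as the natural logarithm,
and results for negative p are multiplied by -1 so that the original ordering
is preserved.

```python
import math

def shift_to_min_one(values):
    """Add a constant so that the smallest value becomes 1 (hypothetical helper)."""
    shift = 1 - min(values)
    return [v + shift for v in values]

def power_transform(values, p):
    """Apply x -> x**p, using log(x) for p == 0 and negating when p < 0."""
    # Logs and half-integer powers are only applied to positive data here.
    needs_positive = (p == 0) or (p != int(p))
    out = []
    for x in values:
        if needs_positive and x <= 0:
            raise ValueError("all data values must be positive for this p")
        y = math.log(x) if p == 0 else x ** p
        # Multiply by -1 for negative p to keep the original ordering.
        out.append(-y if p < 0 else y)
    return out

shifted = shift_to_min_one([-3.0, 0.0, 5.0])      # [1.0, 4.0, 9.0]
roots = power_transform(shifted, 0.5)             # [1.0, 2.0, 3.0]
recips = power_transform([1.0, 2.0, 4.0], -1)     # [-1.0, -0.5, -0.25]
print(shifted, roots, recips)
```

Note that the negated reciprocals are still in increasing order, as the
original values were.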
Another common transformation is the antilogarithm (exp(x)), which has
effects similar to but more extreme than squaring: "drawing out" values
greater than 1 relative to values less than 1.
Generally speaking, transformations of X are used to correct for
non-linearity, and transformations of Y to correct for nonconstant variance of
Y or nonnormality of the error terms. A transformation of Y to correct
nonconstant variance or nonnormality of the error terms may also increase
linearity. However, if the error distribution was normal to begin with,
transforming Y may make it nonnormal.
A transformation of Y involves changing the metric in which the fitted
values are analyzed, which may make interpretation of the results difficult if
the transformation is complicated. If you are unfamiliar with transformations,
you may wish to consult a statistician before proceeding.
The graph of the X-Y data may suggest an appropriate transformation of X if
the plot shows nonlinearity but constant error variance (that is, the general
shape of the plot is not linear, but the vertical deviation in the data values
appears constant over the range of X values).
If the X-Y plot suggests an arc from lower left to upper right so that data
points either very low or very high in X lie below the straight line suggested
by the data, while the data points with middling X values lie on or above that
straight line, taking square roots or logarithms of the X values may promote
linearity.
If the X-Y plot suggests an arc from upper left to lower right so that data
points either very low or very high in X lie above the straight line suggested
by the data, while the data points with middling X values lie on or below that
straight line, taking reciprocals or reciprocals of the antilogarithms of the
X values may promote linearity.
If the X-Y plot suggests an arc from lower left to upper right so that data
points either very low or very high in X lie above the straight line suggested
by the data, while the data points with middling X values lie on or below that
straight line, taking squares or antilogarithms of the X values may promote
linearity.
If the X-Y plot suggests an arc from upper left to lower right so that data
points either very low or very high in X lie below the straight line suggested
by the data, while the data points with middling X values lie on or above that
straight line, taking squares or antilogarithms of the X values may promote
linearity.
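One rough way to check whether a candidate transformation of X has promoted
linearity is to compare the correlation of Y with X before and after the
transformation. The sketch below does this for the first case described above
(an arc rising from lower left to upper right, with the ends below the line),
using made-up data generated from a logarithmic curve. The helper pearson_r is
written out here for illustration and is not part of any particular package.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: Y rises quickly at low X and flattens at high X.
x = [1, 2, 4, 8, 16, 32, 64]
y = [3.0 + 2.0 * math.log(v) for v in x]

r_raw = pearson_r(x, y)                          # Y against untransformed X
r_log = pearson_r([math.log(v) for v in x], y)   # Y against log(X)
print(f"r with X:      {r_raw:.3f}")
print(f"r with log(X): {r_log:.3f}")
```

Because these data were constructed without error from a log curve, the
correlation with log(X) is essentially 1. With real data the improvement will
be less clear-cut, and the plots should still be examined.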
The choice of a transformation of Y may be suggested by examining the plot
of residuals against X or fitted values. If the fit appears linear, but the
variance of the residuals increases as X increases, giving the plot a wedge or
megaphone shape, then taking square roots, logarithms, or reciprocals of the Y
values may promote homogeneity of variance.
If the plot of residuals against X or fitted values is a convex arc from
lower left to upper right, and the variance of the residuals increases as X
increases, then taking square roots of the Y values may promote homogeneity of
variance.
If the plot of residuals against X or fitted values is a concave arc from
upper left to lower right, and the variance of the residuals decreases as X
increases, then taking logarithms of the Y values may promote homogeneity of
variance.
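To see why a transformation of Y can even out a wedge-shaped spread, consider
the made-up example below, in which the scatter in Y is proportional to the
size of Y. The spread of the Y values at each X (used here as a simple
stand-in for the spread of residuals) grows with X on the original scale but
is constant after taking logarithms. All names and numbers are assumptions
chosen for illustration.

```python
import math

# Made-up data: three Y values at each X, with multiplicative scatter.
xs = [1, 2, 4, 8, 16]
factors = [0.8, 1.0, 1.25]
groups = {x: [10.0 * x * f for f in factors] for x in xs}

def spread(values):
    """Range (max minus min) of a group of values."""
    return max(values) - min(values)

# Spread of Y at each X on the original scale and on the log scale.
raw_spread = [spread(groups[x]) for x in xs]
log_spread = [spread([math.log(v) for v in groups[x]]) for x in xs]

for x, r, l in zip(xs, raw_spread, log_spread):
    print(f"X = {x:2d}  spread of Y = {r:6.2f}  spread of log(Y) = {l:.3f}")
```

Logarithms suit this example because the scatter was built to be
proportional to Y. Other patterns of increasing variance may respond better
to square roots or reciprocals, as described above.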
When a transformation of Y is indicated, a simultaneous transformation of X
may also improve linearity of the fit with the transformed Y.