Possible alternatives if your data violate regression assumptions

If the simple linear model is incorrect, if the Y values do not have constant variance, if the data for the Y variable come from a population whose distribution violates the assumption of normality, or if outliers are present, then simple linear regression on the original data may provide misleading results, or may not be the best test available. In such cases, fitting a different linear model or a nonlinear model, performing a weighted least squares linear regression, transforming the X or Y data, or using an alternative straight-line regression method may provide a better analysis.

Alternative procedures include:


  • Different linear model:
  • Y may actually be best modelled by a linear function that includes other variables besides X. If a graph of the residuals against a prospective X variable suggests a linear trend, then multiple linear regression may provide a better model. If there is a blocking variable such that there is potentially a different linear regression within each block, then an analysis of covariance may be a better model. If there are multiple Y values measured at each X value, this implicit blocking can be handled by using the average of the different Y responses at each X and fitting the regression to this reduced data set (a sketch of this averaging appears at the end of this section). A possible drawback to this method is that by reducing the number of data points, the degrees of freedom associated with the residual error are reduced, thus potentially reducing the power of the test.
  • Nonlinear model:
  • If Y is actually best modelled by a nonlinear function of X, especially if a nonlinear model is suggested on theoretical grounds, then nonlinear regression can be used to provide the best fit to the X-Y data. The shape of the X-Y plot may suggest an appropriate function to use, such as a polynomial in X or an exponential model (a nonlinear fit is sketched at the end of this section).
    Transformations can also be used to deal with nonlinearity, but they involve changing the metric (and possibly the normality) of either X or Y. However, a nonlinear model is usually more complex (has more parameters) than a transformed linear model. If there are many parameters to fit and not very many data points, the precision of the fitted parameters for the more complex model may not be very good.
  • Transformations:
  • Transformations (a single function applied to each X or each Y data value) are applied to correct problems of nonnormality or unequal variances. For example, taking logarithms of sample values can reduce skewness to the right. Transforming the Y values to remedy nonnormality often results in correcting heteroscedasticity (unequal variances). Occasionally, both the X and Y variables are transformed. Unless scientific theory suggests a specific transformation a priori, transformations are usually chosen from the "power family" of transformations, where each value is replaced by x**p, where p is an integer or half-integer, usually one of:
  • -2 (reciprocal square)
  • -1 (reciprocal)
  • -0.5 (reciprocal square root)
  • 0 (log transformation)
  • 0.5 (square root)
  • 1 (leaving the data untransformed)
  • 2 (square)

For p = -0.5 (reciprocal square root), 0, or 0.5 (square root), the data values must all be positive. To use these transformations when there are negative and positive values, a constant can be added to all the data values such that the smallest is greater than 0 (say, such that the smallest value is 1). (If all the data values are negative, the data can instead be multiplied by -1, but note that in this situation, data suggesting skewness to the right would now become data suggesting skewness to the left.) To preserve the order of the original data in the transformed data, if the value of p is negative, the transformed data are multiplied by -1.0; e.g., for p = -1, the data are transformed as x --> -1.0/x. Taking logs or square roots tends to "pull in" values greater than 1 relative to values less than 1, which is useful in correcting skewness to the right.
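
As a concrete illustration, here is a minimal sketch of the power family in Python, following the conventions just described (the function name power_transform is illustrative, not part of any particular package): a shift so that the smallest value becomes 1 when nonpositive values are present, the log transformation for p = 0, and a sign flip for negative p to preserve the order of the data.

    import numpy as np

    def power_transform(x, p):
        # Power-family transformation x -> x**p, following the
        # conventions described above: shift so the smallest value
        # is 1 if any values are nonpositive, use log for p = 0,
        # and flip the sign for negative p to preserve order.
        x = np.asarray(x, dtype=float)
        if x.min() <= 0:
            x = x + (1.0 - x.min())
        if p == 0:
            return np.log(x)
        if p < 0:
            return -1.0 * x ** p
        return x ** p

    # Example: a log transformation to reduce skewness to the right
    print(power_transform([1.2, 3.5, 8.0, 22.0, 60.0], 0))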

Another common transformation is the antilogarithm (exp(x)), which has effects similar to but more extreme than squaring: "drawing out" values greater than 1 relative to values less than 1.

Generally speaking, transformations of X are used to correct for non-linearity, and transformations of Y to correct for nonconstant variance of Y or nonnormality of the error terms. A transformation of Y to correct nonconstant variance or nonnormality of the error terms may also increase linearity. Transforming Y may change the error distribution from normal to nonnormal if the error distribution was normal to begin with.

A transformation of Y involves changing the metric in which the fitted values are analyzed, which may make interpretation of the results difficult if the transformation is complicated. If you are unfamiliar with transformations, you may wish to consult a statistician before proceeding.

The graph of the X-Y data may suggest an appropriate transformation of X if the plot shows nonlinearity but constant error variance (that is, the general shape of the plot is not linear, but the vertical deviation in the data values appears constant over the range of X values).

If the X-Y plot suggests an arc from lower left to upper right so that data points either very low or very high in X lie below the straight line suggested by the data, while the data points with middling X values lie on or above that straight line, taking square roots or logarithms of the X values may promote linearity.

If the X-Y plot suggests an arc from upper left to lower right so that data points either very low or very high in X lie above the straight line suggested by the data, while the data points with middling X values lie on or below that straight line, taking reciprocals or reciprocals of the antilogarithms (that is, exp(-x)) of the X values may promote linearity.

If the X-Y plot suggests an arc from lower left to upper right so that data points either very low or very high in X lie above the straight line suggested by the data, while the data points with middling X values lie on or below that straight line, taking squares or antilogarithms of the X values may promote linearity.

If the X-Y plot suggests an arc from upper left to lower right so that data points either very low or very high in X lie below the straight line suggested by the data, while the data points with middling X values lie on or above that straight line, taking squares or antilogarithms of the X values may promote linearity.
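
One informal way to choose among these transformations of X is to apply each candidate and see which gives the straightest plot, for example by comparing the correlation with Y. A rough sketch, using synthetic data invented for illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(1.0, 100.0, 40)
    y = 3.0 * np.sqrt(x) + rng.normal(0.0, 0.5, x.size)  # curved, arcing data

    # Candidate transformations of X from the power family
    candidates = {
        "x": x,
        "sqrt(x)": np.sqrt(x),
        "log(x)": np.log(x),
        "-1/x": -1.0 / x,
        "x**2": x ** 2,
    }

    # The transformation giving the straightest X-Y plot will have the
    # correlation with Y closest in magnitude to 1
    for name, xt in candidates.items():
        r = np.corrcoef(xt, y)[0, 1]
        print(f"{name:10s} |r| = {abs(r):.4f}")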

The choice of a transformation of Y may be suggested by examining the plot of residuals against X or against the fitted values. If this plot appears linear, but the variance of the residuals increases as X increases, suggesting a wedge or megaphone shape, then taking square roots, logarithms, or reciprocals of the Y values may promote homogeneity of variance.

If the plot of residuals against X or fitted values is a convex arc from lower left to upper right, and the variance of the residuals increases as X increases, then taking square roots of the Y values may promote homogeneity of variance.

If the plot of residuals against X or fitted values is a concave arc from upper left to lower right, and the variance of the residuals decreases as X increases, then taking logarithms of the Y values may promote homogeneity of variance.

When a transformation of Y is indicated, a simultaneous transformation of X may also improve linearity of the fit with the transformed Y.
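
A quick numerical check on whether a transformation of Y has stabilized the variance is to refit the line and compare the residual spread at low and high X. A rough sketch with synthetic data whose errors are multiplicative, so that taking logarithms should roughly equalize the spread (the helper residual_spread_ratio is an informal diagnostic invented for this illustration):

    import numpy as np
    from scipy import stats

    def residual_spread_ratio(x, y):
        # Fit a straight line, then compare the residual spread in the
        # upper half of the X range to that in the lower half; a ratio
        # well above 1 indicates variance increasing with X.
        fit = stats.linregress(x, y)
        resid = y - (fit.intercept + fit.slope * x)
        half = x.size // 2
        return np.std(resid[half:]) / np.std(resid[:half])

    rng = np.random.default_rng(1)
    x = np.linspace(1.0, 50.0, 60)
    y = 2.0 * x * np.exp(rng.normal(0.0, 0.3, x.size))  # multiplicative error

    print(residual_spread_ratio(x, y))          # well above 1: megaphone shape
    print(residual_spread_ratio(x, np.log(y)))  # near 1 after taking logs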

  • Weighted least squares linear regression:
  • If the plot of the residuals against X suggests heteroscedasticity (a wedge or megaphone shape instead of a featureless cloud of points), then a weighted linear regression may provide more precise estimates for the slope and intercept. The weights should be chosen to be proportional to the reciprocal of the variance of Y at each X. For example, if the variance at X is approximately proportional to the mean of X, then weights inversely proportional to the fitted Y would be appropriate; these weights can be calculated by fitting an unweighted least squares linear regression, then using the reciprocals of the fitted values as the weights for a weighted least squares linear regression (a sketch appears at the end of this section). Alternatively, the weights could be chosen empirically as the reciprocals of the original Y values. Although weighted least squares linear regression can deal with nonconstant variance in Y, it is just as sensitive to outliers as unweighted least squares linear regression.
  • Alternative straight-line regression methods:
  • Nonparametric tests are tests that do not make the usual distributional assumptions of the normal-theory-based tests. For simple linear regression, nonparametric fitting methods include repeated-median regression and the resistant line (a repeated-median fit is sketched at the end of this section). Another alternative is to calculate the fit so as to minimize the sum of the absolute values of the residuals (least absolute deviations) instead of the sum of their squared values. Most alternative methods involve iteration to converge to the final fit, which can make them computationally intensive. Although alternative methods may be more robust or resistant than the least squares fit to departures from normality or to outliers, they are not necessarily immune. Unless it involves some form of weighting or trimming of values, an alternative linear regression method will not address the problem of inequality of variances. Any alternative method for linear regression still assumes that the Y observations are mutually independent, that the residuals have the same variance and are centered about 0, and that the linear model is in fact the correct one. If the Y values do indeed come from populations with normal distributions, the Y variable has constant variance, and the linear model is correct, then the least squares estimate of the slope is unbiased and has the smallest variance among all unbiased estimates of the slope.
  • Removing outliers:
  • A common method of dealing with apparent outliers in a regression situation is to remove the outliers and then refit the regression line to the remaining points. If the regression line is not substantially changed by the removal, then the fit to the remaining points will be improved without misrepresenting the data. However, if the outliers are due to a nonnormal distribution for the Y sample population, or to the underlying model being nonlinear, more can be learned by fitting a better model to the entire data set (such as a nonlinear model, or a straight line fitted by an alternative method) than by ignoring valid data values. And while removing a point that has a large residual may lead to a smaller residual variance for the new fitted line, it will not necessarily lead to a greater correlation or R-square for the new line, or to a smaller P value for the F test of overall fit.
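
As mentioned under "Different linear model" above, when there are replicate Y measurements at each X, one option is to average them and refit. A minimal sketch with invented data, using scipy.stats.linregress:

    import numpy as np
    from scipy import stats

    # Three replicate Y measurements at each of four X values
    x = np.repeat([1.0, 2.0, 3.0, 4.0], 3)
    y = np.array([2.1, 1.9, 2.3, 3.8, 4.1, 4.0,
                  6.2, 5.9, 6.1, 8.0, 8.3, 7.9])

    # Average the replicate responses at each distinct X ...
    xu = np.unique(x)
    ybar = np.array([y[x == xi].mean() for xi in xu])

    # ... and fit the straight line to the reduced data set.  Note the
    # loss of residual degrees of freedom: n - 2 drops from 10 to 2 here.
    fit = stats.linregress(xu, ybar)
    print(fit.slope, fit.intercept)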
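
As mentioned under "Nonlinear model" above, a theoretically motivated nonlinear function can be fitted directly. A minimal sketch using scipy.optimize.curve_fit on synthetic data generated from an assumed exponential model y = a*exp(b*x):

    import numpy as np
    from scipy.optimize import curve_fit

    def model(x, a, b):
        # Exponential model suggested, say, on theoretical grounds
        return a * np.exp(b * x)

    rng = np.random.default_rng(2)
    x = np.linspace(0.0, 4.0, 30)
    y = model(x, 2.0, 0.8) + rng.normal(0.0, 0.5, x.size)

    # curve_fit returns the parameter estimates and their covariance matrix
    params, cov = curve_fit(model, x, y, p0=(1.0, 1.0))
    print(params)                  # estimates of a and b
    print(np.sqrt(np.diag(cov)))   # rough standard errors of the estimates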
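
As mentioned under "Weighted least squares linear regression" above, one scheme uses the reciprocals of the fitted values from an unweighted fit as the weights. A sketch using the statsmodels package, assuming the variance of Y is roughly proportional to its mean (so that the fitted values are positive):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    x = np.linspace(1.0, 50.0, 60)
    y = 1.0 + 0.5 * x + x * rng.normal(0.0, 0.05, x.size)  # spread grows with x

    X = sm.add_constant(x)

    # Step 1: an unweighted fit, used only to obtain fitted values
    ols = sm.OLS(y, X).fit()

    # Step 2: refit with weights proportional to the reciprocals of the
    # fitted values (appropriate when the variance of Y is roughly
    # proportional to its mean)
    wls = sm.WLS(y, X, weights=1.0 / ols.fittedvalues).fit()
    print(wls.params)   # intercept and slope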
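
As mentioned under "Alternative straight-line regression methods" above, repeated-median regression is one resistant alternative; SciPy provides it as scipy.stats.siegelslopes. A sketch with one planted outlier:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    x = np.linspace(0.0, 10.0, 30)
    y = 1.0 + 2.0 * x + rng.normal(0.0, 0.3, x.size)
    y[-1] += 15.0   # plant one gross outlier

    # Ordinary least squares is pulled toward the outlier ...
    print(stats.linregress(x, y).slope)

    # ... while the repeated-median (Siegel) slope largely resists it
    slope, intercept = stats.siegelslopes(y, x)
    print(slope)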
