Multiple Linear Regression

Multiple linear regression fits a response variable as a linear combination of multiple X variables by the method of least squares.


Assumptions:

  • The linear function Yi = b0 + b1*X1i + b2*X2i + ... + bk*Xki + ei is the correct model, where Yi is the ith observed value of Y, Xji is the ith observed value of the jth X variable, and ei is the error term. Equivalently, the expected value of Y for a given set of values of the X variables is Y = b0 + b1*X1 + b2*X2 + ... + bk*Xk. The intercept is b0, the expected value of Y when the value of each X variable is 0. (A short least-squares fitting sketch follows the notes below.)
  • The Xj variable (predictor variable) values are fixed (i.e., none of the Xj is a random variable).
  • The ei are independent and identically distributed, each following a normal distribution with mean 0 and the same variance.
  • The Y variable (response variable) observations are independent.
  • For a given set of values of the X variables, the variable Y is normally distributed with mean b0 + b1*X1 + b2*X2 + ... + bk*Xk and the same variance as the ei.

The normality assumption is required for hypothesis tests and confidence intervals, but not for least-squares estimation of the coefficients.
The X variables are also known as the independent variables.
The Y variable is also known as the dependent variable.
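
The following is a minimal sketch of fitting such a model by least squares in Python with NumPy. The data values and variable names are made up purely for illustration, with two predictor variables X1 and X2:

    import numpy as np

    # Illustrative data: six observations of two predictor variables and a response.
    X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
    Y  = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])

    # Design matrix: a column of ones for the intercept b0, then one column per X variable.
    X = np.column_stack([np.ones_like(X1), X1, X2])

    # Least-squares estimates of b0, b1, b2. No normality assumption is needed
    # for this estimation step; normality matters for the hypothesis tests.
    b, rss, rank, sv = np.linalg.lstsq(X, Y, rcond=None)
    print("estimated coefficients (b0, b1, b2):", b)

    # Fitted values and residuals; the residuals estimate the error terms ei.
    fitted = X @ b
    residuals = Y - fitted

A full regression package (for example, statsmodels in Python) would additionally report standard errors and hypothesis tests for these coefficients.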

The coefficient bj is the amount by which the expected value of Y changes when Xj increases by one unit, with all the other X variables held constant. This interpretation of the coefficients does not hold if some of the X variables are functions of the others, such as an interaction term Xj*Xk.
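
To make the "held constant" interpretation concrete, and to show why it breaks down with an interaction term, here is a small Python sketch using arbitrary, hypothetical coefficient values:

    # Hypothetical fitted model: E[Y] = b0 + b1*X1 + b2*X2.
    b0, b1, b2 = 1.0, 2.0, 0.5

    def expected_y(x1, x2):
        return b0 + b1 * x1 + b2 * x2

    # Raising X1 by one unit while holding X2 fixed changes E[Y] by exactly b1.
    print(expected_y(4.0, 3.0) - expected_y(3.0, 3.0))   # 2.0, which equals b1

    # Now add an interaction term b3*X1*X2.
    b3 = 0.25

    def expected_y_with_interaction(x1, x2):
        return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

    # The change in E[Y] per unit increase in X1 is now b1 + b3*X2, so it depends
    # on the value of X2, and b1 alone no longer has the simple interpretation above.
    print(expected_y_with_interaction(4.0, 3.0) - expected_y_with_interaction(3.0, 3.0))  # 2.75 = b1 + b3*3.0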

Note that it is not assumed that the X variables are independent of each other.


Guidance:

  • Ways to detect, before performing the multiple linear regression, whether your data violate any assumptions.
  • Ways to examine multiple linear regression results to detect assumption violations.
  • Possible alternatives if your data or multiple linear regression results indicate assumption violations.

To properly analyze and interpret the results of multiple linear regression, you should be familiar with the terms and concepts involved. If you are not, you are advised to consult with a statistician. Failure to understand and properly apply multiple linear regression may result in drawing erroneous conclusions from your data. Additionally, you may want to consult the following references:

  • Belsley, David A., Kuh, Edwin, and Welsch, Roy E. 1980. Regression Diagnostics. New York: John Wiley & Sons.
  • Brownlee, K. A. 1965. Statistical Theory and Methodology in Science and Engineering. New York: John Wiley & Sons.
  • Daniel, Wayne W. 1995. Biostatistics. 6th ed. New York: John Wiley & Sons.
  • Draper, N. R. and Smith, H. 1981. Applied Regression Analysis. 2nd ed. New York: John Wiley & Sons.
  • Hoaglin, D. C., Mosteller, F., and Tukey, J. W. 1985. Exploring Data Tables, Trends, and Shapes. New York: John Wiley & Sons.
  • Neter, J., Kutner, M.H., Nachtsheim, C.J., and Wasserman, W. 1996. Applied Linear Regression Models. 3rd ed. Chicago: Irwin.
  • Neter, J., Wasserman, W., and Kutner, M.H. 1990. Applied Linear Statistical Models. 3rd ed. Homewood, IL: Irwin.
  • Rosner, Bernard. 1995. Fundamentals of Biostatistics. 4th ed. Belmont, California: Duxbury Press.
  • Sokal, Robert R. and Rohlf, F. James. 1995. Biometry. 3rd ed. New York: W. H. Freeman and Co.
  • Zar, Jerrold H. 1996. Biostatistical Analysis. 3rd ed. Upper Saddle River, NJ: Prentice-Hall.
