(Simple) Linear Regression


Simple linear regression fits a straight line to X-Y data by the method of least squares.
The fit may then be used to test the null hypothesis that the slope is 0.
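
As an illustration only, the least-squares fit and the test statistic for the slope can be sketched in plain Python. The data values and the function names fit_line and slope_t_statistic are made up for this example and are not part of the guide. The t statistic is the estimated slope divided by its standard error; under the null hypothesis that the slope is 0 it would be compared with a t distribution with n - 2 degrees of freedom.

```python
import math

def fit_line(x, y):
    """Return least-squares estimates (b0, b1) for the line y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx              # slope
    b0 = y_bar - b1 * x_bar     # intercept
    return b0, b1

def slope_t_statistic(x, y):
    """Return the t statistic for testing the null hypothesis slope = 0."""
    n = len(x)
    b0, b1 = fit_line(x, y)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    # Residual sum of squares, then residual mean square on n - 2 df.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)
    se_b1 = math.sqrt(mse / sxx)   # standard error of the slope
    return b1 / se_b1

# Hypothetical X-Y data, invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(x, y)
t = slope_t_statistic(x, y)
print("intercept:", b0, "slope:", b1, "t statistic:", t)
```

A large absolute value of the t statistic, relative to the t distribution with n - 2 degrees of freedom, is evidence against the null hypothesis that the slope is 0.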


Assumptions:

  • The simple linear function Yi = b0 + b1*Xi + ei is the correct model, where Yi is the ith observed value of Y, Xi is the ith observed value of X, and ei is the error term. Equivalently, the expected value of Y for a given value of X is b0 + b1*X. The intercept b0 is the expected value of Y when X is 0. The slope b1 is the amount by which the expected value of Y changes when X increases by one unit.
  • The X variable (predictor variable) values are fixed (i.e., X is not a random variable).
  • The ei are independent, and identically normally distributed with mean 0 and the same variance.
  • The Y variable (response variable) observations are independent.
  • For a given value of X, the variable Y is normally distributed with mean b0 + b1*X and the same variance as the ei. The mean of Y depends only on the value of X, and the variance of Y does not depend on X.

The normality assumption is required for hypothesis tests, but not for estimation.
The X variable is also known as the independent variable.
The Y variable is also known as the dependent variable.
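
To make the assumptions concrete, the following sketch generates data that satisfy them and then fits a line by least squares. Everything specific here is an assumption chosen for illustration: the true intercept 2.0, the true slope 0.5, the error standard deviation 1.0, the 200 fixed X values, the random seed, and the helper name fit_line.

```python
import random

def fit_line(x, y):
    """Return least-squares estimates (b0, b1) for the line y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical true parameters, chosen only for this example.
b0_true, b1_true, sigma = 2.0, 0.5, 1.0

# X values are fixed in advance (X is not a random variable).
x = [float(i) for i in range(200)]

# Errors are independent and normal, with mean 0 and the same variance.
e = [random.gauss(0.0, sigma) for _ in x]

# Each Y is the linear function of X plus its own error term.
y = [b0_true + b1_true * xi + ei for xi, ei in zip(x, e)]

b0_hat, b1_hat = fit_line(x, y)
print("estimated intercept:", b0_hat, "estimated slope:", b1_hat)
```

When the assumptions hold, as they do by construction here, the estimates should fall close to the true values, with the slope estimated more precisely than the intercept for this layout of X values.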


Guidance:

  • Ways to detect, before performing the linear regression, whether your data violate any assumptions.
  • Ways to examine linear regression results to detect assumption violations.
  • Possible alternatives if your data or linear regression results indicate assumption violations.
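
One common way to examine regression results, offered here as a sketch and not necessarily the procedure this guide describes, is to inspect the residuals (observed Y minus fitted Y). In the made-up example below, Y is exactly X squared, so the straight-line model is wrong, and the residuals show a systematic pattern: positive at both ends of the X range and negative in the middle. The data and the helper name fit_line are assumptions for illustration.

```python
def fit_line(x, y):
    """Return least-squares estimates (b0, b1) for the line y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data that violate the linearity assumption: Y = X squared.
x = [float(i) for i in range(11)]
y = [xi ** 2 for xi in x]

b0, b1 = fit_line(x, y)

# Residual = observed Y minus the value predicted by the fitted line.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

for xi, ri in zip(x, residuals):
    print(xi, round(ri, 6))
```

Residuals that curve in this way suggest that a straight line is not the correct model. If the assumptions held, the residuals would scatter around 0 with no visible pattern across the X values.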

To properly analyze and interpret the results of simple linear regression, you should be familiar with the following terms and concepts:

Failure to understand and properly apply simple linear regression may result in drawing erroneous conclusions from your data. If you are not familiar with these terms and concepts, you may wish to consult with a statistician. You may also want to consult the following references:

  • Brownlee, K. A. 1965. Statistical Theory and Methodology in Science and Engineering. New York: John Wiley & Sons.
  • Daniel, Wayne W. 1995. Biostatistics. 6th ed. New York: John Wiley & Sons.
  • Draper, N. R. and Smith, H. 1981. Applied Regression Analysis. 2nd ed. New York: John Wiley & Sons.
  • Hoaglin, D. C., Mosteller, F., and Tukey, J. W. 1985. Exploring Data Tables, Trends, and Shapes. New York: John Wiley & Sons.
  • Miller, Rupert G. Jr. 1996. Beyond ANOVA: Basics of Applied Statistics. 2nd ed. London: Chapman & Hall.
  • Neter, J., Kutner, M. H., Nachtsheim, C. J., and Wasserman, W. 1996. Applied Linear Regression Models. 3rd ed. Chicago: Irwin.
  • Neter, J., Wasserman, W., and Kutner, M. H. 1990. Applied Linear Statistical Models. 3rd ed. Homewood, IL: Irwin.
  • Rosner, Bernard. 1995. Fundamentals of Biostatistics. 4th ed. Belmont, CA: Duxbury Press.
  • Sokal, Robert R. and Rohlf, F. James. 1995. Biometry. 3rd ed. New York: W. H. Freeman and Co.
  • Zar, Jerrold H. 1996. Biostatistical Analysis. 3rd ed. Upper Saddle River, NJ: Prentice-Hall.