Logistic regression is used to fit a model to binary response (Y) data, such as whether a subject dies (event) or lives (non-event). These events are often described as success vs failure. For each possible set of values for the independent (X) variables, there is a probability p that a success occurs.
The linear logistic model fitted by maximum likelihood is:

Y = b_{0} + b_{1}*X_{1} + b_{2}*X_{2} + ... + b_{k}*X_{k}

where Y is the logit transformation of p:

Y = log(p/(1-p));

i.e., Y is the log odds corresponding to p.

If n_{i} observations are made at the *i*th set of values of the X variables, then the count s_{i} of successes can be used to calculate Y by using the observed proportion of successes (s_{i}/n_{i}) in place of p.
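As a minimal sketch (with made-up counts), the empirical logit for grouped data can be computed directly by substituting the observed proportion s/n for p:

```python
import math

# Hypothetical grouped data: at the ith setting of the X variables,
# n[i] subjects were observed and s[i] of them were successes.
n = [20, 25, 30, 20]
s = [2, 8, 18, 17]

# Empirical logit: use the observed proportion s/n in place of p.
logits = [math.log((si / ni) / (1 - si / ni)) for si, ni in zip(s, n)]
print([round(y, 3) for y in logits])
```

Note that the logit is undefined when s = 0 or s = n; a common remedy (an aside, not part of this document) is the adjusted empirical logit log((s + 0.5)/(n - s + 0.5)).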

### Assumptions:

- The linear function Y_{i} = b_{0} + b_{1}*X_{1i} + b_{2}*X_{2i} + ... + b_{k}*X_{ki} + e_{i} is the correct model, where Y_{i} is the *i*th observed value of Y, X_{ji} is the *i*th observed value of the *j*th X variable, and e_{i} is the error term. Equivalently, the expected value of Y for a given set of values for the X variables is b_{0} + b_{1}*X_{1} + b_{2}*X_{2} + ... + b_{k}*X_{k}. The **intercept** is b_{0}, the expected value of Y when the value for each X variable is 0.
- The logit transform of p to Y is the correct transformation to achieve the linear function for Y. (With the previous assumption, this amounts to assuming that the linear logistic model is the correct model.)
- The X_{j} variable (predictor variable) values are fixed (i.e., none of the X_{j} is a random variable).
- The counts of successes s_{i} are independent.
- The response of each subject (success or failure) follows a Bernoulli distribution, independent of the other responses. This means that the s_{i} are distributed as binomial random variables with mean n_{i}*p_{i}. For a given set of X variable values, the variable s and thus the variable Y has constant mean.
- The e_{i} all have mean 0. (However, residuals analysis is done comparing observed and fitted values of s, not Y, so the e_{i} are of little interest.)

Many of the hypothesis tests rely on large sample sizes, for which the maximum likelihood estimators will be approximately normally distributed, and will have at worst only small biases.
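As an illustration of the maximum-likelihood fit, here is a minimal sketch with a single X variable and made-up binary responses, using Newton-Raphson on the log-likelihood (in practice a statistical package would be used):

```python
import math

# Hypothetical data: one predictor, individual binary responses.
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
y = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = 0.0, 0.0
for _ in range(25):
    # Gradient of the log-likelihood and observed information (2x2).
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))  # fitted probability
        g0 += yi - p
        g1 += (yi - p) * xi
        w = p * (1.0 - p)                            # binomial variance weight
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    # Newton step: solve H * delta = g for the 2x2 system.
    det = h00 * h11 - h01 * h01
    b0 += (h11 * g0 - h01 * g1) / det
    b1 += (h00 * g1 - h01 * g0) / det

print(b0, b1)
```

At convergence the score equations are satisfied: the fitted probabilities reproduce the observed total count of successes.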

The X variables are also known as the **independent** variables.

The fitted Y variable is also known as the **linear predictor**.

If a discrete qualitative variable is included as a predictor, it is encoded by dummy X variables. A discrete variable with **n** different values will be encoded by **n-1** dummy X variables.
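A sketch of this dummy coding (with a hypothetical three-level variable, so 3 - 1 = 2 dummy X variables; the omitted level serves as the reference category):

```python
# Hypothetical qualitative predictor with 3 levels.
levels = ["A", "B", "C"]
treatment = ["A", "B", "C", "B", "A"]  # made-up observations

# Each level except the first gets its own 0/1 dummy X variable;
# level "A" is the reference, encoded as all zeros.
dummies = [[1 if t == lvl else 0 for lvl in levels[1:]] for t in treatment]
print(dummies)  # [[0, 0], [1, 0], [0, 1], [1, 0], [0, 0]]
```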

The **coefficients** are **b_{j}**, the amount by which the expected value of Y (the log odds) increases when X_{j} increases by a unit amount, *when all the other X variables are held constant*. This means that the estimated odds p/(1-p) are multiplied by exp(b_{j}) when X_{j} increases by a unit amount. This interpretation of the coefficients does not hold if some of the X variables are functions of the others, such as an interaction term X_{j}*X_{k}.
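A small numerical sketch of this interpretation (the coefficient value and starting probability are made up):

```python
import math

b_j = 0.4   # hypothetical fitted coefficient for X_j

# A unit increase in X_j adds b_j to the log odds Y,
# which multiplies the odds p/(1-p) by exp(b_j).
odds_ratio = math.exp(b_j)

p = 0.30                        # probability before the unit increase
odds = p / (1 - p)
new_odds = odds * odds_ratio    # odds after the unit increase
new_p = new_odds / (1 + new_odds)
print(odds_ratio, new_p)
```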

Note that it is *not* assumed that the X variables are independent of each other.

The fitted value of p can be calculated from the linear predictor Y by using the formula

p = e^{Y}/(1 + e^{Y}).
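For example, with hypothetical fitted coefficients, the inverse transformation from the linear predictor back to a probability looks like:

```python
import math

b = [-2.0, 0.8]          # hypothetical b_0 (intercept) and b_1
x = 3.0

y = b[0] + b[1] * x                  # linear predictor Y (log odds)
p = math.exp(y) / (1 + math.exp(y))  # fitted probability p
print(p)
```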

Notation: Some references use the term Y to refer to the counts s, and use other notation for the linear predictor.

### Guidance:

- Ways to detect, before performing the logistic regression, whether your data violate any assumptions.
- Ways to examine logistic regression results to detect assumption violations.
- Possible alternatives if your data or logistic regression results indicate assumption violations.

To properly analyze and interpret results of *logistic regression*, you should be familiar with the following terms and concepts:

- binary variable
- linear logistic model
- logit transformation
- log odds
- binomial distribution
- dummy variables
- linear predictor
- method of maximum likelihood
- residuals
- sensitivity
- specificity
- false positive
- false negative
- receiver operator characteristic (ROC) curve
- transformations

If you are not familiar with these terms and concepts, you are advised to consult with a statistician. Failure to understand and properly apply *logistic regression* may result in drawing erroneous conclusions from your data. Additionally, you may want to consult the following references:

- Agresti, A. 1990. *Categorical Data Analysis.* New York: John Wiley & Sons.
- Agresti, A. 1996. *An Introduction to Categorical Data Analysis.* New York: John Wiley & Sons.
- Aldrich, J.H. and Nelson, F.D. 1984. *Linear Probability, Logit, and Probit Models.* Newbury Park, California: Sage Publications.
- Collett, D. 1991. *Modelling Binary Data.* London: Chapman and Hall.
- Cox, D.R. and Snell, E.J. 1989. *The Analysis of Binary Data.* 2nd ed. New York: John Wiley & Sons.
- Demaris, A. 1992. *Logit Modeling.* Newbury Park, California: Sage Publications.
- Everitt, B.S. 1992. *The Analysis of Contingency Tables.* 2nd ed. London: Chapman & Hall.
- Hosmer, D.W. and Lemeshow, S. 1989. *Applied Logistic Regression.* New York: John Wiley & Sons.
- McCullagh, P. and Nelder, J.A. 1989. *Generalized Linear Models.* 2nd ed. London: Chapman and Hall.
- Menard, S. 1995. *Applied Logistic Regression Analysis.* Newbury Park, California: Sage Publications.
- Neter, J., Kutner, M.H., Nachtsheim, C.J., and Wasserman, W. 1996. *Applied Linear Regression Models.* 3rd ed. Chicago: Irwin.
