Logistic regression is used to fit a model to binary response (Y) data,
such as whether a subject dies (event) or lives (non-event).
These events are often described as success vs failure.
For each possible set of values for the independent (X) variables,
there is a probability p that a success occurs.
The linear logistic model fitted by maximum likelihood is:
Y = b0 + b1*X1 + b2*X2 + ... + bk*Xk
where Y is the logit transformation of p:
Y = log(p/(1-p));
i.e., Y is the log odds corresponding to p.
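As a small illustration of the logit transformation just defined, the following Python sketch computes the log odds for a given probability. The function name logit is chosen here for illustration only and is not part of the original text.

```python
import math

def logit(p):
    """Return the log odds log(p / (1 - p)) for a probability 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))

# A probability of 0.5 corresponds to even odds, so the log odds are 0.
print(logit(0.5))
# A probability of 0.8 corresponds to odds of 4 to 1, so the log odds are log(4).
print(logit(0.8))
```

Probabilities above 0.5 give positive log odds and probabilities below 0.5 give negative log odds.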
If ni observations are made at the ith set of
values of the X variables, then the count si of
successes can be used to calculate Y by using the proportion
of successes (si/ni) in place of p.
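The calculation of Y from grouped counts can be sketched as follows. The data values and the name empirical_logit are hypothetical, made up for this example. Note that the observed logit is undefined when a group has no successes or no failures.

```python
import math

# Hypothetical grouped data: (n_i, s_i) = (observations, successes) per X setting.
groups = [(20, 4), (20, 10), (20, 15)]

def empirical_logit(s, n):
    """Log odds computed from the observed proportion s/n.

    Undefined when s == 0 or s == n, since the proportion is then 0 or 1.
    """
    if not 0 < s < n:
        raise ValueError("need 0 < s < n for a finite logit")
    prop = s / n
    return math.log(prop / (1.0 - prop))

observed_logits = [empirical_logit(s, n) for n, s in groups]
print(observed_logits)
```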
Assumptions:
The linear function Yi = b0 + b1*X1i + b2*X2i + ... + bk*Xki + ei
is the correct model,
where Yi is the ith observed value of Y,
Xji is the ith observed value of the jth X variable, and ei is the
error term. Equivalently, the expected value
of Y for a given set of values for the X variables is
b0 + b1*X1 + b2*X2 + ... + bk*Xk.
The intercept is b0, the expected value of Y when
the value for each X variable is 0.
The logit transform of p to Y is the correct transformation
to achieve the linear function for Y.
(With the previous assumption, this amounts to assuming that the
linear logistic model is the correct model.)
The Xj variable (predictor variable) values are fixed
(i.e., none of the Xj is a random variable).
The response of each subject (success or failure)
follows a Bernoulli
distribution, independent of the other responses.
This means that the si are distributed as binomial random variables
with mean ni*pi, where pi is the probability of success at the ith
set of values of the X variables.
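The statement that a binomial count has mean ni*pi can be checked numerically. The sketch below uses made-up values for ni and pi and a helper named binomial_pmf that is introduced here purely for illustration.

```python
import math

def binomial_pmf(s, n, p):
    """Probability of exactly s successes in n independent Bernoulli(p) trials."""
    return math.comb(n, s) * p**s * (1.0 - p) ** (n - s)

# Hypothetical group: 10 observations, each succeeding with probability 0.3.
n_i, p_i = 10, 0.3

# Expected count, summing s * P(S = s) over all possible counts.
mean = sum(s * binomial_pmf(s, n_i, p_i) for s in range(n_i + 1))
print(mean, n_i * p_i)  # the two values agree up to rounding
```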
For a given set of X variable values, the variable s
and thus the variable Y have constant mean.
The ei all have mean 0.
(However, residuals analysis is done comparing observed and
fitted values of s, not Y, so the
ei are of little interest.)
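One common way to compare observed and fitted values of s is the Pearson residual, which scales the difference by the binomial standard deviation. The original text does not say which residual to use, so the choice here is an assumption, and the function name and numbers are illustrative only.

```python
import math

def pearson_residual(s, n, p_hat):
    """(observed - fitted) count divided by the binomial standard deviation."""
    fitted = n * p_hat
    return (s - fitted) / math.sqrt(n * p_hat * (1.0 - p_hat))

# Hypothetical group: 12 successes observed in 20 trials, fitted p of 0.5.
# The fitted count is 10, so the residual is 2 / sqrt(5).
print(pearson_residual(12, 20, 0.5))
```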
Many of the hypothesis tests rely on large sample sizes, for which
the maximum likelihood estimators will be approximately
normally distributed,
and will have at worst only small biases.
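To make the maximum likelihood fit concrete, here is a minimal sketch for a single X variable using Newton-Raphson iteration on grouped data. It is one standard way to maximize the binomial likelihood, not necessarily the method used by any particular software package. The data and the name fit_logistic are hypothetical, and the sketch omits convergence checks and handling of perfectly separated data.

```python
import math

def fit_logistic(xs, ns, ss, iterations=25):
    """Newton-Raphson maximum likelihood fit of logit(p) = b0 + b1*x.

    xs, ns, ss hold the X value, number of observations, and number of
    successes for each group.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Score vector (gradient of the log-likelihood) and information matrix.
        u0 = u1 = i00 = i01 = i11 = 0.0
        for x, n, s in zip(xs, ns, ss):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = n * p * (1.0 - p)  # binomial variance of s at the current fit
            u0 += s - n * p
            u1 += x * (s - n * p)
            i00 += w
            i01 += w * x
            i11 += w * x * x
        det = i00 * i11 - i01 * i01
        # Newton step: solve the 2x2 system (information) * step = score.
        b0 += (i11 * u0 - i01 * u1) / det
        b1 += (i00 * u1 - i01 * u0) / det
    return b0, b1

# Hypothetical grouped data: 20 observations at each of four X values.
xs = [0.0, 1.0, 2.0, 3.0]
ns = [20, 20, 20, 20]
ss = [2, 6, 12, 17]
b0_hat, b1_hat = fit_logistic(xs, ns, ss)
print(b0_hat, b1_hat)
```

At the maximum likelihood estimates, the fitted counts sum to the observed total number of successes, which gives a quick sanity check on the fit.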
The X variables are also known as the independent variables.
The fitted Y variable is also known as the linear predictor.
If a discrete qualitative variable is included as a predictor,
it is encoded by dummy X variables.
A discrete variable with n different values will be encoded by
n-1 dummy X variables.
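The dummy encoding described above can be sketched as follows. The function name dummy_encode and the choice of the alphabetically first level as the reference category are assumptions made for this example. Any level may serve as the reference.

```python
def dummy_encode(values, reference=None):
    """Encode a qualitative variable with n levels as n-1 dummy (0/1) columns.

    The reference level is represented by all dummies being 0.
    """
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]
    kept = [lv for lv in levels if lv != reference]
    rows = [[1 if v == lv else 0 for lv in kept] for v in values]
    return kept, rows

# Hypothetical predictor with three levels, so two dummy variables result.
colors = ["red", "green", "blue", "green"]
columns, encoded = dummy_encode(colors)
print(columns)  # ['green', 'red']; 'blue' is the reference level
print(encoded)  # [[0, 1], [1, 0], [0, 0], [1, 0]]
```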
The coefficients are bj, the amount by which the expected
value of Y (the log odds) increases when Xj increases by a unit amount,
when all the other X variables are held constant.
This means that the estimated odds p/(1-p) are multiplied by exp(bj)
when Xj increases by a unit amount.
This interpretation of the coefficients does not hold
if some of the X variables are functions of the others,
such as an interaction term Xj*Xk.
Note that it is not assumed that the X variables are
independent of each other.
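The interpretation of exp(bj) as a multiplier of the odds can be verified numerically. The coefficients below are invented for illustration, and the model contains no interaction terms, so the interpretation applies.

```python
import math

# Hypothetical coefficients for a model with two predictors.
b0, b1, b2 = -1.0, 0.7, -0.3

def odds(x1, x2):
    """Odds p/(1-p) implied by the linear predictor Y = b0 + b1*x1 + b2*x2."""
    return math.exp(b0 + b1 * x1 + b2 * x2)

# Raise X1 by one unit while holding X2 fixed.
ratio = odds(3.0, 2.0) / odds(2.0, 2.0)
print(ratio, math.exp(b1))  # the two values agree up to rounding
```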
The fitted value of p can be calculated from the linear predictor
Y by using the formula
p = exp(Y)/(1 + exp(Y)).
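A minimal sketch of converting the linear predictor back to a probability follows. The name inverse_logit is illustrative. The two branches compute the same quantity, p = exp(Y)/(1 + exp(Y)), arranged to avoid overflow when Y is large in magnitude.

```python
import math

def inverse_logit(y):
    """Map a linear predictor Y to a probability p = exp(Y) / (1 + exp(Y))."""
    if y >= 0:
        # Algebraically identical form that avoids computing exp of a large Y.
        return 1.0 / (1.0 + math.exp(-y))
    e = math.exp(y)
    return e / (1.0 + e)

print(inverse_logit(0.0))          # 0.5, since log odds of 0 mean even odds
print(inverse_logit(math.log(4)))  # about 0.8, since odds of 4 to 1 give p = 4/5
```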
Notation: Some references use the term Y to refer to the counts s,
and use other notation for the linear predictor.
Guidance:
Ways to detect before performing the
logistic regression whether your data violate any
assumptions.
Ways to examine logistic regression results to detect
assumption violations.
Possible alternatives if your data or
logistic regression results indicate assumption violations.
To properly analyze and interpret
results of logistic regression, you should be familiar with the following terms and
concepts:
If you are not familiar with these terms and concepts, you are advised to
consult with a statistician. Failure to understand and properly apply
logistic regression may result in drawing erroneous conclusions from your data.
Additionally, you may want to consult the following references:
Agresti, A. 1990. Categorical Data Analysis.
New York: John Wiley & Sons.
Agresti, A. 1996.
An Introduction to Categorical Data Analysis. New York: John Wiley & Sons.
Aldrich, J.H. and Nelson, F.D. 1984.
Linear Probability, Logit, and Probit Models.
Newbury Park, California: Sage Publications.
Collett, D. 1991. Modelling Binary Data.
London: Chapman and Hall.
Cox, D.R. and Snell, E.J. 1989. The Analysis
of Binary Data. 2nd ed. New York: John Wiley & Sons.
Demaris, A. 1992.
Logit Modeling.
Newbury Park, California: Sage Publications.
Everitt, B. S. 1992. The Analysis of Contingency Tables. 2nd ed.
London: Chapman & Hall.
Hosmer, D.W. and Lemeshow, S. 1989. Applied
Logistic Regression. New York: John Wiley & Sons.
McCullagh, P. and Nelder, J.A. 1989.
Generalized Linear Models. 2nd ed. London: Chapman and Hall.
Menard, S. 1995.
Applied Logistic Regression Analysis.
Newbury Park, California: Sage Publications.
Neter, J., Kutner, M.H., Nachtsheim, C.J., and Wasserman, W. 1996.
Applied Linear Regression Models. 3rd ed. Chicago: Irwin.