Logistic Regression

Date : 29/08/2012

Subject : Maths

Logistic regression By Sourav Das

Logistic regression is part of a category of statistical models called generalized linear models. This broad class of models includes ordinary regression and ANOVA, as well as models such as ANCOVA and loglinear regression. An excellent treatment of generalized linear models is presented in Agresti (1996).

Logistic regression allows one to predict a discrete outcome, such as group membership, from a set of variables that may be continuous, discrete, dichotomous, or a mix of any of these. Generally, the dependent or response variable is dichotomous, such as presence/absence or success/failure. Discriminant analysis can also be used to predict group membership when there are only two groups, but it can only be used with continuous independent variables. Thus, in instances where the independent variables are categorical, or a mix of continuous and categorical, logistic regression is preferred.

The Model:

The dependent variable in logistic regression is usually dichotomous, that is, the dependent variable can take the value 1 with a probability of success $\theta$, or the value 0 with a probability of failure $1-\theta$. This type of variable is called a Bernoulli (or binary) variable. Although not as common and not discussed in this treatment, applications of logistic regression have also been extended to cases where the dependent variable has more than two categories, known as multinomial or polytomous [Tabachnick and Fidell (1996) use the term polychotomous].

As mentioned previously, the independent or predictor variables in logistic regression can take any form. That is, logistic regression makes no assumption about the distribution of the independent variables: they do not have to be normally distributed, linearly related, or of equal variance within each group. The relationship between the predictor and response variables is not a linear function in logistic regression. Instead, the logistic regression function is used, which is the logit transformation of $\theta$:

$$\theta = \frac{e^{\alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_i x_i}}{1 + e^{\alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_i x_i}}$$

where $\alpha$ is the constant of the equation and $\beta$ is the coefficient of the predictor variables. An alternative form of the same equation is:

$$\operatorname{logit}(\theta) = \ln\!\left(\frac{\theta}{1-\theta}\right) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_i x_i$$

The goal of logistic regression is to correctly predict the category of outcome for individual cases using the most parsimonious model. To accomplish this goal, a model is created that includes all predictor variables that are useful in predicting the response variable. Several different options are available during model creation. Variables can be entered into the model in the order specified by the researcher, or logistic regression can test the fit of the model after each coefficient is added or deleted, a procedure called stepwise regression.
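To make the model equation concrete, here is a minimal Python sketch of the logistic function and its inverse, the logit. It writes theta for the probability of success, alpha for the constant and betas for the coefficients; these symbol names, the function names and the numeric values are assumptions chosen for illustration and do not come from any fitted model.

```python
import math

def logistic(z):
    """Map a linear predictor z to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(theta):
    """Log-odds of a probability theta; the inverse of logistic()."""
    return math.log(theta / (1.0 - theta))

def predict_probability(alpha, betas, xs):
    """Probability of success for one case with predictor values xs."""
    z = alpha + sum(b * x for b, x in zip(betas, xs))
    return logistic(z)

# Hypothetical constant and coefficients, for illustration only.
alpha = -1.5
betas = [0.8, 0.3]
case = [2.0, 1.0]  # predictor values for a single case

theta = predict_probability(alpha, betas, case)

# Assign the case to group 1 when success is at least as likely as failure.
predicted_class = 1 if theta >= 0.5 else 0
```

Applying logit() to the predicted probability recovers the linear predictor, which is why the model is linear on the log-odds scale even though it is not linear in the probability itself. The 0.5 cut-off used to assign a group is a common convention, not something the model requires.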

Stepwise regression is used in the exploratory phase of research, but it is not recommended for theory testing (Menard 1995). Theory testing is the testing of a priori theories or hypotheses about the relationships between variables. Exploratory testing makes no a priori assumptions regarding the relationships between the variables; its goal is to discover relationships.

Backward stepwise regression appears to be the preferred method for exploratory analyses. The analysis begins with a full or saturated model, and variables are eliminated from the model in an iterative process. The fit of the model is tested after the elimination of each variable to ensure that the model still adequately fits the data. When no more variables can be eliminated from the model, the analysis is complete.
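The backward procedure described above can be sketched in a few lines of Python. This is only one possible reading of it: the text does not say which fit test is used, so the sketch assumes a likelihood-ratio comparison, dropping a variable when the deviance increases by no more than 3.841 (the 5% critical value of chi-square with one degree of freedom). The fitting routine is a plain gradient ascent written for brevity, and the data set, function names and variable names are all made up for illustration.

```python
import math

def max_log_likelihood(columns, ys, steps=2000, rate=1.0):
    """Fit a logistic model (constant plus the given predictor columns)
    by gradient ascent and return its maximised log-likelihood."""
    n = len(ys)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    weights = [0.0] * (len(columns) + 1)
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for row, y in zip(rows, ys):
            z = sum(w * x for w, x in zip(weights, row))
            p = 1.0 / (1.0 + math.exp(-z))
            for j, x in enumerate(row):
                grad[j] += (y - p) * x
        weights = [w + rate * g / n for w, g in zip(weights, grad)]
    ll = 0.0
    for row, y in zip(rows, ys):
        z = sum(w * x for w, x in zip(weights, row))
        p = 1.0 / (1.0 + math.exp(-z))
        ll += math.log(p) if y == 1 else math.log(1.0 - p)
    return ll

def backward_eliminate(data, ys, threshold=3.841):
    """Start from the full model and drop the weakest predictor while the
    deviance increase stays at or below the threshold (an assumed rule)."""
    kept = list(data)
    current_ll = max_log_likelihood([data[name] for name in kept], ys)
    while kept:
        # Deviance increase caused by removing each remaining predictor.
        increases = {}
        for name in kept:
            reduced = [data[other] for other in kept if other != name]
            increases[name] = 2.0 * (current_ll - max_log_likelihood(reduced, ys))
        weakest = min(increases, key=increases.get)
        if increases[weakest] > threshold:
            break  # removing any remaining predictor would hurt the fit
        kept.remove(weakest)
        current_ll -= increases[weakest] / 2.0
    return kept

# Made-up data: the outcome depends on x1 (2 of 10 successes when x1 = 0,
# 8 of 10 when x1 = 1) and is unrelated to x2.
x1, x2, y = [], [], []
for a in (0, 1):
    for b in (0, 1):
        successes = 8 if a == 1 else 2
        for i in range(10):
            x1.append(float(a))
            x2.append(float(b))
            y.append(1 if i < successes else 0)

selected = backward_eliminate({"x1": x1, "x2": x2}, y)
```

On this constructed data the uninformative predictor x2 is eliminated first and x1 is retained, because removing x1 would increase the deviance well beyond the threshold. In practice a statistics package with a proper maximum-likelihood routine would replace the hand-written fitting loop.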

There are two main uses of logistic regression. The first is the prediction of group membership. Since logistic regression calculates the probability of success over the probability of failure, the results of the analysis are in the form of an odds ratio. For example, logistic regression is often used in epidemiological studies where the result of the analysis is the probability of developing cancer after controlling for other associated risks. The second is that logistic regression provides knowledge of the relationships and strengths among the variables (e.g., smoking 10 packs a day puts you at a higher risk for developing cancer than working in an asbestos mine).
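As a small worked illustration of the odds ratio mentioned above, the following Python sketch converts two probabilities into odds and compares them. The probabilities and names are hypothetical and are not taken from any study; the only facts relied on are the definition of odds as the probability of success over the probability of failure, and the standard result that exponentiating a logistic regression coefficient gives the odds ratio for a one-unit increase in that predictor.

```python
import math

def odds(p):
    """Odds of an event: probability of success over probability of failure."""
    return p / (1.0 - p)

def odds_ratio(p_group_a, p_group_b):
    """How many times larger the odds are in group A than in group B."""
    return odds(p_group_a) / odds(p_group_b)

# Hypothetical probabilities of the outcome in two groups.
p_exposed = 0.20
p_unexposed = 0.05

ratio = odds_ratio(p_exposed, p_unexposed)  # 0.25 / (0.05 / 0.95) = 4.75

# For a binary predictor coded 0/1, the model coefficient is the log of
# this odds ratio, so exponentiating the coefficient recovers the ratio.
coefficient = math.log(ratio)
recovered_ratio = math.exp(coefficient)
```

Note that the odds ratio (4.75 here) differs from the ratio of the two probabilities (4.0), which is why results reported as odds ratios should not be read as ratios of risk.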
