This assignment will be made available in both pdf and Microsoft docx format. Answers should be typed into the docx file, saved, and converted into pdf format for submission into Blackboard. Color your answers in green so that they can be easily distinguished from the questions themselves.
All of these computations are covered in examples in the assigned reading, and hopefully in the notes that you have been taking. If you need to refresh your memory, then begin by looking at Chapter 12 in Regression Analysis By Example and Chapter 1 in Applied Regression Analysis.
Throughout this assignment keep all decimals to four places, i.e. X.xxxx.
Any computation involving “the log function,” denoted log(x), refers to the natural logarithm (shown as ln() on a calculator). Use a logarithm with a different base only when that base is given explicitly.
When stating the null and alternate hypotheses in any statistical test in PREDICT 410, we should always state these hypotheses in terms of the model parameters, i.e. the model coefficients denoted by the betas.
Foundations of Logistic Regression:
Question: What values can the response variable Y take in logistic regression, and hence what statistical distribution does Y follow? Y takes only the values 1 (event) and 0 (non-event), so it follows a Bernoulli distribution, i.e. a binomial distribution with a single trial (n = 1).
Question: How are the parameters estimated in logistic regression? Is this different from how the parameters are estimated in Ordinary Least Squares (OLS) regression? Logistic regression parameters are estimated by the method of maximum likelihood. OLS estimates coincide with maximum likelihood estimates when the errors are normally distributed, but they are computed differently. In OLS we differentiate the sum of squared deviations with respect to β and set the derivatives to zero; because the deviations are linear in β, this yields normal equations with a closed-form solution. In logistic regression the model is nonlinear in β0 and β1, so the likelihood equations have no closed-form solution and must be solved iteratively by software (for example, by Newton-Raphson or iteratively reweighted least squares).
Coefficient estimates in logistic regression can also be found using the following methods:
• Non-iterative weighted least squares
• Discriminant function analysis
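The contrast described above, an iterative maximum-likelihood fit versus the closed-form OLS solution, can be sketched in plain Python for a single predictor. This is a hypothetical illustration, not the routine any particular statistics package uses: `fit_logistic_newton` applies Newton-Raphson to the logistic log-likelihood, `fit_ols` solves the OLS normal equations directly, and the data values are made up.

```python
import math

def fit_logistic_newton(x, y, tol=1e-10, max_iter=100):
    """ML estimates (b0, b1) for log(p / (1 - p)) = b0 + b1*x via Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        # Fitted probabilities under the current estimates
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Score vector: gradient of the log-likelihood
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix [[h00, h01], [h01, h11]] with weights p(1 - p)
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) @ step = score
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + s0, b1 + s1
        if max(abs(s0), abs(s1)) < tol:
            break
    return b0, b1

def fit_ols(x, y):
    """Closed-form OLS estimates (b0, b1) from the normal equations."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

x = [0, 1, 2, 3, 4, 5, 6, 7]   # made-up predictor values
y = [0, 0, 1, 0, 1, 0, 1, 1]   # made-up 0/1 responses
b0_ml, b1_ml = fit_logistic_newton(x, y)
b0_ols, b1_ols = fit_ols(x, y)
print(f"ML:  b0 = {b0_ml:.4f}, b1 = {b1_ml:.4f}")
print(f"OLS: b0 = {b0_ols:.4f}, b1 = {b1_ols:.4f}")
```

At convergence the score equations Σ(yi − pi) = 0 and Σ(yi − pi)xi = 0 hold, which is a quick check that the iteration reached the maximum of the likelihood.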
Question: How do we define a “residual” in logistic regression, and how is it computed?
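In logistic regression, the deviance fills the same role as the residual sum of squares in linear regression. An individual deviance residual is the signed square root of one observation's contribution to the deviance, so the squared deviance residuals sum to D.
The deviance is a likelihood-ratio statistic that compares the fitted model with the saturated model (the model that fits every observation exactly):
D = -2 ln[(likelihood of the fitted model) / (likelihood of the saturated model)]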
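The deviance and the deviance residuals can be computed directly from 0/1 responses and fitted probabilities. This is a minimal sketch assuming ungrouped binary data: in that case the saturated model reproduces each observation exactly, so its likelihood is 1 and D is simply −2 times the fitted log-likelihood. The function names and the example values are hypothetical.

```python
import math

def binary_deviance(y, p):
    """D = -2 ln(L_fitted / L_saturated) for ungrouped 0/1 data (L_saturated = 1)."""
    loglik = sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
                 for yi, pi in zip(y, p))
    return -2.0 * loglik

def deviance_residuals(y, p):
    """Signed square roots of each observation's contribution to D."""
    res = []
    for yi, pi in zip(y, p):
        contrib = -2.0 * (math.log(pi) if yi == 1 else math.log(1.0 - pi))
        # Sign follows the raw residual y - p
        res.append(math.copysign(math.sqrt(contrib), yi - pi))
    return res

y_obs = [1, 0, 1]          # made-up responses
p_hat = [0.8, 0.3, 0.6]    # made-up fitted probabilities
D = binary_deviance(y_obs, p_hat)
r = deviance_residuals(y_obs, p_hat)
print(f"D = {D:.4f}, sum of squared residuals = {sum(ri * ri for ri in r):.4f}")
```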
Model: Let’s consider the logistic regression model, which we will refer to as Model 1, given by
log(pi / [1-pi]) = 0.25 + 0.32*X1 + 0.70*X2 + 0.50*X3 (M1),
where X3 is an indicator variable with X3 = 0 if the observation is from Group A and X3 = 1 if it is from Group B. The likelihood value for this fitted model on 100 observations is 0.0850.
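Combining the deviance formula with the likelihood quoted for M1 gives its deviance. This is a minimal sketch assuming the 100 responses are ungrouped 0/1 outcomes, so that the saturated model's likelihood is 1; with grouped data that value would differ.

```python
import math

# Assumption: ungrouped 0/1 responses, so the saturated likelihood is 1.
likelihood_m1 = 0.0850
likelihood_saturated = 1.0
deviance_m1 = -2.0 * math.log(likelihood_m1 / likelihood_saturated)
print(f"{deviance_m1:.4f}")  # 4.9302
```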
Question: For X1 = 2 and X2 = 1 compute the log-odds for each group, i.e. X3 = 0 and X3 = 1.
Group A (X3 = 0):
Group B (X3 = 1):
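The log-odds is the linear predictor of M1 evaluated at the given values. For X1 = 2 and X2 = 1 it reduces to 0.25 + 0.32(2) + 0.70(1) = 1.5900 for Group A, and adding 0.50 for X3 gives 2.0900 for Group B. A small sketch using a hypothetical helper `log_odds_m1`:

```python
# Coefficients of Model 1 (M1) as stated in the model equation.
B0, B1, B2, B3 = 0.25, 0.32, 0.70, 0.50

def log_odds_m1(x1, x2, x3):
    """Log-odds log(pi / (1 - pi)) under M1 (hypothetical helper)."""
    return B0 + B1 * x1 + B2 * x2 + B3 * x3

for group, x3 in (("A", 0), ("B", 1)):
    print(f"Group {group}: log-odds = {log_odds_m1(2, 1, x3):.4f}")
```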
Question: For X1 = 2 and X2 = 1 compute the odds for each group, i.e. X3 = 0 and X3 = 1.
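Because the log-odds is log(odds), the odds are recovered by exponentiating the log-odds. A minimal sketch with a hypothetical helper `odds_m1`; values are printed to four decimal places per the assignment's convention.

```python
import math

def odds_m1(x1, x2, x3):
    """Odds pi / (1 - pi) under M1: exponentiate the log-odds (hypothetical helper)."""
    return math.exp(0.25 + 0.32 * x1 + 0.70 * x2 + 0.50 * x3)

odds_a = odds_m1(2, 1, 0)   # exp(1.59)
odds_b = odds_m1(2, 1, 1)   # exp(2.09)
print(f"Group A odds: {odds_a:.4f}  Group B odds: {odds_b:.4f}")
```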
Question: For X1 = 2 and X2 = 1 compute the probability of an event for each group, i.e. X3 = 0 and X3 = 1.
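The probability follows from the odds via p = odds / (1 + odds), which is the same as the inverse logit 1 / (1 + exp(−log-odds)). A minimal sketch with a hypothetical helper `prob_m1`:

```python
import math

def prob_m1(x1, x2, x3):
    """Event probability under M1 (hypothetical helper)."""
    log_odds = 0.25 + 0.32 * x1 + 0.70 * x2 + 0.50 * x3
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)   # equivalently 1 / (1 + exp(-log_odds))

p_a = prob_m1(2, 1, 0)
p_b = prob_m1(2, 1, 1)
print(f"Group A: {p_a:.4f}  Group B: {p_b:.4f}")
```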
Question: Using the equation for M1, compute the relative odds associated with X3, i.e. the relative odds of Group B compared to Group A.
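In M1 the coefficient of X3 is a log odds ratio, so the relative odds of Group B to Group A is exp(0.50). A one-line check:

```python
import math

beta3 = 0.50  # coefficient of X3 in M1
relative_odds_b_vs_a = math.exp(beta3)
print(f"{relative_odds_b_vs_a:.4f}")  # 1.6487
```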
Question: Use the odds for each group to compute the relative odds of Group B to Group A. How does this number compare to the result above? Does this make sense?
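Dividing Group B's odds by Group A's odds cancels every term except exp(0.50), so the ratio should agree with the coefficient-based relative odds for any choice of X1 and X2. A small sketch that checks this numerically; the helper name `odds_m1` and the extra (X1, X2) pairs are arbitrary illustrations.

```python
import math

def odds_m1(x1, x2, x3):
    """Odds under M1 (hypothetical helper)."""
    return math.exp(0.25 + 0.32 * x1 + 0.70 * x2 + 0.50 * x3)

ratio = odds_m1(2, 1, 1) / odds_m1(2, 1, 0)
print(f"{ratio:.4f}")

# The intercept, X1, and X2 terms cancel, so the ratio is exp(0.50) at any (X1, X2).
for x1, x2 in [(0, 0), (5, -3), (1.5, 2.0)]:
    assert math.isclose(odds_m1(x1, x2, 1) / odds_m1(x1, x2, 0), math.exp(0.50))
```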