Assignment: complete all parts using R.
data(kyphosis,package="rpart")
Consult the help page for the data, help(kyphosis, package = "rpart"), for further details.
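The sketch below loads the data and gives a first look at its structure before the plotting and modelling steps. It uses only base R plus the data set shipped with rpart.

```r
# Load the kyphosis data frame shipped with rpart (the package need not be attached)
data(kyphosis, package = "rpart")
str(kyphosis)             # Kyphosis (factor response), Age, Number, Start
table(kyphosis$Kyphosis)  # counts of "absent" and "present"
summary(kyphosis[, c("Age", "Number", "Start")])
```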
(a) Make plots of the response as it relates to each of the three predictors. You may find a jittered scatterplot more effective than the interleaved histogram for a dataset of this size. Comment on how the predictors appear to be related to the response.
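One way to draw the jittered scatterplots for part (a) is sketched below. The 0/1 coding of the response, the jitter amount of 0.1 and the added lowess trend line are illustrative choices, not requirements of the exercise.

```r
data(kyphosis, package = "rpart")
# Code the response as 0/1: 1 = kyphosis present after surgery
y <- as.numeric(kyphosis$Kyphosis == "present")
op <- par(mfrow = c(1, 3))
for (v in c("Age", "Number", "Start")) {
  # Jitter the 0/1 response vertically so overlapping points separate
  plot(kyphosis[[v]], jitter(y, amount = 0.1), pch = 20,
       xlab = v, ylab = "Kyphosis (jittered, 1 = present)")
  # Rough smooth of the raw 0/1 response, a crude guide to how P(present) varies
  lines(lowess(kyphosis[[v]], y), col = "red")
}
par(op)
```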
(b) Fit a binomial GLM (logistic regression) with the kyphosis indicator as the response and the other three variables as predictors. Plot the deviance residuals against the fitted values. What can be concluded from this plot?
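A minimal sketch of the fit for part (b), assuming a logit link (the binomial default). Because Kyphosis is a factor, glm() models the probability of its second level, "present". The residuals are shown against both the fitted probabilities and the linear predictor, since either scale can reasonably be called "fitted values".

```r
data(kyphosis, package = "rpart")
# Logistic regression of kyphosis presence on the three predictors
mod <- glm(Kyphosis ~ Age + Number + Start, family = binomial, data = kyphosis)
summary(mod)

rd <- residuals(mod, type = "deviance")
op <- par(mfrow = c(1, 2))
plot(fitted(mod), rd, xlab = "Fitted probability", ylab = "Deviance residual")
abline(h = 0, lty = 2)
plot(predict(mod), rd, xlab = "Linear predictor", ylab = "Deviance residual")
abline(h = 0, lty = 2)
par(op)
```

With a binary response the points fall on two curves, one for each observed outcome, so the raw plot says little about model adequacy.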
(c) Produce a binned residual plot as described in the text. You will need to choose an appropriate number of bins. Comment on the plot.
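A base-R sketch of a binned residual plot for part (c): observations are grouped by quantiles of the linear predictor, and the mean response residual (y minus fitted probability) is plotted against the mean linear predictor in each bin. The choice of 9 bins (about 9 subjects each for n = 81) is a hypothetical starting point that you should vary.

```r
data(kyphosis, package = "rpart")
mod <- glm(Kyphosis ~ Age + Number + Start, family = binomial, data = kyphosis)

eta <- predict(mod)                        # linear predictor
r   <- residuals(mod, type = "response")   # y - fitted probability
nbins <- 9                                 # hypothetical bin count; try others
brk <- unique(quantile(eta, probs = seq(0, 1, length.out = nbins + 1)))
bins <- cut(eta, breaks = brk, include.lowest = TRUE)

binned <- data.frame(
  eta   = as.vector(tapply(eta, bins, mean)),  # mean linear predictor per bin
  resid = as.vector(tapply(r, bins, mean)),    # mean response residual per bin
  n     = as.vector(table(bins))               # subjects per bin
)
plot(binned$eta, binned$resid, pch = 19,
     xlab = "Mean linear predictor in bin", ylab = "Mean residual in bin")
abline(h = 0, lty = 2)
```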
(d) Plot the residuals against the Start predictor, using binning as appropriate. Comment on the plot.
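Start takes only integer values (a vertebra number), so one natural binning for part (d), assumed here, is to average the response residuals at each distinct value of Start. Point size is scaled by the number of subjects in each group so that sparse groups are easy to spot.

```r
data(kyphosis, package = "rpart")
mod <- glm(Kyphosis ~ Age + Number + Start, family = binomial, data = kyphosis)
r <- residuals(mod, type = "response")

# One bin per distinct Start value (factor levels sort numerically)
binned <- data.frame(
  Start = sort(unique(kyphosis$Start)),
  resid = as.vector(tapply(r, kyphosis$Start, mean)),
  n     = as.vector(table(kyphosis$Start))
)
plot(binned$Start, binned$resid, pch = 19, cex = sqrt(binned$n) / 2,
     xlab = "Start", ylab = "Mean response residual")
abline(h = 0, lty = 2)
```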
(e) Produce a normal QQ plot for the residuals. Interpret the plot.
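A short sketch for part (e). Keep in mind that for a binary response the deviance residuals are not expected to be normally distributed, so departures from the reference line are not by themselves evidence of a poor fit.

```r
data(kyphosis, package = "rpart")
mod <- glm(Kyphosis ~ Age + Number + Start, family = binomial, data = kyphosis)
rd <- residuals(mod, type = "deviance")
qqnorm(rd, main = "Normal QQ plot of deviance residuals")
qqline(rd)   # reference line through the first and third quartiles
```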
(f) Make a plot of the leverages. Interpret the plot.
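A sketch for part (f) that plots the sorted leverages against half-normal quantiles, a base-R stand-in for a half-normal plot helper such as faraway::halfnorm. The dashed line at 2p/n is a common rule of thumb, not a formal test. The leverages sum to p, the number of model parameters.

```r
data(kyphosis, package = "rpart")
mod <- glm(Kyphosis ~ Age + Number + Start, family = binomial, data = kyphosis)

h <- hatvalues(mod)
n <- nrow(kyphosis)
p <- length(coef(mod))                     # 4 parameters including the intercept
ord <- order(h)
hq <- qnorm((n + seq_len(n)) / (2 * n + 1)) # half-normal quantiles
plot(hq, h[ord], xlab = "Half-normal quantiles", ylab = "Sorted leverages")
# Label the two largest leverages with their row numbers
top <- tail(ord, 2)
text(hq[(n - 1):n], h[top], labels = top, pos = 2)
abline(h = 2 * p / n, lty = 2)             # rule-of-thumb threshold
```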
(g) Check the goodness of fit for this model. Create a plot like Figure 2.9. Compute the Hosmer-Lemeshow statistic and associated p-value. What do you conclude?
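A sketch of the grouped goodness-of-fit check for part (g). Subjects are grouped by quantiles of the linear predictor; the observed proportion in each group is plotted against the mean predicted probability with approximate 95% intervals, and the Hosmer-Lemeshow statistic sums (observed minus expected)^2 / (count p (1 - p)) over groups. The number of groups (8) is a hypothetical choice, and the g - 2 degrees of freedom follow the usual Hosmer-Lemeshow convention; check which convention the course text uses.

```r
data(kyphosis, package = "rpart")
mod <- glm(Kyphosis ~ Age + Number + Start, family = binomial, data = kyphosis)
y    <- as.numeric(kyphosis$Kyphosis == "present")
eta  <- predict(mod)
phat <- fitted(mod)

g <- 8   # hypothetical number of groups (about 10 subjects each)
brk <- unique(quantile(eta, probs = seq(0, 1, length.out = g + 1)))
grp <- cut(eta, breaks = brk, include.lowest = TRUE)
hl <- data.frame(
  obs   = as.vector(tapply(y, grp, sum)),     # observed cases per group
  count = as.vector(table(grp)),              # subjects per group
  ppred = as.vector(tapply(phat, grp, mean))  # mean predicted probability
)
hl$pobs <- hl$obs / hl$count
hl$se   <- sqrt(hl$ppred * (1 - hl$ppred) / hl$count)

# Observed vs predicted proportions with approximate 95% intervals
plot(hl$ppred, hl$pobs, xlim = c(0, 1), ylim = c(0, 1), pch = 19,
     xlab = "Predicted probability", ylab = "Observed proportion")
segments(hl$ppred, hl$pobs - 1.96 * hl$se, hl$ppred, hl$pobs + 1.96 * hl$se)
abline(0, 1, lty = 2)

# Hosmer-Lemeshow statistic and p-value
hlstat <- with(hl, sum((obs - count * ppred)^2 / (count * ppred * (1 - ppred))))
df <- nrow(hl) - 2
pval <- pchisq(hlstat, df = df, lower.tail = FALSE)
c(statistic = hlstat, df = df, p.value = pval)
```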
(h) Use the model to classify the subjects into predicted outcomes using a 0.5 cutoff. Produce a cross-tabulation of these predicted outcomes against the actual outcomes. When kyphosis is actually present, what is the probability that this model would predict a present outcome? What is the name for this characteristic of the test?
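A sketch of the classification step for part (h). Predicted outcomes are forced to have both levels so the table is always 2 x 2, even if no subject is predicted "present". The proportion of actual "present" cases that are predicted "present" is the true positive rate, usually called the sensitivity.

```r
data(kyphosis, package = "rpart")
mod <- glm(Kyphosis ~ Age + Number + Start, family = binomial, data = kyphosis)

# Classify with a 0.5 cutoff on the fitted probability of "present"
predout <- ifelse(fitted(mod) > 0.5, "present", "absent")
xtab <- table(predicted = factor(predout, levels = c("absent", "present")),
              actual    = kyphosis$Kyphosis)
xtab

# Among subjects with kyphosis actually present, fraction predicted present
sens <- xtab["present", "present"] / sum(xtab[, "present"])
sens
```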