Assignment
The question concerns data from a case-control study of esophageal cancer in Ille-et-Vilaine, France. The data is distributed with R and may be obtained, along with a description of the variables, by:
data(esoph)
help(esoph)
(a) Plot the proportion of cases against each predictor, using the size of the point to indicate the number of subjects, as seen in Figure 2.7. Comment on the relationships seen in the plots.
(b) Fit a binomial GLM with interactions between all three predictors. Use AIC as a criterion to select a model using the step function. Which model is selected?
(c) All three factors are ordered, so R has used the contrasts appropriate for ordered factors, which involve linear, quadratic and cubic terms. Further simplification of the model may be possible by eliminating some of these terms. Use the unclass function to convert the factors to a numerical representation and check whether the model may be simplified.
(d) Use the summary output of the factor model to suggest a model that is slightly more complex than the linear model proposed in the previous question.
(e) Does your final model fit the data? Is the test you make accurate for this data?
(f) Check for outliers in your final model.
(g) What is the predicted effect of moving one category higher in alcohol consumption?
(h) Compute a 95% confidence interval for this predicted effect.
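As an illustration of parts (c), (g) and (h), here is a minimal sketch rather than a model answer. It assumes, purely for the example, that the final model is linear in the numeric scores of all three factors; parts (c) and (d) may lead to a different model. The names scores, fit_lin, odds_ratio and wald_ci are made up for this sketch, and the interval is a Wald interval computed on the log-odds scale.

```r
# esoph ships with base R (package "datasets"); no extra packages are needed.
data(esoph)

# Numeric scores 1, 2, 3, ... for the ordered factors. as.integer() drops
# the levels attribute that unclass() leaves behind.
scores <- data.frame(
  ncases    = esoph$ncases,
  ncontrols = esoph$ncontrols,
  age = as.integer(unclass(esoph$agegp)),
  alc = as.integer(unclass(esoph$alcgp)),
  tob = as.integer(unclass(esoph$tobgp))
)

# ASSUMPTION: a model linear in all three scores stands in for the final model.
fit_lin <- glm(cbind(ncases, ncontrols) ~ age + alc + tob,
               family = binomial, data = scores)

# Coefficient and standard error for the alcohol score (log-odds scale).
est <- unname(coef(fit_lin)["alc"])
se  <- unname(sqrt(diag(vcov(fit_lin)))["alc"])

# Moving one alcohol category higher multiplies the odds of being a case
# by exp(est), holding the age and tobacco scores fixed.
odds_ratio <- exp(est)
wald_ci    <- exp(est + c(-1, 1) * qnorm(0.975) * se)

print(round(c(odds_ratio = odds_ratio,
              lower = wald_ci[1], upper = wald_ci[2]), 2))
```

Calling confint(fit_lin) should give a profile-likelihood interval on the log-odds scale as an alternative to the Wald interval; exponentiate it to compare.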
The dataset "discoveries" lists the numbers of "great" inventions and scientific discoveries in each year from 1860 to 1959.
(a) Plot the discoveries over time and comment on the trend, if any.
(b) Fit a Poisson response model with a constant term. Now compute the mean number of discoveries per year. What is the relationship between this mean and the coefficient seen in the model?
(c) Use the deviance from the model to check whether the model fits the data. What does this say about whether the rate of discoveries is constant over time?
(d) Make a table of how many years had zero, one, two, three, etc. discoveries. Collapse eight or more into a single category. Under an appropriate Poisson distribution, calculate the expected number of years with each number of discoveries. Plot the observed against the expected using a different plotting character to denote the number of discoveries. How well do they agree?
(e) Use Pearson's chi-squared test to check whether the observed numbers are consistent with the expected numbers. Interpret the result.
(f) Fit a Poisson response model that is quadratic in the year. Test for the significance of the quadratic term. What does this say about the presence of a trend in discovery?
(g) Compute the predicted number of discoveries each year and show these predictions as a line drawn over the data. Comment on what you see.
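The computations in parts (b) to (f) can be sketched as follows. This is one possible approach, not the intended solution: the object names are invented for the example, the degrees of freedom for the Pearson statistic (nine categories, minus one, minus one estimated rate) are an assumption to check against your own reasoning, and poly(year, 2) is used instead of raw powers of the year only for numerical stability.

```r
# discoveries is a yearly time series that ships with base R.
data(discoveries)
disc <- data.frame(count = as.vector(discoveries),
                   year  = as.vector(time(discoveries)))

# (b) Constant-rate Poisson model: with the log link, exp(intercept)
# reproduces the sample mean.
fit_const <- glm(count ~ 1, family = poisson, data = disc)
rate <- unname(exp(coef(fit_const)))

# (c) Deviance goodness-of-fit test against a chi-squared reference.
p_dev <- pchisq(deviance(fit_const), df.residual(fit_const),
                lower.tail = FALSE)

# (d) Observed and expected numbers of years with 0, 1, ..., 7, 8+ discoveries.
obs      <- as.vector(table(factor(pmin(disc$count, 8), levels = 0:8)))
prob     <- c(dpois(0:7, rate), ppois(7, rate, lower.tail = FALSE))
expected <- nrow(disc) * prob

# (e) Pearson statistic; ASSUMPTION: df = 9 categories - 1 - 1 estimated rate.
x2   <- sum((obs - expected)^2 / expected)
p_x2 <- pchisq(x2, df = length(obs) - 2, lower.tail = FALSE)

# (f) Quadratic trend, tested by the drop in deviance from the linear fit.
fit_lin  <- glm(count ~ year,          family = poisson, data = disc)
fit_quad <- glm(count ~ poly(year, 2), family = poisson, data = disc)
p_quad   <- pchisq(deviance(fit_lin) - deviance(fit_quad), df = 1,
                   lower.tail = FALSE)

print(c(rate = rate, mean = mean(disc$count)))
print(round(cbind(discoveries = 0:8, observed = obs, expected = expected), 1))
print(c(p_deviance = p_dev, p_pearson = p_x2, p_quadratic = p_quad))
```

For part (g), plot(disc$year, disc$count) followed by lines(disc$year, fitted(fit_quad)) would overlay the predicted counts on the data.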
The debt data arise from a large postal survey on the psychology of debt. The frequency of credit card use is a three-level factor ranging from never, through occasionally to regularly.
(a) Declare the response as an ordered factor and make a plot showing the relationship to prodebt. Comment on the plot. Use a table or plot to display the relationship between the response and the income group.
(b) Fit a proportional odds model for credit card use with all the other variables as predictors. What are the two most significant predictors (largest t-values) and what is their qualitative effect on the response? What is the least significant predictor?
(c) Fit a proportional odds model using only the least significant predictor from the previous model. What is the significance of this predictor in this small model? Are the conclusions regarding this predictor contradictory for the two models?
(d) Use stepwise AIC to select a smaller model than the full set of predictors. You will need to handle the missing values carefully. Report on the qualitative effect of the predictors in your chosen model. Can we conclude that the predictors that were dropped from the model have no relation to the response?
(e) Compute the median values of the predictors in your selected model. At these median values, contrast the predicted outcome probabilities for both smokers and nonsmokers.
(f) Fit a proportional hazards model using the same set of predictors and recompute the two sets of probabilities from the previous question. Does it make a difference to use this type of model?
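Parts (e) and (f) both turn a linear predictor and a set of cutpoints into outcome probabilities, and the only difference between the two models is the link. The sketch below shows that calculation in base R, using the parameterisation in which P(Y <= k) = F(zeta_k - eta). The function category_probs and every number in it are hypothetical, chosen only to make the arithmetic visible; they are not estimates from the debt data. In practice the cutpoints and coefficients would come from a fitted model, for example MASS::polr with its default logistic method and with method = "cloglog".

```r
# Category probabilities for an ordinal response with cutpoints `zeta`
# (increasing) and linear predictor `eta`, under two links.
category_probs <- function(eta, zeta, link = c("logit", "cloglog")) {
  link <- match.arg(link)
  cdf <- switch(link,
    logit   = plogis,                         # proportional odds
    cloglog = function(q) 1 - exp(-exp(q)))   # proportional hazards
  cum <- c(cdf(zeta - eta), 1)                # P(Y <= k) for k = 1, ..., K
  diff(c(0, cum))                             # P(Y = k)
}

# HYPOTHETICAL values, for illustration only (not fitted to the debt data).
zeta          <- c(0.5, 2.0)   # never|occasionally, occasionally|regularly
eta_nonsmoker <- 0.8           # linear predictor at the median predictors
eta_smoker    <- 0.8 - 0.6     # the same, shifted by a smoking coefficient

levels_cc <- c("never", "occasionally", "regularly")
probs <- rbind(
  nonsmoker_logit   = category_probs(eta_nonsmoker, zeta, "logit"),
  smoker_logit      = category_probs(eta_smoker,    zeta, "logit"),
  nonsmoker_cloglog = category_probs(eta_nonsmoker, zeta, "cloglog"),
  smoker_cloglog    = category_probs(eta_smoker,    zeta, "cloglog")
)
colnames(probs) <- levels_cc
print(round(probs, 3))
```

The same cutpoints and linear predictor give different probabilities under the two links, so a fair comparison in part (f) refits the model under the second link rather than reusing the coefficients from the first.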