A retrospective sample of males in a heart-disease high-risk region of the Western Cape, South Africa. There are roughly two controls per case of CHD. Many of the CHD positive men have undergone blood pressure reduction treatment and other programs to reduce their risk factors after their CHD event. In some cases the measurements were made after these treatments. These data are taken from a larger dataset, described in Rousseauw et al, 1983, South African Medical Journal.
There are 463 observations in the dataset. The variables in the dataset are :
sbp - systolic blood pressure
tobacco - cumulative tobacco (kg)
ldl - low density lipoprotein cholesterol adiposity
famhist - family history of heart disease (Present, Absent)
typea - type-A-behavior obesity
alcohol - current alcohol consumption
age - age at onset
chd - response, coronary heart diseease
If you would prefer to analyze this data in using some other statistical package, you will need to export the data from R using something like a write.table command (or some variation thereof).
The following questions are of practical interest:
1. What are significant predictors of CHD? What would a final model look like and can you provide an estimate of its predictive accuracy (i.e. do model selection and then evaluate predictive accuracy) ? What functional forms are most appropriate for the various predictors in your final model ?
2. Since high ldl often precedes a diagnosis of CHD, will a two stage model which first uses ldl as a response in stage 1 and then CHD as a response in stage 2, provide more accurate predictions of CHD than the model built question 1 above ?
3. There are often situations where finding just one obviously best submodel is dicult. There may be many good competing sub-models.
However, you might decide to bring together multiple models to improve predictive performance. Develop a strategy for doing this on this dataset, being careful to clearly compare and contrast (to the single model approach) predictive performance. Also, make sure to clearly motivate your strategy giving enough intuition so that I can follow things easily.
Please provide complete justifications for why you chose a particular modeling strategy including the underlying assumptions you are making. Analyze the data and provide some overall inferences with regards to the questions being posed. Write a report that details your analysis.