Refer to the Prostate cancer data set in Appendix C.5 and Case Study 9.30. Select a random sample of 65 observations to use as the model-building data set.
a. Develop a regression tree for predicting PSA. Justify your choice of number of regions (tree size), and interpret your regression tree.
b. Assess your model's ability to predict and discuss its usefulness to the oncologists.
c. Compare the performance of your regression tree model with that of the best regression model obtained in Case Study 9.30. Which model is more easily interpreted and why?
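The workflow in parts (a) and (b) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed solution: the function names (`split_sample`, `best_split`, `mspr`), the seed, and the use of the 32 held-out cases as a validation set are all choices made here for illustration, and the data themselves are not reproduced. `best_split` shows only the basic step of a regression tree, namely choosing the cut point on one predictor that minimizes the error sum of squares summed over the two resulting regions.

```python
import random


def split_sample(n_total=97, n_build=65, seed=0):
    """Randomly split case indices 0..n_total-1 into building and validation sets."""
    rng = random.Random(seed)  # the seed is arbitrary; fixed only for reproducibility
    build = sorted(rng.sample(range(n_total), n_build))
    build_set = set(build)
    validation = [i for i in range(n_total) if i not in build_set]
    return build, validation


def sse(values):
    """Sum of squared deviations of the values from their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Find the threshold on one predictor x that minimizes the total SSE of y
    over the two resulting regions.

    Returns (threshold, total_sse), or (None, sse(y)) if no split reduces the SSE.
    """
    pairs = sorted(zip(x, y))
    best_threshold, best_sse = None, sse(list(y))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate tied x values
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best_sse:
            # place the cut midway between adjacent distinct x values
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_sse = total
    return best_threshold, best_sse


def mspr(y_true, y_pred):
    """Mean squared prediction error on a validation set."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)
```

A full tree would apply `best_split` to every predictor within each region, split on the predictor giving the largest reduction, and repeat. One way to justify the tree size is to track the validation `mspr` as regions are added and stop when it no longer decreases.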
Case Study 9.30
Refer to the Prostate cancer data set in Appendix C5. Serum prostate-specific antigen (PSA) was determined in 97 men with advanced prostate cancer. PSA is a well-established screening test for prostate cancer, and the oncologists wanted to examine the correlation between level of PSA and a number of clinical measures for men who were about to undergo radical prostatectomy. The measures are cancer volume, prostate weight, patient age, the amount of benign prostatic hyperplasia, seminal vesicle invasion, capsular penetration, and Gleason score. Select a random sample of 65 observations to use as the model-building data set. Develop a best subset model for predicting PSA. Justify your choice of model. Assess your model's ability to predict and discuss its usefulness to the oncologists.
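A best subsets search can be sketched as follows. This is an illustrative sketch only: it assumes the adjusted coefficient of determination as the selection criterion (other criteria such as Mallows' Cp or PRESS are equally defensible), it assumes no predictor is an exact linear combination of the others, and the function names (`solve`, `ols_sse`, `best_subset`) are hypothetical. Predictors are passed as a dictionary mapping a name to a list of values for the model-building cases.

```python
from itertools import combinations


def solve(a, b):
    """Solve the linear system a * coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - tail) / m[r][r]
    return coef


def ols_sse(columns, y):
    """Error sum of squares from regressing y on the given columns plus an intercept."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    # normal equations: (X'X) b = X'y
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(p)]
    coef = solve(xtx, xty)
    return sum((yi - sum(c * v for c, v in zip(coef, r))) ** 2 for r, yi in zip(rows, y))


def best_subset(data, y):
    """Return (adjusted R^2, predictor names) for the subset with the largest adjusted R^2."""
    n = len(y)
    y_mean = sum(y) / n
    ssto = sum((yi - y_mean) ** 2 for yi in y)
    best = None
    for size in range(1, len(data) + 1):
        for names in combinations(sorted(data), size):
            p = size + 1  # number of coefficients, including the intercept
            sse = ols_sse([data[name] for name in names], y)
            adj_r2 = 1 - (sse / (n - p)) / (ssto / (n - 1))
            if best is None or adj_r2 > best[0]:
                best = (adj_r2, names)
    return best
```

With seven candidate predictors the search fits 127 models, which is small enough to enumerate directly. The chosen subset should still be checked on the held-out cases before it is recommended to the oncologists.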
Appendix C5
A university medical center urology group was interested in the association between prostate-specific antigen (PSA) and a number of prognostic clinical measurements in men with advanced prostate cancer. Data were collected on 97 men who were about to undergo radical prostatectomies. Each line of the data set has an identification number and provides information on 8 other variables for each person. The 9 variables are: