The Skin Cancer Prevention Study dataset can be found on the website accompanying the book Applied Longitudinal Analysis by Fitzmaurice, Laird, and Ware.
I need answers to parts (a) through (f) of the longitudinal analysis of that dataset, please.
Consider data from the Skin Cancer Prevention Study. This study is a randomized, double-blind, placebo-controlled clinical trial of beta-carotene to prevent non-melanoma skin cancer in high-risk subjects. A total of 1805 subjects were randomized to either placebo or 50 mg of beta-carotene per day for 5 years. Subjects were examined once a year and biopsied if a cancer was suspected, to determine the number of new skin cancers occurring since the last exam. The outcome variable Y is a count of the number of new skin cancers per year. The categorical variable Treatment is coded 1=beta-carotene, 0=placebo. The variable Year denotes the year of follow-up. The categorical variable Gender is coded 1=male, 0=female. The categorical variable Skin denotes skin type and is coded 1=burns, 0=otherwise. The variable Exposure is a count of the number of previous skin cancers. The variable Age is the age (in years) of each subject at randomization. See the attached dataset.
(a). Describe the pattern of missing data for this patient population.
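To make part (a) concrete, here is a minimal sketch of how the missing-data patterns could be tabulated. It assumes the data are in long format as (subject_id, year, y) tuples with y set to None when the count is missing; that layout and the function names `missing_patterns` and `is_monotone` are my own assumptions, not something from the book or the dataset.

```python
from collections import Counter


def missing_patterns(records, years=(1, 2, 3, 4, 5)):
    """Tabulate per-subject observed/missing patterns across follow-up years.

    records: iterable of (subject_id, year, y); y is None when missing.
    A year with no row at all for a subject is also treated as missing.
    Returns a Counter mapping patterns such as "OOOMM" to subject counts,
    where "O" means observed and "M" means missing.
    """
    observed = {}
    for subject_id, year, y in records:
        seen = observed.setdefault(subject_id, set())
        if y is not None:
            seen.add(year)

    patterns = Counter()
    for seen in observed.values():
        pattern = "".join("O" if yr in seen else "M" for yr in years)
        patterns[pattern] += 1
    return patterns


def is_monotone(pattern):
    """True if no observed visit follows a missing one (pure dropout)."""
    return "MO" not in pattern


# Toy records (hypothetical values, not taken from the study data).
toy = [
    (1, 1, 0), (1, 2, 1), (1, 3, 0), (1, 4, 0), (1, 5, 2),  # complete
    (2, 1, 0), (2, 2, 0),                                   # dropout after year 2
    (3, 1, 1), (3, 2, None), (3, 3, 0),                     # intermittent
]
counts = missing_patterns(toy)
for pattern, n in sorted(counts.items()):
    print(pattern, n, "monotone" if is_monotone(pattern) else "intermittent")
```

Separating monotone (dropout) from intermittent patterns matters because the choice of imputation strategy in the later parts often depends on it.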
(b). Estimate the adjusted effect of the beta-carotene treatment on the occurrence of skin cancer (i.e., Y = 0 vs. Y > 0) over the 5-year period using a generalized estimating equations (GEE) model. Compare the estimates of the treatment effect under different missing-data strategies (i.e., complete case, mean imputation, EM, and multiple imputation).
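For the multiple-imputation comparison, my understanding is that the treatment coefficient estimated on each imputed dataset is combined with Rubin's rules. A minimal sketch of that pooling step follows; the function name `pool_rubin` and the input numbers are hypothetical and only illustrate the arithmetic, not results from this study.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    estimates:  per-imputation point estimates (e.g. treatment log-odds ratios)
    std_errors: per-imputation standard errors of those estimates
    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two imputations with matching lengths")

    pooled = mean(estimates)                        # average point estimate
    within = mean([se ** 2 for se in std_errors])   # average within-imputation variance
    between = variance(estimates)                   # between-imputation sample variance
    total = within + (1 + 1 / m) * between          # total variance
    return pooled, math.sqrt(total)


# Hypothetical per-imputation results, for illustration only.
est, se = pool_rubin([0.10, 0.20, 0.30], [0.10, 0.10, 0.10])
print(round(est, 4), round(se, 4))
```

The pooled standard error is larger than any single-imputation standard error whenever the estimates differ across imputations, which is one reason the multiple-imputation result can differ from mean imputation.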
(c). Estimate the adjusted effect of the beta-carotene treatment on the occurrence of skin cancer (i.e., Y = 0 vs. Y > 0) over the 5-year period using a conditional mixed-effects model. Compare the estimates of the treatment effect under different missing-data strategies (i.e., complete case, mean imputation, EM, and multiple imputation).
(d). Estimate the adjusted effect of the beta-carotene treatment on the number of new skin cancers (i.e., Y = 0 vs. 1 vs. 2 vs. 3) over the 5-year period using a proportional odds generalized estimating equations model. Compare the estimates of the treatment effect under different missing-data strategies (i.e., complete case, mean imputation, EM, and multiple imputation).
(e). Estimate the adjusted effect of the beta-carotene treatment on the occurrence of new skin cancers (i.e., Y = 0 vs. Y > 0) over the 5-year period using a conditional ordinal logistic mixed-effects model. Compare the estimates of the treatment effect under different missing-data strategies (i.e., complete case, mean imputation, EM, and multiple imputation).
(f). Based on your findings in parts (b) through (e), which of the two approaches (conditional versus marginal) would you consider the most appropriate for these analyses, and why?
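For reference, this is how I understand the two approaches for the binary outcome, written for subject i at year j. The notation is my own, and the random-intercept form with a normal distribution is an assumption about the conditional model rather than something stated in the assignment.

```latex
% Marginal (population-averaged) model, as fitted by GEE:
\operatorname{logit} \Pr(Y_{ij} > 0)
  = \mathbf{x}_{ij}^{\top} \boldsymbol{\beta}

% Conditional (subject-specific) model with an assumed random intercept:
\operatorname{logit} \Pr(Y_{ij} > 0 \mid b_i)
  = \mathbf{x}_{ij}^{\top} \boldsymbol{\beta}^{*} + b_i,
  \qquad b_i \sim N(0, \sigma_b^{2})
```

Here x_ij holds Treatment, Year, Gender, Skin, Exposure, and Age. The Treatment coefficient in the first model describes a population-averaged effect, while the one in the second describes an effect for a given subject, so the two estimates are not expected to be numerically equal.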