PROJECT
The Human Immunodeficiency Virus (HIV) causes AIDS by reducing the patient's ability to fight infection. The CD4+ cell count (per cubic millimeter of blood) is a biomarker that measures the body's immune response to infectious agents. CD4+ cells decrease in number over time, and this depletion is often used to monitor the progression of the disease in infected individuals.
The goals of this project are to:
(i) describe the average time course of CD4+ cell counts in HIV-infected subjects before and after seroconversion;
(ii) evaluate the potential effect of age on the CD4+ cell changes; and
(iii) predict the time course of CD4+ cell counts for specific HIV-infected individuals, taking into account the measurement error in CD4+ cell determination.
The data to answer these research questions are contained in the file CD4.txt. This data set contains the following variables:
field 1: ID (subject's identification)
field 2: CD4 (subject's CD4+ cell count)
field 3: Time (time since seroconversion, with seroconversion at time 0; negative times refer to observations obtained prior to seroconversion)
field 4: Age (age at enrollment relative to an arbitrary origin)
More details about this data set can be found in the book: Diggle, P.J., Heagerty, P.J., Liang, K.Y. and Zeger, S.L. (2002). Analysis of Longitudinal Data. (2nd edition). Oxford: Oxford University Press.
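The four-field layout described above could be read into SAS along the following lines. This is only a sketch: it assumes that CD4.txt is whitespace-delimited, has no header row, and sits in the current working directory, none of which is stated in the project description. The data set name cd4 is likewise an arbitrary choice.

```sas
/* Assumptions (not stated in the project description):          */
/*   - CD4.txt is whitespace-delimited with no header row        */
/*   - the file is in the current working directory              */
data cd4;
  infile "CD4.txt";
  input ID CD4 Time Age;   /* field order as listed above */
run;

/* Quick checks: first few records and basic summary statistics */
proc print data=cd4 (obs=10);
run;

proc means data=cd4 n mean std min median max;
  var CD4 Time Age;
run;
```

If the file turns out to have a header line, adding firstobs=2 to the infile statement skips it.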
(a) Perform some exploratory analyses to see what might be a plausible model for the mean CD4+ cell counts as a function of time since seroconversion and baseline Age.
(b) Because the investigators had not taken a course in longitudinal analysis, they were unaware that measurements on the same patient might be correlated. They fit an independent regression model, treating observations from the same patient as if they were unrelated. Based on this naive analysis, describe the average time course of the CD4+ cell counts. Estimate the rate of CD4+ cell depletion two years after seroconversion for an average patient of median age and construct the associated confidence interval. (Note: I expect you to reduce the mean model to a parsimonious one before interpreting the results. For example, you may evaluate whether the average profile varies with the baseline Age.)
(c) One of the investigators then talked to a friend who knew something about repeated measurements, who suggested that the analysis in (b) may be unreliable because possible correlation had not been taken into account. Give a brief explanation of why failure to take correlation into account might lead to biased inferences.
(d) Because you have taken a course in longitudinal data analysis, the investigators called you in to help with an improved analysis. Generate some residuals to study the variance function and the correlation function in these data. Based on your overall investigation, propose a possible random effects structure as well as a structure for the error term(s) that may be considered for these data. Note: do not fit the model.
(e) Fit a linear mixed effects model using the mean model proposed in (a) and the random components (random effects and random errors) in (d). Under the formulated model, do you think a simpler structure for random effects and within-subject errors is plausible? Specifically, test for the need for serial correlation, if you had considered one, and reduce the dimensionality of the random effects. If the selected random effects and serial errors are not enough to model the association in the data, what approach can be used to ensure that the association in the data is handled properly?
(f) Consider your final (reduced) random effects and/or serial error terms obtained in (e).
(f1) Reduce the mean structure to a more parsimonious one. Based on this analysis, describe the time course of average CD4+ cell counts. Is there sufficient evidence to suggest that it is worthwhile to take the baseline Age into account to understand the average time course of CD4+ cell counts in these HIV-infected individuals? Plot the estimated average CD4+ cell counts for subjects with age fixed at the first quartile, the second quartile (median) and the third quartile.
(f2) Based on your final linear mixed effects model, which estimated average CD4+ cell counts would be more appropriate to predict the CD4+ cell count profile when counseling a particular HIV-infected person? Compute and plot these estimates for subject ID 10088, and add the observed profile to the plot.
(f3) Hypothetically, suppose that an HIV-infected individual is diagnosed with AIDS when his subject-specific average CD4+ cell count is below 200 cells/mm3. Based on your analysis, how many individuals in this dataset would be diagnosed with AIDS under this definition?
(g) Summarize your overall findings. What can you conclude about the average time course of CD4+ cell counts from these data?
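For orientation, one model form that parts (b) and (d) to (f) could lead to is sketched below. It is an illustration under stated assumptions, not the required answer: the quadratic mean in time, the random intercept and slope, and the exponential serial correlation are all choices made here for the sake of the example, and the symbols are not taken from the project description.

```latex
% Y_{ij}: CD4+ count of subject i at time t_{ij}; a_i: baseline Age of subject i.
% Assumed (illustrative) mean: quadratic in time plus a linear Age effect.
\[
  Y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 t_{ij}^2 + \beta_3 a_i
           + b_{0i} + b_{1i} t_{ij} + W_i(t_{ij}) + \varepsilon_{ij}
\]
% Random effects, serial process and measurement error, mutually independent:
\[
  (b_{0i}, b_{1i})^\top \sim N(\mathbf{0}, D), \qquad
  \operatorname{Cov}\{W_i(s), W_i(t)\} = \tau^2 \exp(-\phi\,|s - t|), \qquad
  \varepsilon_{ij} \sim N(0, \sigma^2)
\]
% Under this assumed mean, the rate of change at t = 2 and its standard error:
\[
  \left.\frac{\partial\, E(Y_{ij})}{\partial t}\right|_{t=2} = \beta_1 + 4\beta_2,
  \qquad
  \operatorname{SE} = \sqrt{\widehat{\operatorname{Var}}(\hat\beta_1)
      + 16\,\widehat{\operatorname{Var}}(\hat\beta_2)
      + 8\,\widehat{\operatorname{Cov}}(\hat\beta_1, \hat\beta_2)}
\]
% Subject-specific (empirical Bayes) profile, relevant to (f2) and (f3):
\[
  \hat Y_i(t) = \hat\beta_0 + \hat\beta_1 t + \hat\beta_2 t^2 + \hat\beta_3 a_i
                + \hat b_{0i} + \hat b_{1i} t
\]
```

In this form, the population-average profile sets the random effects to zero, whereas the subject-specific profile adds the predicted random effects for the individual being counseled.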
Please do the analysis with SAS.