This assignment involves the analysis of a data set we have seen previously: various biomarkers in TB meningitis patients. The goal of the assignment is to develop some capability in fitting and comparing different linear mixed effects models. The file is called Classfile_TBM_assignmetn2.csv and is available in the datasets folder on Vula. The data are biomarker measurements from a cohort of individuals co-infected with HIV and Tuberculous Meningitis (TBM). Although TBM is rare, it usually results in very poor outcomes. The paper from which the data are drawn is Marais et al. Neutrophil-associated central nervous system inflammation in tuberculous meningitis immune reconstitution inflammatory syndrome. Clin Infect Dis. 59 (2014).
The data set contains 34 individuals, each with observations at three time points. There are no missing data.
Variables:
- ID - individual ID
- Group - distinguishes individuals who developed a condition called IRIS from those who did not. It remains in the data set, but we will not use it for this assignment.
- Time - 0, 2, 4, corresponding to baseline, ART initiation, 2 weeks post ART. Note that the data are in wide format, so you will need to create this variable yourself; the time point is part of the column names.
- Potential clinical variables/confounders: BICd4 - CD4 count; BIHIVVL - HIV viral load; BMI - body mass index; BINa - blood sodium
- Outcome variable: CSFNeutrophils OR CSFLymphocytes
A number of analytes are also available. They were measured either in plasma (blood) or in cerebrospinal fluid (CSF), which is why variable names carry the prefix "BI" or "CSF". We will ONLY use the CSF values; the BI analytes have already been removed (except where relevant). The available analytes are listed in groups at the end of the assignment. You will not use all of them, but a selection of 4 (see the end of the assignment for how to select them).
This will result in an analysis data set of 1 outcome, 4 potential covariates/confounders (CD4, HIVVL, BMI, BINa) and 4 analytes (measured in CSF; the selection varies by student).
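Because the time point is part of the column names, the move from wide to long format could look roughly like the Stata sketch below. The column naming pattern is an assumption, since it is not stated here: the sketch supposes each measurement is stored as a stub followed by the time point (for example CSFNeutrophils0, CSFNeutrophils2, CSFNeutrophils4), and CSFTNFalpha is a hypothetical stub. Check the real names with describe before adapting it.

```stata
* Sketch only: assumes names of the form <stub><time>, e.g. CSFNeutrophils0,
* CSFNeutrophils2, CSFNeutrophils4. These names are hypothetical.
import delimited "Classfile_TBM_assignmetn2.csv", clear case(preserve)
describe                          // inspect the actual column names first

* List one stub per time-varying variable; j() creates the Time variable
reshape long CSFNeutrophils CSFTNFalpha, i(ID) j(Time)

sort ID Time
list ID Time CSFNeutrophils in 1/6   // expect three rows per individual
```

If the time point sits in the middle of a name rather than at the end, reshape accepts an @ placeholder in the stub to mark where it appears.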
Instructions: Answer the following questions in a TYPED report. Be sure to put your name and student number on the first page, use the same sequence and question numbers as below, and submit a PDF file only, to Vula. Your answers should be concise, complete and correct. Every table or figure must have an appropriate caption and a brief explanation [no more than 2 or 3 sentences] of the results it presents. Tables should be formatted, not pasted or screenshotted from software. You will note that I request the same estimates multiple times. This is just to make marking easier, but please be sure to provide what is asked for. If a request is vague, like "provide summary statistics", YOU must decide what summary statistics are appropriate.
Q1. To aid the grader, provide a table that indicates the analytes, and the outcome, that were used in YOUR analysis, and their summary statistics.
Q2. From your preliminary investigations, select one figure that effectively describes the data and provide it, with an appropriate caption.
Q3. Fit the following models, using time as a continuous, linear covariate, and ensuring that covariates are scaled:
(a) linear regression model
(b) random intercepts model (on ID)
(c) random slopes model (on ID)
(d) random slopes and random intercepts model (on ID)
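For reference, one common way to write these four models is sketched below. The notation is introduced here for illustration only: $y_{ij}$ is the outcome for individual $i$ at time $t_{ij}$, $\mathbf{x}_{ij}$ holds the scaled covariates and analytes, and $u_{0i}$ and $u_{1i}$ are the random intercept and random slope for individual $i$. The align* environment assumes the amsmath package.

```latex
\begin{align*}
\text{(a)}\quad y_{ij} &= \beta_0 + \beta_1 t_{ij} + \mathbf{x}_{ij}^{\top}\boldsymbol{\gamma} + \varepsilon_{ij} \\
\text{(b)}\quad y_{ij} &= \beta_0 + u_{0i} + \beta_1 t_{ij} + \mathbf{x}_{ij}^{\top}\boldsymbol{\gamma} + \varepsilon_{ij} \\
\text{(c)}\quad y_{ij} &= \beta_0 + (\beta_1 + u_{1i})\, t_{ij} + \mathbf{x}_{ij}^{\top}\boldsymbol{\gamma} + \varepsilon_{ij} \\
\text{(d)}\quad y_{ij} &= \beta_0 + u_{0i} + (\beta_1 + u_{1i})\, t_{ij} + \mathbf{x}_{ij}^{\top}\boldsymbol{\gamma} + \varepsilon_{ij}
\end{align*}
% Usual assumptions: u_{0i} ~ N(0, \sigma_0^2), u_{1i} ~ N(0, \sigma_1^2),
% \varepsilon_{ij} ~ N(0, \sigma_\varepsilon^2), with the residuals
% independent of the random effects.
```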
Provide a table that presents the coefficient estimates, 95% confidence intervals for estimates of the fixed effects and all estimates of the variance of the random effects for all of the models to enable a direct comparison. Do not provide p-values.
Note: You will have to decide which of the clinical parameters should be included in this model. Be sure to indicate what is included in your explanation of these results, and fit the same covariate model for all components. Although clinical covariates are available at all time points, you may choose to fit them as time invariant by using just the baseline observations. All of your (4) analytes should be included.
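As a starting point, the four models in Q3 could be fitted in Stata along the following lines. This is a sketch under stated assumptions, not a prescribed solution: it assumes long-format data, a numeric ID, CSFNeutrophils as the outcome, and hypothetical names for the already-scaled variables (z_TNFalpha, z_A, z_B1, z_B2 for the four analytes and z_CD4 for a baseline CD4 count). Substitute your own outcome and covariate choices.

```stata
* Sketch only. Assumes long-format data and hypothetical variable names.
* If ID is stored as a string, create a numeric version first,
* e.g. egen id_num = group(ID), and use that as the grouping variable.
local fixed Time z_TNFalpha z_A z_B1 z_B2 z_CD4

* (a) linear regression: no random effects
regress CSFNeutrophils `fixed'

* (b) random intercept on ID
mixed CSFNeutrophils `fixed' || ID:

* (c) random slope on Time only (noconstant drops the random intercept)
mixed CSFNeutrophils `fixed' || ID: Time, noconstant

* (d) random intercept and random slope on Time;
*     covariance(unstructured) also estimates their covariance,
*     whereas Stata's default is covariance(independent)
mixed CSFNeutrophils `fixed' || ID: Time, covariance(unstructured)
```

The local macro keeps the fixed-effects part identical across all four models, which is what the note above asks for. Run the block as a single do-file so the macro stays defined.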
Q4. Fit models (3a) and (3c) with and without scaled covariates/analytes. Provide coefficient estimates and confidence intervals together in a single table, as well as variance estimates.
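One possible way to build the scaled versions and compare the two fits of model (3c) is sketched below. The variable names CSFTNFalpha and BICd4 (in long format) and a numeric ID are assumptions, and only two predictors are shown to keep the example short.

```stata
* Sketch only: hypothetical variable names, long-format data assumed.
* egen's std() returns (x - mean) / sd, computed over all rows in memory.
egen z_TNFalpha = std(CSFTNFalpha)
egen z_CD4      = std(BICd4)
summarize z_TNFalpha z_CD4        // means should be about 0, SDs 1

* Model (3c) on the raw and on the scaled versions of the same variables
mixed CSFNeutrophils Time CSFTNFalpha BICd4 || ID: Time, noconstant
estimates store m3c_raw
mixed CSFNeutrophils Time z_TNFalpha z_CD4 || ID: Time, noconstant
estimates store m3c_scaled

* Side-by-side coefficients and standard errors
estimates table m3c_raw m3c_scaled, b se
```

In the linear regression (3a), the coefficient of a scaled variable equals the raw coefficient multiplied by that variable's standard deviation, which gives a quick check that the scaling was applied as intended.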
Q5. Plot the random effects estimates (forest plot) for models (3c) and (3d) and provide each with an appropriate caption.
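A forest plot of this kind can be assembled from the predicted random effects and their standard errors. The sketch below does this for model (3c) and rests on several assumptions: long-format data, a numeric ID, the hypothetical fixed-effects variables z_TNFalpha and z_CD4, and a Stata version in which predict accepts reses() as a suboption of reffects (older versions use a different syntax).

```stata
* Sketch only: hypothetical variable names; syntax may vary by Stata version.
mixed CSFNeutrophils Time z_TNFalpha z_CD4 || ID: Time, noconstant

* Predicted random slope per individual and its standard error
predict b_time, reffects reses(se_time)
gen lo = b_time - 1.96 * se_time
gen hi = b_time + 1.96 * se_time

* Keep one row per individual and order individuals by their estimate
egen byte one_per_id = tag(ID)
egen rank_b = rank(b_time) if one_per_id, unique

twoway (rcap lo hi rank_b if one_per_id, horizontal)   ///
       (scatter rank_b b_time if one_per_id),          ///
       xline(0) legend(off)                            ///
       ytitle("Individual (ranked by estimate)")       ///
       xtitle("Predicted random slope on Time")
```

For model (3d) the same predict call would need two new variable names in each list, one for the random slope and one for the random intercept.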
Q6. Fit model (3c) with a different function of time. Report the estimates here in a single table. Make sure the caption indicates how time went into the model.
Q7. For each of the random effects models you have fit (in Q3 and Q6, 4 in total), report the proportion of variance that is attributable to the random effect.
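One common way to express this proportion is sketched below. The symbols are introduced here for illustration: $\sigma_\varepsilon^2$ is the residual variance, $\sigma_0^2$ the random-intercept variance, $\sigma_1^2$ the random-slope variance and $\sigma_{01}$ the intercept-slope covariance (zero if the random effects are modelled as independent). The align* environment assumes the amsmath package.

```latex
\begin{align*}
\text{random intercept only:}\quad
  \rho &= \frac{\sigma_0^2}{\sigma_0^2 + \sigma_\varepsilon^2} \\[4pt]
\text{random slope only:}\quad
  \rho(t) &= \frac{t^2\sigma_1^2}{t^2\sigma_1^2 + \sigma_\varepsilon^2} \\[4pt]
\text{random intercept and slope:}\quad
  \rho(t) &= \frac{\sigma_0^2 + 2t\sigma_{01} + t^2\sigma_1^2}
                  {\sigma_0^2 + 2t\sigma_{01} + t^2\sigma_1^2 + \sigma_\varepsilon^2}
\end{align*}
```

Whenever a random slope is present the proportion depends on the value of $t$, so state the time point at which you evaluate it. If time enters the model through a function $f(t)$, as in Q6, replace $t$ with $f(t)$ in these expressions.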
Q8. Using a maximum of 300 words, compare and contrast the models you have fit, with attention to the estimated effects, the variation between the random effects estimates, the impact of changing how time is modeled and the impact of scaling the covariates.
How to select YOUR analytes:
Outcome variable: Toss a coin, heads you use CSFNeutrophils, tails you use CSFLymphocytes
Analytes: Use: TNFalpha
Select 1 from Grp A: IL22, IL18, INFgamma, IL1beta, IL2, MIP1beta, MIP1alpha, LL37, C5a, HNP13, TIMP2
Select 2 from Grp B: IP10, MMPs (all of them), MIP2, IL8, IL6, MCP1, IFNalpha2, IL10, IL12p40, TIMP1, IL17, MMP9: TIMP1
Select these independently without discussion with your colleagues. You do not need to justify your selection.
Answer this assignment using Stata.