Part A: High School Data
Use the data in highschool.csv to answer the following questions. The variables are summarized in the following table:
Variable - Description
math - Math test scores from a test administered in the spring of 10th grade
std.test - State test scores from 8th grade
schoolid - ID for the school that the student attends
attendance - Number of days present, mean centered
SES - 4 levels of SES: 1 = very low, 2 = low, 3 = medium, 4 = high
sch.type - 2 different types of school: 1 = public, 2 = private
1. Suppose that we are interested in the impact of school type and SES on math scores.
(a) Make an interaction plot. Comment on the observed pattern.
(b) Write the statistical model you would like to fit. Define all the symbols.
(c) Fit the model and evaluate whether the main effects and interactions are significant. Summarize your findings and include the relevant test statistics and p-values. [Hint: Be careful with the type of sum of squares (SS) decomposition to use.]
(d) Check the assumptions of the model you just fitted. Report your findings.
(e) Test pairwise comparison contrasts among all SES levels within public schools. Adjust for family-wise error rate (FWER).
(f) Test the interaction contrast involving SES = medium and high and the two types of schools.
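The contrasts in parts (e) and (f) can be written as coefficient vectors over the eight cells of a combined school type by SES factor. The following is a minimal sketch in Python, not a prescribed solution. The cell ordering (public before private, SES from very low to high) and all names such as CELLS, pairwise_ses_within, and interaction_contrast are assumptions made for illustration; the coefficients must be reordered to match the level order of whatever combined factor is actually fitted.

```python
from itertools import combinations

# Assumed ordering of the 8 cells of the combined factor (hypothetical labels).
SCHOOL_TYPES = ["public", "private"]
SES_LEVELS = ["very low", "low", "medium", "high"]
CELLS = [(school, ses) for school in SCHOOL_TYPES for ses in SES_LEVELS]


def contrast(weights):
    """Build a coefficient vector over CELLS from a {cell: weight} mapping."""
    return [weights.get(cell, 0) for cell in CELLS]


def pairwise_ses_within(school):
    """All pairwise SES comparisons within one school type (6 contrasts)."""
    return {
        (a, b): contrast({(school, a): 1, (school, b): -1})
        for a, b in combinations(SES_LEVELS, 2)
    }


def interaction_contrast(ses_a, ses_b):
    """(public: ses_a - ses_b) minus (private: ses_a - ses_b)."""
    return contrast({
        ("public", ses_a): 1, ("public", ses_b): -1,
        ("private", ses_a): -1, ("private", ses_b): 1,
    })


public_pairs = pairwise_ses_within("public")
medium_high_interaction = interaction_contrast("medium", "high")

for pair, coefs in public_pairs.items():
    print(pair, coefs)
print("interaction:", medium_high_interaction)
```

Each vector sums to zero, as a contrast must. With six pairwise comparisons in part (e), a Bonferroni-style adjustment would compare each p-value to alpha/6; other FWER methods (for example Tukey or Holm) are equally reasonable choices.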
2. We are also interested in investigating whether the 8th grade state test scores can be used as a covariate to adjust the previous analysis. [Note: You should have created the combined factor for school type and SES (with 8 levels representing all 8 combinations of school type and SES) in order to test contrasts in the previous question. It is enough to use the combined factor for the current question.]
(a) Is the 8th grade state test score related to the outcome variable? Provide statistical evidence to support your conclusion.
(b) Write the statistical model you plan to fit. Define all the symbols.
(c) Check the assumptions of the model you plan to fit. Comment on your findings.
(d) Based on the assumption check in the previous question, fit an appropriate model. Interpret the coefficient estimates and the SS decomposition table.
(e) Re-test the contrasts in Question 1(e) based on the final model. Adjust for FWER. Do you see any differences?
Part B: Orthodontics Data
Investigators at the University of North Carolina Dental School followed the growth of 27 children (identified by variable Subject) from age 8 until 14. Every two years they measured the distance between the pituitary and the pterygomaxillary fissure (abbreviated as PP distance), two points that are easily identified on X-ray exposures of the side of the head. It was of interest to use repeated-measures ANOVA to study how the PP distance changes over time.
3. SS decomposition and sphericity
(a) Express the repeated-measures ANOVA model in the context of the orthodontics data.
(b) Obtain the SS decomposition table for the model. Report the F-test to determine whether the PP distance changes significantly as the child grows.
(c) State and test the sphericity assumption of the repeated-measures ANOVA model.
4. Point estimation of fixed effects and variance components.
(a) Report the estimates of average PP distance at each age. Produce a graph to visualize the growth of average PP distance across age.
(b) What is the variance estimate of the person effect?
(c) What is the definition and interpretation of the intraclass correlation (ICC) coefficient? What is the estimate of ICC in the current analysis?
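For reference, one common way to write the quantities in Question 4 is sketched below. The notation is an assumption, not taken from the assignment: Y_{ij} is the PP distance of child i at age j, \pi_i is the random person effect, \alpha_j is the fixed age effect, and a = 4 is the number of measurement occasions (ages 8, 10, 12, 14). The mean-square estimator shown holds for the additive repeated-measures model with balanced data.

```latex
Y_{ij} = \mu + \pi_i + \alpha_j + \varepsilon_{ij},
\qquad \pi_i \sim N(0, \sigma_\pi^2),
\qquad \varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)

\mathrm{ICC} = \frac{\sigma_\pi^2}{\sigma_\pi^2 + \sigma_\varepsilon^2},
\qquad
\hat{\sigma}_\pi^2 = \frac{MS_{\text{subject}} - MS_{\text{error}}}{a},
\qquad
\hat{\sigma}_\varepsilon^2 = MS_{\text{error}}
```

Under this model the ICC is the correlation between two measurements on the same child, or equivalently the share of total variance attributable to differences between children.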
5. Testing contrasts.
(a) Write a contrast to compare the average PP distance at age 10 or before versus after age 10. Is the contrast statistically significant?
(b) Test the linear, quadratic, and cubic components for the growth of the PP distance across age. Adjust for FWER. Interpret your findings.
(c) Based on the statistical significance of the polynomial components you just tested, fit a linear mixed-effects model. Compared with the repeated-measures ANOVA model, does this new model fit significantly worse?
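The contrasts in Question 5 can be sketched as coefficient vectors over the four ages. The example below is illustrative only and uses the standard orthogonal polynomial coefficients for four equally spaced levels. The names EARLY_VS_LATE, POLY, and the helper functions are hypothetical, and the means passed to contrast_estimate are made-up placeholder numbers rather than results from the orthodontics data.

```python
from itertools import combinations

# Measurement occasions: every two years from age 8 to 14.
AGES = [8, 10, 12, 14]

# Age 10 or before versus after age 10 (difference of the two group averages).
EARLY_VS_LATE = [0.5, 0.5, -0.5, -0.5]

# Standard orthogonal polynomial coefficients for 4 equally spaced levels.
POLY = {
    "linear": [-3, -1, 1, 3],
    "quadratic": [1, -1, -1, 1],
    "cubic": [-1, 3, -3, 1],
}


def dot(u, v):
    """Inner product of two equal-length coefficient vectors."""
    return sum(a * b for a, b in zip(u, v))


def is_orthogonal_set(contrasts):
    """True if every vector sums to zero and all pairs have zero dot product."""
    sums_ok = all(sum(c) == 0 for c in contrasts)
    pairs_ok = all(dot(u, v) == 0 for u, v in combinations(contrasts, 2))
    return sums_ok and pairs_ok


def contrast_estimate(coefs, means):
    """Point estimate of a contrast given one mean per age."""
    return dot(coefs, means)


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level under a Bonferroni adjustment."""
    return alpha / n_tests


print(is_orthogonal_set(list(POLY.values())))
# Placeholder means that grow exactly linearly with age (not real data).
placeholder_means = [1.0, 2.0, 3.0, 4.0]
for name, coefs in POLY.items():
    print(name, contrast_estimate(coefs, placeholder_means))
print(bonferroni_alpha(0.05, len(POLY)))
```

With exactly linear placeholder means, only the linear contrast is nonzero, which is the pattern that would motivate a linear growth term in part (c). Bonferroni is one simple FWER adjustment for the three polynomial tests; other adjustments are possible.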