Assignment:
Evaluate how the Infection Risk (Y) is associated with the following 4 variables.
1. Length of stay
2. Routine culturing ratio
3. Region (1=NE, 2=NC, 3=S, 4=W)
4. Average daily census
1. Use appropriate graphs to summarize the data.
2. Fit a first-order model using the above independent variables. Note that "Region" is a categorical variable. Use an appropriate coding method that can help you in the next part.
Interpret the parameters in the context of the problem. Please solve in R and show the code.
3. Examine whether the Infection Risk of hospitals in the NE region (Region 1 = NE) differs from that of hospitals located in the other three regions. This can be done by finding the appropriate confidence intervals for the pairwise comparisons (NE vs. NC), (NE vs. S), and (NE vs. W), after adjusting for the other variables. Use the Bonferroni procedure with a 90% family confidence coefficient. Summarize your findings.
4. Regardless of your findings in part (3), if all of the confidence intervals in part (3) include 0, can we conclude that the predictor "Region" is not statistically significant? Briefly explain why.
5. It is suggested that the effect of the average number of patients in the hospital (i.e., average daily census) may vary with different values of length of stay and routine culturing ratio.
Add appropriate terms to the model in (2) so that you can test whether this is statistically supported.
6. Between the models in (2) and (5), which one would you recommend? Use the partial F-test, AIC, BIC and PRESS to justify your answer.
7. Plot the residuals from part (5) so that you can evaluate if the variance of the residuals changes across different Regions. State your findings and comment.
8. Regardless of your findings in part (7), what would you do in the following hypothetical scenarios?
(Only describe your approach. No computation is needed.)
i. The scatter plot of the residuals against region shows signs of non-constant error variance. In addition, other regression diagnostics indicate that the effects of the numerical independent variables may differ across regions.
ii. The scatter plot of the residuals against region shows signs of non-constant error variance, but there are no signs of interaction between region and the other variables.
9. Conduct other regression diagnostics to evaluate the model and the data. Revise the model using appropriate remedial measures if necessary. If some approaches cannot be carried out due to limitations of the software, briefly explain the purpose and steps of the approach without computation.
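
For reference, here is a rough sketch in R of how parts (2), (3), (5) and (6) could be set up. It is only an outline under stated assumptions, not a worked solution: the data frame `senic` and the column names `risk`, `stay`, `culture`, `census` and `region` are hypothetical, and the data are simulated placeholders so the code runs on its own. Replace the simulated block with the real data file before reading anything into the numbers.

```r
# Simulated stand-in data (NOT the real hospital data); all names are hypothetical.
set.seed(1)
n <- 113
senic <- data.frame(
  stay    = rnorm(n, mean = 9.6, sd = 1.9),   # length of stay
  culture = rnorm(n, mean = 15.8, sd = 10),   # routine culturing ratio
  census  = runif(n, min = 20, max = 800),    # average daily census
  region  = sample(1:4, n, replace = TRUE)    # 1=NE, 2=NC, 3=S, 4=W
)
senic$risk <- 1 + 0.25 * senic$stay + 0.05 * senic$culture +
  0.001 * senic$census + rnorm(n)

# Part 2: indicator coding with NE as the reference level, so that each
# region coefficient is a (region - NE) difference adjusted for the other terms.
senic$region <- factor(senic$region, levels = 1:4,
                       labels = c("NE", "NC", "S", "W"))
senic$region <- relevel(senic$region, ref = "NE")
fit2 <- lm(risk ~ stay + culture + census + region, data = senic)
summary(fit2)

# Part 3: Bonferroni intervals for g = 3 comparisons at a 90% family level,
# i.e. each individual interval uses confidence level 1 - 0.10 / 3.
g  <- 3
ci <- confint(fit2, parm = c("regionNC", "regionS", "regionW"),
              level = 1 - 0.10 / g)
print(ci)

# Part 5: let the census effect depend on length of stay and culturing ratio.
fit5 <- update(fit2, . ~ . + census:stay + census:culture)

# Part 6: partial F-test for the two added terms, then AIC, BIC and PRESS.
print(anova(fit2, fit5))
print(AIC(fit2, fit5))
print(BIC(fit2, fit5))
press <- function(m) sum((residuals(m) / (1 - hatvalues(m)))^2)
print(c(model2 = press(fit2), model5 = press(fit5)))

# Part 7: residuals of the model in (5), grouped by region.
boxplot(residuals(fit5) ~ senic$region,
        xlab = "Region", ylab = "Residual")
```

With NE as the reference level, the three region coefficients are exactly the comparisons asked for in part (3), which is why that coding is convenient. For part (4), the joint test of Region is a separate question from the three intervals; one way to get it would be `anova(update(fit2, . ~ . - region), fit2)`.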