Q1. Consider the following simple linear regression models for data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$:
M1: $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, $i = 1, 2, \ldots, n$
M2: $y_i = \alpha_0 + \alpha_1 (x_i - \bar{x}) + \varepsilon_i$, $i = 1, 2, \ldots, n$
where $\bar{x} = n^{-1} \sum_{i=1}^{n} x_i$ and the errors satisfy the Gauss-Markov assumptions.
(a) By equating the right-hand sides of the two model equations, show that M1 and M2 are equivalent in the sense that $(\beta_0, \beta_1)$ can be expressed as functions of $(\alpha_0, \alpha_1)$ and vice versa.
(b) We can write model M2 in matrix form as $y = X\alpha + \varepsilon$, where $\alpha = (\alpha_0, \alpha_1)^T$.
Write down the explicit form of $X$ for this model and use it to obtain a formula for the least squares estimate $\hat{\alpha}$. Simplify the formulas for $\hat{\alpha}_0$ and $\hat{\alpha}_1$ as much as possible.
(c) Show that $\mathrm{var}(\hat{\alpha})$ is a $2 \times 2$ diagonal matrix (a square matrix with zeros off the diagonal).
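(d) Consider the estimated mean response for a new value at $x = x^*$, given by $\hat{\mu}^* = \hat{\alpha}_0 + \hat{\alpha}_1 (x^* - \bar{x})$. Show that
$\mathrm{var}(\hat{\mu}^*) = \sigma^2 \left\{ \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right\}$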
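The results stated in (c) and (d) can be sanity-checked numerically before attempting the algebra. The following is a minimal sketch in Python, not a proof and not part of the assignment: the predictor values and the new point x_star are made up for illustration, and the check only confirms the stated formulas for this one synthetic data set.

```python
# Synthetic predictor values (hypothetical, for illustration only).
x = [1.0, 2.0, 4.0, 7.0, 11.0]
n = len(x)
x_bar = sum(x) / n

# Columns of the M2 design matrix: an intercept column and the centered predictor.
ones = [1.0] * n
centered = [xi - x_bar for xi in x]


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


# The four entries of X^T X for model M2.
a, b = dot(ones, ones), dot(ones, centered)
c, d = dot(centered, ones), dot(centered, centered)

# General 2 x 2 inverse, so nothing about the structure of X^T X is assumed.
det = a * d - b * c
inv = [[d / det, -b / det], [-c / det, a / det]]

# var(mu_hat*) / sigma^2 is the quadratic form v^T (X^T X)^{-1} v
# with v = (1, x_star - x_bar). x_star is a hypothetical new point.
x_star = 9.0
v0, v1 = 1.0, x_star - x_bar
quad_form = v0 * v0 * inv[0][0] + 2 * v0 * v1 * inv[0][1] + v1 * v1 * inv[1][1]

# Closed form stated in part (d), also divided by sigma^2.
sxx = sum((xi - x_bar) ** 2 for xi in x)
closed_form = 1.0 / n + (x_star - x_bar) ** 2 / sxx

print("off-diagonal entry of X^T X:", b)
print("quadratic form:", quad_form, "closed form:", closed_form)
```

Since var(alpha_hat) is sigma^2 times the inverse of X^T X, a zero off-diagonal entry in X^T X is what the diagonal claim in (c) amounts to for this example.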
Q2. For the prostate data set, fit a model with lpsa as the response, and the other variables as predictors.
(a) Suppose a new patient with the following values arrives:
lcavol = 1.45000, lweight = 3.59801, age = 63.00000, lbph = 0.30010, svi = 0.00000, lcp = -0.79851, gleason = 7.00000, pgg45 = 15.00000.
Predict the lpsa for this patient along with an appropriate 95% prediction interval.
(b) Now provide a 95% confidence interval for the mean of lpsa for a patient with the same covariate values as in part (a), and explain why the prediction interval is wider than the confidence interval.
(c) Repeat the questions in part (a) for a patient with the same covariate values except that his age is 20. Explain why this prediction interval is wider than that in part (a).
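As an illustration of how the two kinds of interval differ, here is a minimal Python sketch for the one-predictor case. It does not use the prostate data: the x and y values are synthetic, and the simple regression setting is an assumption made to keep the arithmetic visible. For a multiple regression such as the one in this question, the quantity h_star below would be replaced by x*^T (X^T X)^{-1} x*.

```python
import math

# Synthetic data (hypothetical; NOT the prostate data set).
x = [50.0, 55.0, 60.0, 62.0, 65.0, 68.0, 70.0, 75.0]
y = [1.2, 1.9, 2.1, 2.6, 2.4, 3.1, 3.0, 3.8]
n = len(x)
p = 2  # number of regression coefficients (intercept and slope)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least squares estimates and residual standard error.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(rss / (n - p))


def standard_errors(x_star):
    """Return (se of the estimated mean response, se of a new observation)."""
    h_star = 1.0 / n + (x_star - x_bar) ** 2 / sxx
    se_mean = s * math.sqrt(h_star)        # drives the confidence interval
    se_pred = s * math.sqrt(1.0 + h_star)  # drives the prediction interval
    return se_mean, se_pred


# One point near the centre of the data and one far outside its range.
se_mean_near, se_pred_near = standard_errors(63.0)
se_mean_far, se_pred_far = standard_errors(20.0)

print("near the centre:", se_mean_near, se_pred_near)
print("far from the data:", se_mean_far, se_pred_far)
```

Each interval is the fitted value plus or minus a t quantile (with n - p degrees of freedom) times the corresponding standard error. The quantile is omitted here because the Python standard library does not provide one.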
Q3. Using the happy data set, fit a model with happy as the response and all of the other four variables as predictors. Perform regression diagnostics on this model to answer the following questions. Display any plots that are relevant. Do not provide any plots about which you have nothing to say. Suggest possible improvements or corrections to the model where appropriate.
(a) Check the constant variance assumption for the errors.
(b) Check the normality assumption.
(c) Check for large leverage points.
(d) Check for outliers.
(e) Check for influential points.
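The quantities behind parts (c) and (e) can be sketched for a one-predictor model as follows. This is a hedged Python illustration on synthetic data, not the happy data set. The 2p/n cutoff for leverage is a common rule of thumb and is an assumption here, not something the assignment prescribes.

```python
# Synthetic data (hypothetical; NOT the happy data set).
# The last x value is deliberately far from the others.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 20.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 6.8, 12.0]
n = len(x)
p = 2  # intercept and slope

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Ordinary residuals and the residual variance estimate.
resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
s2 = sum(e * e for e in resid) / (n - p)

# Leverages: diagonal of the hat matrix, in closed form for one predictor.
h = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]

# Cook's distance: D_i = (r_i^2 / p) * h_i / (1 - h_i),
# where r_i^2 = e_i^2 / (s^2 (1 - h_i)) is the squared internally
# Studentized residual.
cook = [
    (e * e / (s2 * (1.0 - hi))) / p * hi / (1.0 - hi)
    for e, hi in zip(resid, h)
]

# Rule-of-thumb flag (an assumption): leverage above 2p/n.
high_leverage = [i for i, hi in enumerate(h) if hi > 2.0 * p / n]

print("leverages:", [round(hi, 3) for hi in h])
print("Cook's distances:", [round(di, 3) for di in cook])
print("indices flagged as high leverage:", high_leverage)
```

The leverages always sum to p, which gives a quick check that they were computed correctly.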
Q4. Using the swiss data set, fit a model with Fertility as the response and all of the other variables as predictors. Answer the following.
(a) Produce a plot of the internally Studentized residuals $r_i$ versus the ordinary (least squares) residuals $\hat{\varepsilon}_i$. (Show R code.)
(b) The points in this plot do not exactly fall on a straight line. Briefly explain why. [Hint: What is the formula for the internally Studentized residuals?]
(c) List the externally Studentized residuals $t_i$ (which are used as test statistics in the Mean Shift Test).
(d) Perform the Mean Shift Test without Bonferroni adjustment, using α = 0.05. Which provinces are identified as outliers?
(e) Perform the Mean Shift Test with Bonferroni adjustment, using α = 0.05. Which provinces are identified as outliers?
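For reference, the two kinds of Studentized residual can be computed as in the Python sketch below. It uses synthetic one-predictor data, not the swiss data set, and the assignment itself asks for R code. The sketch computes the externally Studentized residuals in two ways, from the internally Studentized ones and from the leave-one-out variance estimate, and the two should agree.

```python
import math

# Synthetic data (hypothetical; NOT the swiss data set).
# The fifth response is deliberately out of line with the rest.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.0, 2.1, 2.9, 4.2, 9.0, 6.1, 6.9, 8.0]
n = len(x)
p = 2  # intercept and slope

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
s2 = sum(e * e for e in resid) / (n - p)
h = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]

# Internally Studentized residuals: r_i = e_i / (s * sqrt(1 - h_i)).
r = [e / math.sqrt(s2 * (1.0 - hi)) for e, hi in zip(resid, h)]

# Externally Studentized residuals from r_i:
# t_i = r_i * sqrt((n - p - 1) / (n - p - r_i^2)).
t = [ri * math.sqrt((n - p - 1) / (n - p - ri * ri)) for ri in r]

# The same quantity from the leave-one-out variance estimate:
# s_(i)^2 = ((n - p) s^2 - e_i^2 / (1 - h_i)) / (n - p - 1).
t_direct = []
for e, hi in zip(resid, h):
    s2_without_i = ((n - p) * s2 - e * e / (1.0 - hi)) / (n - p - 1)
    t_direct.append(e / math.sqrt(s2_without_i * (1.0 - hi)))

largest = max(range(n), key=lambda i: abs(t[i]))
print("internally Studentized:", [round(v, 3) for v in r])
print("externally Studentized:", [round(v, 3) for v in t])
print("index with largest |t_i|:", largest)
```

For the test, each |t_i| is compared with a t quantile on n - p - 1 degrees of freedom, at level 1 - α/2 without adjustment and 1 - α/(2n) with the Bonferroni adjustment. The quantile is not computed here because the Python standard library has no t distribution. In R it is available as qt.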