1. The data file SleepData.dat consists of 706 observations from the study done by Biddle and Hamermesh (1990) on the tradeoff between time spent sleeping and working (as well as other factors affecting sleeping). Your task will be to fit a multiple regression model of the form ŷ = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + ···, where ŷ = number of minutes of sleep at night per week (sleep), x₁ = total number of minutes worked per week (totwork), x₂ = years of schooling (educ), and x₃ = age in years (age).

The complete assignment must be typed and must include all of the plots as well as the R source code.

(a) Plot a histogram of each variable, and make sure all four plots appear in the same window.
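As a starting point for the single-window layout, here is a minimal sketch. It does not read SleepData.dat, since the file layout is not given here. Instead it builds a simulated stand-in data frame `d` with the four column names from the assignment; the simulated values and coefficients are invented for illustration only.

```r
# Simulated stand-in for SleepData.dat (values are made up, NOT the real data).
# With the real file, a call such as read.table() would replace these lines.
set.seed(1)
n <- 706
d <- data.frame(totwork = runif(n, 0, 5000),
                educ    = sample(8:17, n, replace = TRUE),
                age     = sample(23:65, n, replace = TRUE))
d$sleep <- 3600 - 0.15 * d$totwork - 10 * d$educ + 2 * d$age + rnorm(n, sd = 400)

# A 2 x 2 grid puts all four histograms in one plotting window.
old_par <- par(mfrow = c(2, 2))
for (v in c("sleep", "totwork", "educ", "age")) {
  hist(d[[v]], main = paste("Histogram of", v), xlab = v)
}
par(old_par)  # restore the previous layout
```

The call to `par(mfrow = c(2, 2))` fills the grid row by row; saving and restoring the old settings keeps later plots from inheriting the grid.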

(b) Estimate your full regression model ŷ = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + ··· and show your results (e.g., the output from the 'summary' command in R). Discuss these results.
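A fit of this form might be set up as in the sketch below. The data frame `d` is a simulated stand-in with the assignment's column names (its values are invented), so the printed estimates say nothing about the real study.

```r
# Simulated stand-in for SleepData.dat (values are made up, NOT the real data).
set.seed(1)
n <- 706
d <- data.frame(totwork = runif(n, 0, 5000),
                educ    = sample(8:17, n, replace = TRUE),
                age     = sample(23:65, n, replace = TRUE))
d$sleep <- 3600 - 0.15 * d$totwork - 10 * d$educ + 2 * d$age + rnorm(n, sd = 400)

# Full model with the three predictors named in the assignment.
fit <- lm(sleep ~ totwork + educ + age, data = d)

# Coefficient table, residual standard error, R-squared, and overall F test.
print(summary(fit))
```

In the summary output, each coefficient row reports the estimate, its standard error, the t statistic, and the p-value for the test that the coefficient is zero.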

(c) Plot the residuals vs. the fitted values.

(d) Plot the Cook's distance values. Are there any outliers or influential observations? If so, discuss what you would do with them.
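One way to draw the Cook's distance plot is sketched below, again on a simulated stand-in data frame `d` (invented values). The cutoff of 4/n is only a common rule of thumb and is an assumption here, not something the assignment specifies.

```r
# Simulated stand-in for SleepData.dat (values are made up, NOT the real data).
set.seed(1)
n <- 706
d <- data.frame(totwork = runif(n, 0, 5000),
                educ    = sample(8:17, n, replace = TRUE),
                age     = sample(23:65, n, replace = TRUE))
d$sleep <- 3600 - 0.15 * d$totwork - 10 * d$educ + 2 * d$age + rnorm(n, sd = 400)
fit <- lm(sleep ~ totwork + educ + age, data = d)

# Cook's distance for every observation, drawn as vertical spikes.
cd <- cooks.distance(fit)
plot(cd, type = "h", xlab = "Observation index", ylab = "Cook's distance")

# Rule-of-thumb reference line (assumed cutoff, not a formal test).
cutoff <- 4 / n
abline(h = cutoff, lty = 2)

# Row indices of observations above the reference line.
flagged <- which(cd > cutoff)
print(head(flagged))
```

Observations above the line are candidates for a closer look rather than automatic deletion; refitting with and without them shows how much they move the estimates.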

(e) Plot a histogram of the residuals and discuss the results.

(f) Plot the normal Q-Q plot of the residuals and discuss the results.
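Parts (c), (e), and (f) all look at the residuals of the same fit, so the three plots can be sketched together. The data frame `d` below is a simulated stand-in (invented values), and placing the plots side by side is a layout choice made for this sketch, not a requirement of the assignment.

```r
# Simulated stand-in for SleepData.dat (values are made up, NOT the real data).
set.seed(1)
n <- 706
d <- data.frame(totwork = runif(n, 0, 5000),
                educ    = sample(8:17, n, replace = TRUE),
                age     = sample(23:65, n, replace = TRUE))
d$sleep <- 3600 - 0.15 * d$totwork - 10 * d$educ + 2 * d$age + rnorm(n, sd = 400)
fit <- lm(sleep ~ totwork + educ + age, data = d)
res <- resid(fit)

old_par <- par(mfrow = c(1, 3))

# (c) Residuals vs. fitted values; a patternless band around 0 is the ideal.
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# (e) Histogram of the residuals.
hist(res, main = "Histogram of residuals", xlab = "Residual")

# (f) Normal Q-Q plot with a reference line through the quartiles.
qqnorm(res)
qqline(res)

par(old_par)
```

Curvature or a funnel shape in the first plot points to a misspecified mean or non-constant variance, while departures from the reference line in the Q-Q plot point to non-normal tails.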

(g) Plot the observed vs. predicted values, overlay a LOWESS smoother, and discuss the results.
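The overlay might look like the sketch below, which uses base R's `lowess()` with its default span. The data frame `d` is a simulated stand-in (invented values), and the dashed 45-degree line is an optional reference added for this sketch.

```r
# Simulated stand-in for SleepData.dat (values are made up, NOT the real data).
set.seed(1)
n <- 706
d <- data.frame(totwork = runif(n, 0, 5000),
                educ    = sample(8:17, n, replace = TRUE),
                age     = sample(23:65, n, replace = TRUE))
d$sleep <- 3600 - 0.15 * d$totwork - 10 * d$educ + 2 * d$age + rnorm(n, sd = 400)
fit <- lm(sleep ~ totwork + educ + age, data = d)

pred <- fitted(fit)

# Observed response against the model's predictions.
plot(pred, d$sleep, xlab = "Predicted sleep", ylab = "Observed sleep")

# 45-degree line: where the points would fall if predictions were perfect.
abline(a = 0, b = 1, lty = 2)

# LOWESS smoother of observed on predicted (default span f = 2/3).
sm <- lowess(pred, d$sleep)
lines(sm, lwd = 2)
```

If the smoother tracks the 45-degree line closely, the linear specification is capturing the average relationship; systematic bending away from it suggests a missing nonlinear term.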

(h) Compute Mallows' Cp statistic for all the plausible models and choose one model. Discuss why you chose this model.
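For a candidate model with p estimated coefficients (intercept included), Mallows' Cp can be computed as SSE_p / s² − n + 2p, where s² is the residual mean square of the full model. The sketch below applies this by hand to every non-empty subset of the three named predictors. The data frame `d` is a simulated stand-in (invented values), and treating these seven subsets as "all the plausible models" is an assumption made for illustration.

```r
# Simulated stand-in for SleepData.dat (values are made up, NOT the real data).
set.seed(1)
n <- 706
d <- data.frame(totwork = runif(n, 0, 5000),
                educ    = sample(8:17, n, replace = TRUE),
                age     = sample(23:65, n, replace = TRUE))
d$sleep <- 3600 - 0.15 * d$totwork - 10 * d$educ + 2 * d$age + rnorm(n, sd = 400)

preds <- c("totwork", "educ", "age")

# Residual mean square of the full model estimates the error variance.
full <- lm(sleep ~ totwork + educ + age, data = d)
s2 <- summary(full)$sigma^2

# All non-empty subsets of the predictors (7 candidate models).
subsets <- unlist(lapply(seq_along(preds),
                         function(k) combn(preds, k, simplify = FALSE)),
                  recursive = FALSE)

# Cp = SSE_p / s2 - n + 2p, with p counting the intercept.
cp <- sapply(subsets, function(vars) {
  m <- lm(reformulate(vars, response = "sleep"), data = d)
  p <- length(coef(m))
  sum(resid(m)^2) / s2 - n + 2 * p
})
names(cp) <- sapply(subsets, paste, collapse = " + ")

print(sort(cp))
```

By construction the full model always has Cp equal to its own p (here 4), so the comparison of interest is whether a smaller model achieves a Cp close to, or below, its own number of coefficients.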

