Problem
For this problem we follow the study by LaLonde (1986) on earnings related to various factors such as training, race, gender, etc. The data are available directly from the DAAG package. Your task is to fit a multiple regression model of the form ŷ = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + ···, where ŷ is the fitted value of y = real earnings in 1978 (re78). You will need to decide which predictors to keep. The completed assignment must be typed and must include all plots and the R source code.
(a) Plot a histogram of each variable and discuss its properties.
(b) Estimate your full regression model ŷ = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + ··· and show your results (e.g., the output from the 'summary' command in R). Discuss these results.
(c) Compute Mallows' Cp statistic for all plausible models and choose exactly one model. Explain why you chose it. For the remaining questions, use only the model selected in this part.
(d) Plot the residuals vs. the fitted values.
(e) Plot the variance inflation factors (VIF) and discuss the plot.
(f) Plot and discuss the correlation graph (use the corrplot library).
(g) Plot the Cook's Distance values. Are there any outliers? If so, discuss what you would do with them.
(h) Plot a histogram of the residuals and discuss the results.
(i) Plot the QQ Normal Plot and discuss the results.
(j) Plot the observed vs. predicted values, overlay a Lowess smoother, and discuss the results.
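
As a starting point for parts (c) through (j), the following is a minimal sketch in base R, not a model solution. It rests on several assumptions that are not stated in the problem: that the intended DAAG data set is `nsw74psid1`, that every column other than `re78` is a candidate predictor, and that all candidates are numeric. The helper names `all_subsets_cp` and `vif_values` are hypothetical. The Cp formula used is Cp = SSE_p / s² − n + 2p, where s² is the residual variance of the full model and p counts the coefficients including the intercept. The sketch picks the lowest-Cp model only to have something to plot; the choice in part (c) still needs its own justification.

```r
# Mallows' Cp for every non-empty subset of the candidate predictors.
# Cp = SSE_p / s2_full - n + 2p, with p = number of coefficients (incl. intercept).
all_subsets_cp <- function(data, response, predictors) {
  n <- nrow(data)
  full <- lm(reformulate(predictors, response), data = data)
  s2_full <- sum(residuals(full)^2) / df.residual(full)

  # Enumerate all subsets of size 1, 2, ..., length(predictors).
  subsets <- unlist(
    lapply(seq_along(predictors),
           function(k) combn(predictors, k, simplify = FALSE)),
    recursive = FALSE
  )

  out <- data.frame(
    model = vapply(subsets, paste, character(1), collapse = " + "),
    p = vapply(subsets, length, integer(1)) + 1L,
    stringsAsFactors = FALSE
  )
  out$cp <- vapply(subsets, function(vars) {
    fit <- lm(reformulate(vars, response), data = data)
    sum(residuals(fit)^2) / s2_full - n + 2 * (length(vars) + 1)
  }, numeric(1))
  out[order(out$cp), ]
}

# Variance inflation factors: the diagonal of the inverse correlation matrix
# of the predictors (valid for two or more numeric predictors).
vif_values <- function(data, predictors) {
  diag(solve(cor(data[, predictors, drop = FALSE])))
}

# Demo on the assumed data set; skipped if DAAG is not installed.
if (requireNamespace("DAAG", quietly = TRUE)) {
  data("nsw74psid1", package = "DAAG")
  predictors <- setdiff(names(nsw74psid1), "re78")

  cp_table <- all_subsets_cp(nsw74psid1, "re78", predictors)
  print(head(cp_table, 5))

  # Illustration only: take the lowest-Cp model for the diagnostic plots.
  best_vars <- strsplit(cp_table$model[1], " + ", fixed = TRUE)[[1]]
  fit <- lm(reformulate(best_vars, "re78"), data = nsw74psid1)
  if (length(best_vars) > 1) print(vif_values(nsw74psid1, best_vars))

  plot(fitted(fit), residuals(fit),
       xlab = "Fitted values", ylab = "Residuals")            # part (d)
  abline(h = 0, lty = 2)
  plot(cooks.distance(fit), type = "h",
       ylab = "Cook's distance")                              # part (g)
  hist(residuals(fit), main = "Residuals")                    # part (h)
  qqnorm(residuals(fit)); qqline(residuals(fit))              # part (i)
  plot(fitted(fit), nsw74psid1$re78,
       xlab = "Predicted re78", ylab = "Observed re78")       # part (j)
  lines(lowess(fitted(fit), nsw74psid1$re78), col = "red")
}
```

A useful sanity check on the table is that the full model's Cp equals its own p exactly, so that row cannot be used to argue for or against the full model. Packages such as leaps and car offer their own routines for subset selection and VIF, which may be preferable to these hand-rolled helpers.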