Assignment:
Instructions: Please answer all of the following questions as best as possible. If you have any questions, please see me immediately. Partial credit will be awarded when it is deserved. The point value for each question is in parentheses. All subquestions within a question are of equal value. This assignment is due February 18th.
1. (5) Would excluding variables that belong in the model still produce unbiased OLS estimators?
2. (10) If including an irrelevant variable still implies that the OLS estimators are unbiased, then why don't we just include as many variables as we can?
3. (25) Suppose I am interested in modeling how wages are determined. My model of interest is
log(wage) = β0 + β1 educ + u, (1)
where wage is the hourly wage earned and educ is total years of education a worker has.
(a) Why do we expect β1 > 0?
(b) Should years of experience working be included in a model of wage determination (why or why not)?
(c) If years of experience working is excluded from my model, what is the likely effect on the bias of the OLS estimator of β1?
(d) Interpret β1 in the above model.
(e) If I inform you that years of education and years of work experience are correlated, does that mean that years of work experience should not be included in the model to eliminate collinearity (why or why not)?
(f) My estimated regression is
\widehat{log(wage)} = 0.284 + 0.073 · educ. (2)
Interpret the coefficient on educ from my estimated regression.
(g) R^2 = 0.304 in this model. What does this mean?
(h) Would R^2 decrease if I added years of experience as a regressor in my model?
(i) Would R^2 decrease if I added the individual's height as a regressor in my model?
(j) How would your interpretation of β1 change if you used log(educ) as a regressor instead of educ?
4. (60) Use the Boston dataset in the MASS package in R to answer the following questions. You may type ?Boston in R to get complete definitions of the variables. The following model describes the median housing price (medv) across communities in the metro Boston area in terms of the amount of pollution (nox, the nitrogen oxides concentration) and the average number of rooms in a house in the community (rm):
log(medv) = β0 + β1 log(nox) + β2 rm + ε. (3)
(a) What are the expected signs of β1 and β2 in this model?
(b) What is the interpretation of β1?
(c) What is the interpretation of β2?
(d) Estimate this model and report your coefficient estimates for the three parameters, the corresponding standard errors, and R^2.
(e) Interpret R^2.
(f) Are you concerned that collinearity is present in this setting?
(g) Why would nox and rooms be negatively correlated?
(h) If rooms and nox were negatively correlated, would estimating your model of housing prices omitting rooms yield an upward or downward bias in β̂1? Why?
(i) Estimate your model excluding rooms and report your coefficient estimates for the two parameters, the corresponding standard errors, and R^2.
(j) Is your estimate of β1 in the univariate model closer to the truth than in the multiple regressor model you estimated earlier?
(k) What do you make of the vast decrease in R^2 when you estimate the univariate model?
(l) Is it possible to include rm^2 (the square of rm) in model (3)? If yes, why might it be useful?
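Getting started with Question 4: the lines below are a minimal sketch of how the data could be loaded and how a model like (3) might be passed to R's lm() function, assuming the MASS package is installed. The object name fit is illustrative only and is not prescribed by the assignment; the interpretation of the output is left to you.

```r
# Minimal sketch, assuming the MASS package is installed.
library(MASS)   # provides the Boston data frame

# ?Boston       # uncomment to open the help page with variable definitions

# Inspect the three variables that appear in model (3).
str(Boston[, c("medv", "nox", "rm")])

# One possible way to specify model (3); 'fit' is an illustrative name.
fit <- lm(log(medv) ~ log(nox) + rm, data = Boston)

# summary() reports coefficient estimates, standard errors, and R-squared.
summary(fit)
```

Dropping a regressor from the formula (for example, removing "+ rm") gives the kind of univariate specification asked for later in the question.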