1. Assume that there is a population regression model:
y = β0 + β1x1 + β2x2 + β3x3 + u
and that the model satisfies assumptions MLR1 through MLR5 in the population. Indicate, without explanation, whether each of the following statements is true or false.
a. If you take a random sample from the population, and estimate an OLS regression with y as the dependent variable and x1, x2 and x3 as the independent variables, the estimated coefficients of x1, x2 and x3 will be unbiased estimates of β1, β2, and β3.
b. If you take a random sample from the population, and estimate an OLS regression with y as the dependent variable and x1, x2 and x3 as the independent variables, the estimated coefficients of x1, x2 and x3 will be equal to β1, β2, and β3.
c. If you take a random sample from the population, and estimate an OLS regression with y as the dependent variable and x1, x2 and x3 as the independent variables, the estimated coefficients of x1, x2 and x3 will be statistically significant.
d. If you take two random samples from the same population, and use each of the samples to estimate the population model using OLS, you will get the same β estimates from each regression.
e. If β1 is a positive number, then β̂1 (the OLS estimate of β1 that you would see in the Stata output after estimating a regression) may be either a positive or a negative number.
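One way to build intuition for statements 1a through 1e is a small Monte Carlo simulation: draw many random samples from a known population model that satisfies MLR1 through MLR5, run the OLS regression in each sample, and compare the estimates with the true β values. The sketch below uses plain Python rather than Stata. The parameter values in TRUE_BETA, the normal distributions for x1, x2, x3 and u, the sample size, and the helper names solve, ols, draw_sample and simulate are all hypothetical choices made for illustration.

```python
import random
import statistics

# Hypothetical population parameters (beta0, beta1, beta2, beta3), for illustration only.
TRUE_BETA = [1.0, 0.5, -2.0, 3.0]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y; rows include a leading 1."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def draw_sample(n, rng):
    """One random sample; u is independent of the x's with constant variance (MLR4, MLR5)."""
    rows, y = [], []
    for _ in range(n):
        x1, x2, x3 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        u = rng.gauss(0, 2)
        rows.append([1.0, x1, x2, x3])
        y.append(TRUE_BETA[0] + TRUE_BETA[1] * x1 + TRUE_BETA[2] * x2 + TRUE_BETA[3] * x3 + u)
    return rows, y


def simulate(reps=500, n=100, seed=0):
    """OLS estimates from `reps` independent random samples of size n."""
    rng = random.Random(seed)
    return [ols(*draw_sample(n, rng)) for _ in range(reps)]


if __name__ == "__main__":
    estimates = simulate()
    for j, true_value in enumerate(TRUE_BETA):
        col = [b[j] for b in estimates]
        print(f"beta{j}: true {true_value:+.2f}, mean {statistics.fmean(col):+.3f}, "
              f"min {min(col):+.3f}, max {max(col):+.3f}")
```

For each coefficient the script prints the true value next to the mean, minimum and maximum of the 500 OLS estimates; changing seed draws a different set of samples.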
2. Provide a short answer (one to three sentences) to each of the following questions. (It will sometimes help to think about the meaning and implications of the MLR assumptions for the properties of the OLS estimators, that is, whether the estimated β coefficients are unbiased, how efficient they are, and whether the OLS standard errors are biased or unbiased.)
a. Suppose you have a sample that tells you the life expectancy of a 60-year-old male in each of the 50 states. It also tells you the average amount of education completed and the average level of income for people over 60 in that state. You are interested in using regression analysis to estimate the effect of education on life expectancy. Is it a good idea to also include the variable measuring average income in this regression? Discuss the costs and/or benefits of doing so.
b. Consider the following population regression model that determines the score on a standardized test for elementary school children:
score = β0 + β1classiz + β2faminc + u
where classiz is the size of the student's class and faminc is annual income of the student's family. The expected value of u is the same across all levels of class size and family income, but the variance of u is different for different classrooms. Does this cause the OLS estimators for the β coefficients to be biased? Explain.
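A simulation sketch for question 2b, under stated assumptions: the error u has mean zero at every combination of classiz and faminc, but its standard deviation rises with class size. This is an assumed form standing in for "the variance of u is different for different classrooms." The parameter values B0, B1, B2, the distributions of classiz and faminc, and the helper functions are hypothetical. slope_on_first computes the OLS coefficient on classiz by partialling out faminc (the Frisch-Waugh result), which gives the same number as the full multiple regression.

```python
import random
import statistics

# Hypothetical population parameters, chosen only for illustration.
B0, B1, B2 = 70.0, -0.5, 0.0002


def slope_on_first(x1, x2, y):
    """OLS coefficient on x1 from regressing y on (1, x1, x2), computed by partialling out x2."""
    m1, m2, my = statistics.fmean(x1), statistics.fmean(x2), statistics.fmean(y)
    d1 = [a - m1 for a in x1]
    d2 = [b - m2 for b in x2]
    dy = [c - my for c in y]
    g = sum(a * b for a, b in zip(d1, d2)) / sum(b * b for b in d2)  # slope of x1 on x2
    r1 = [a - g * b for a, b in zip(d1, d2)]  # part of x1 not explained by x2
    return sum(r * c for r, c in zip(r1, dy)) / sum(r * r for r in r1)


def draw_sample(n, rng):
    """E(u | classiz, faminc) = 0, but the error SD grows with class size (heteroskedasticity)."""
    classiz, faminc, score = [], [], []
    for _ in range(n):
        c = rng.randint(15, 35)
        f = rng.gauss(60000, 15000)
        u = rng.gauss(0, 0.5 * c)  # assumed form: Var(u) = (0.5 * classiz)^2
        classiz.append(c)
        faminc.append(f)
        score.append(B0 + B1 * c + B2 * f + u)
    return classiz, faminc, score


def simulate(reps=500, n=200, seed=1):
    """OLS estimates of the classiz coefficient from `reps` independent samples."""
    rng = random.Random(seed)
    return [slope_on_first(*draw_sample(n, rng)) for _ in range(reps)]


if __name__ == "__main__":
    est = simulate()
    print(f"true beta1 = {B1}, mean of OLS estimates = {statistics.fmean(est):.4f}")
```

Comparing the printed mean with B1 is one way to check an answer; the spread of the estimates (and the usual OLS standard errors) is a separate question from their mean.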
c. Consider the regression model
cigs = β0 + β1price + β2income + u
where cigs is daily cigarette consumption for an individual, price is the price per pack of cigarettes in the local area, and income is the individual's annual income. Suppose we have a random sample of 500 people. For the purposes of estimating β1, would it be better if there were a lot of correlation in the sample between price and income, or only a little correlation?
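For question 2c, a sketch that compares how much the OLS estimate of β1 varies across repeated samples of 500 people when price and income are weakly versus strongly correlated. It assumes a model of the form cigs = β0 + β1price + β2income + u with hypothetical parameter values, standardized normal price and income with correlation rho, and normal errors; none of these come from real data, and the helper names are illustrative.

```python
import math
import random
import statistics

# Hypothetical parameters in standardized units; beta1 is taken to be the price coefficient.
B0, B1, B2 = 10.0, -2.0, 1.0


def slope_on_first(x1, x2, y):
    """OLS coefficient on x1 from regressing y on (1, x1, x2), computed by partialling out x2."""
    m1, m2, my = statistics.fmean(x1), statistics.fmean(x2), statistics.fmean(y)
    d1 = [a - m1 for a in x1]
    d2 = [b - m2 for b in x2]
    dy = [c - my for c in y]
    g = sum(a * b for a, b in zip(d1, d2)) / sum(b * b for b in d2)  # slope of x1 on x2
    r1 = [a - g * b for a, b in zip(d1, d2)]  # part of x1 not explained by x2
    return sum(r * c for r, c in zip(r1, dy)) / sum(r * r for r in r1)


def draw_sample(n, rho, rng):
    """price and income are standard normal with correlation rho; u is independent noise."""
    price, income, cigs = [], [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        p = z1
        inc = rho * z1 + math.sqrt(1 - rho ** 2) * z2
        price.append(p)
        income.append(inc)
        cigs.append(B0 + B1 * p + B2 * inc + rng.gauss(0, 3))
    return price, income, cigs


def spread_of_beta1(rho, reps=300, n=500, seed=2):
    """Standard deviation of the OLS estimate of beta1 across repeated samples."""
    rng = random.Random(seed)
    est = [slope_on_first(*draw_sample(n, rho, rng)) for _ in range(reps)]
    return statistics.stdev(est)


if __name__ == "__main__":
    for rho in (0.1, 0.95):
        print(f"corr(price, income) = {rho:.2f}: sd of beta1-hat = {spread_of_beta1(rho):.4f}")
```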
d. Consider the population model estimated for the previous homework assignment:
lpfries = β0 + β1prpblck + β2lincome + β3prppov + u
where lpfries is the log of the price of a small order of fries, prpblck is the proportion of people living in the restaurant's zip code who are black, lincome is the log of median income in the zip code, and prppov is the proportion of people living in the restaurant's zip code who are below the poverty line. Obviously, lincome and prppov are highly correlated with each other. Does this violate one of the MLR assumptions?
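For question 2d, a sketch that contrasts high correlation with perfect collinearity, using the R-squared from regressing one regressor on the other and the variance inflation factor 1/(1 − R²). For simplicity it ignores prpblck and uses a single auxiliary regressor. The zip-code data are simulated (the coefficients, noise level, sample size and helper names are hypothetical), so the printed numbers say nothing about the actual fries data set. As a reference point, in Wooldridge's numbering MLR3 is the "no perfect collinearity" assumption.

```python
import math
import random
import statistics


def r_squared(x, z):
    """R-squared from a simple regression of x on z with an intercept (squared sample correlation)."""
    mx, mz = statistics.fmean(x), statistics.fmean(z)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    sxx = sum((a - mx) ** 2 for a in x)
    szz = sum((b - mz) ** 2 for b in z)
    return sxz ** 2 / (sxx * szz)


def vif(x, z):
    """Variance inflation factor 1 / (1 - R^2); treated as infinite when R^2 rounds to 1 or above."""
    r2 = r_squared(x, z)
    return math.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)


def simulated_zip_data(n=300, seed=3):
    """Hypothetical zip-code data in which prppov tends to fall as lincome rises (not real data)."""
    rng = random.Random(seed)
    lincome = [rng.gauss(10.8, 0.3) for _ in range(n)]
    # Clamp to [0, 1] because prppov is a proportion.
    prppov = [min(1.0, max(0.0, 1.5 - 0.13 * li + rng.gauss(0, 0.015))) for li in lincome]
    return lincome, prppov


if __name__ == "__main__":
    lincome, prppov = simulated_zip_data()
    print(f"highly correlated: R^2 = {r_squared(prppov, lincome):.3f}, VIF = {vif(prppov, lincome):.1f}")
    exact = [2.0 - 0.5 * li for li in lincome]  # an exact linear function of lincome
    print(f"perfectly collinear: R^2 = {r_squared(exact, lincome):.6f}, VIF = {vif(exact, lincome):.3g}")
```

With the exact linear function, 1 − R² is zero up to rounding, so the VIF is infinite or astronomically large; with the simulated prppov it is large but finite.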