Question 1: Do you suppose that your dataset has a problem with heteroskedasticity? How would you detect non-constant variance in the error terms of your model? What, if anything, might be done to adjust for the presence of heteroskedasticity when you interpret the results of your models?
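One common detection method is the Breusch-Pagan test in its LM form (n times the R-squared from regressing the squared OLS residuals on the explanatory variables). One common adjustment is to report heteroskedasticity-robust (White) standard errors. The sketch below illustrates both for a single regressor using only the Python standard library. The simulated data, the function names, and the choice of HC0 robust errors are assumptions made for illustration. They are not a prescribed method for this assignment.

```python
import math
import random


def simple_ols(x, y):
    """Fit y = b0 + b1*x by OLS and return (b0, b1, residuals)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    return b0, b1, resid


def breusch_pagan_lm(x, resid):
    """LM form of the Breusch-Pagan test: n * R^2 from regressing u^2 on x.

    With one regressor the statistic is compared with a chi-square(1) distribution.
    """
    u2 = [e ** 2 for e in resid]
    _, _, aux_resid = simple_ols(x, u2)
    mean_u2 = sum(u2) / len(u2)
    sst = sum((v - mean_u2) ** 2 for v in u2)
    r2 = 1.0 - sum(e ** 2 for e in aux_resid) / sst
    return len(x) * r2


def slope_standard_errors(x, resid):
    """Return (usual OLS, HC0 heteroskedasticity-robust) standard errors of the slope."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sigma2 = sum(e ** 2 for e in resid) / (n - 2)
    usual = math.sqrt(sigma2 / sxx)
    robust = math.sqrt(sum((xi - x_bar) ** 2 * e ** 2 for xi, e in zip(x, resid)) / sxx ** 2)
    return usual, robust


# Simulated data (an assumption for illustration): the error spread grows with x.
rng = random.Random(42)
x = [rng.uniform(1, 10) for _ in range(500)]
y = [2.0 + 0.5 * xi + rng.gauss(0, 0.5 * xi) for xi in x]

b0, b1, resid = simple_ols(x, y)
lm = breusch_pagan_lm(x, resid)
usual_se, robust_se = slope_standard_errors(x, resid)

print(f"slope = {b1:.3f}")
print(f"Breusch-Pagan LM = {lm:.1f} (5% critical value for chi-square(1) is 3.84)")
print(f"usual SE = {usual_se:.4f}, robust (HC0) SE = {robust_se:.4f}")
```

If the LM statistic exceeds the critical value, the usual standard errors and t statistics are unreliable. Robust standard errors are one common remedy, and weighted least squares is another. For real models, statsmodels offers `het_breuschpagan` in `statsmodels.stats.diagnostic` for the test and `fit(cov_type="HC1")` on an `OLS` model for robust errors.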
Question 2: Do you suppose that your independent variables are multicollinear with each other? Wooldridge explained that perfect multicollinearity (an exact linear relationship among the explanatory variables themselves) violates one of the assumptions of the Ordinary Least Squares regression techniques we have been employing. He added that even high but imperfect correlation among the explanatory variables inflates the variances of the estimated coefficients. He also observed that, sadly, most social science and business datasets show high degrees of interlocking correlation among variables, because they are derived from historical observation rather than from controlled experiments. Does this mean that we should abandon our attempts at measurement, modeling, and statistical testing of social science data? What, if anything, would you do to explain and qualify the results of the modeling in your report to account for possible multicollinearity? How will you make this understandable to the reader of your report?
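Variance inflation factors (VIFs) are one common way to quantify multicollinearity and to report it in a form readers can follow. The formula is VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing x_j on the other explanatory variables. A VIF of 10 therefore means the variance of that coefficient is ten times what it would be if x_j were uncorrelated with the other regressors. The sketch below uses only the Python standard library. The variable names and the simulated data are assumptions for illustration.

```python
import random


def solve(a, b):
    """Solve the square system a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def r_squared(y, regressors):
    """R^2 from an OLS regression of y on an intercept plus the given regressor columns."""
    rows = [[1.0] + [col[i] for col in regressors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    beta = solve(xtx, xty)  # normal equations (X'X) beta = X'y
    fitted = [sum(bp * rp for bp, rp in zip(beta, r)) for r in rows]
    y_bar = sum(y) / len(y)
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ssr / sst


def vifs(columns):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 regresses column j on all the others."""
    out = {}
    for name, col in columns.items():
        others = [c for other, c in columns.items() if other != name]
        out[name] = 1.0 / (1.0 - r_squared(col, others))
    return out


# Simulated regressors (an assumption): x2 is built mostly from x1, x3 is unrelated.
rng = random.Random(0)
n = 300
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [0.9 * a + rng.gauss(0, 0.3) for a in x1]
x3 = [rng.gauss(0, 1) for _ in range(n)]

result = vifs({"x1": x1, "x2": x2, "x3": x3})
for name, v in result.items():
    print(f"{name}: VIF = {v:.1f}")
```

Cutoffs such as 10 are rules of thumb rather than formal tests. For a reader, a high VIF mainly explains why a coefficient's standard error is large even when the overall model fits well. For real datasets, statsmodels provides `variance_inflation_factor` in `statsmodels.stats.outliers_influence`.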
Question 3: Which is the bigger problem: (1) leaving out important variables that should have been included, or (2) putting in variables that should have been excluded? In the ongoing theoretical debate over under-specification versus over-specification, what is the best approach to take? Explain. Which is better: a richer dataset, with more and different variables, or a bigger dataset, with more observations and degrees of freedom? Explain.
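Two standard textbook results frame this trade-off, and both appear in Wooldridge's introductory text. The illustration below takes a true model with two regressors, where \beta_j in the block corresponds to Bj in these questions. Here \tilde{\delta}_1 is the slope from regressing x2 on x1, SST_j is the total sample variation in x_j, and R_j^2 comes from regressing x_j on the other regressors. The variance formula holds under the Gauss-Markov assumptions, conditional on the sample values of the regressors.

```latex
\begin{aligned}
&\text{True model: } y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u \\
&\text{Short regression omitting } x_2\text{: } \tilde{y} = \tilde{\beta}_0 + \tilde{\beta}_1 x_1,
  \qquad \tilde{x}_2 = \tilde{\delta}_0 + \tilde{\delta}_1 x_1 \\
&\text{Omitted-variable bias: } \mathrm{E}(\tilde{\beta}_1) = \beta_1 + \beta_2 \tilde{\delta}_1 \\
&\text{Variance of an included coefficient: }
  \mathrm{Var}(\hat{\beta}_j) = \frac{\sigma^2}{\mathrm{SST}_j\,(1 - R_j^2)},
  \qquad \mathrm{SST}_j = \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2
\end{aligned}
```

Under-specification therefore biases the estimate of B1 unless B2 = 0 or x2 is uncorrelated with x1 in the sample. Over-specification leaves the estimates unbiased but costs precision, because an irrelevant regressor that is correlated with x_j raises R_j^2. Since SST_j grows with the number of observations, a bigger sample is one way to offset that loss of precision.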
Question 4: Consider the intercept term in your models. The intercept term (B0) is the expected value of the dependent variable (y) when all of the independent (x) variables equal zero. Suppose you encounter the argument that we don't need to know the value of the intercept term or its statistical significance, since the x variables never actually equal zero. How would you respond? What happens when you specify a model without an intercept term, such as the following?
y = B1x1 + . . . + Bnxn + u
{Hint: do this with one of your models. Run an OLS regression without the constant term, and see how it changes your results.}
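Before trying the hint on your own models, it can help to see the effect on data where the answer is known. The sketch below uses standard-library Python and fits the same data with and without a constant. The simulated data are an assumption for illustration: the true intercept is 5, and the x values lie far from zero. Forcing B0 = 0 when the true intercept is not zero biases the slope, and the residuals no longer average zero. The R-squared for a no-constant model is also computed around zero rather than around the mean of y, so it is not comparable with the usual R-squared.

```python
import random


def fit_with_intercept(x, y):
    """OLS of y on a constant and x; returns (b0, b1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sxx
    return y_bar - b1 * x_bar, b1


def fit_through_origin(x, y):
    """OLS with the intercept forced to zero: b1 = sum(x*y) / sum(x^2)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def ssr(x, y, b0, b1):
    """Sum of squared residuals for the line y = b0 + b1*x."""
    return sum((b - b0 - b1 * a) ** 2 for a, b in zip(x, y))


# Simulated data (an assumption): true B0 = 5, true B1 = 0.5, and x never near 0.
rng = random.Random(7)
x = [rng.uniform(10, 20) for _ in range(400)]
y = [5.0 + 0.5 * a + rng.gauss(0, 1) for a in x]

b0, b1 = fit_with_intercept(x, y)
b1_origin = fit_through_origin(x, y)

ssr_full = ssr(x, y, b0, b1)
ssr_origin = ssr(x, y, 0.0, b1_origin)

y_bar = sum(y) / len(y)
r2_full = 1 - ssr_full / sum((b - y_bar) ** 2 for b in y)  # usual (centered) R^2
r2_origin = 1 - ssr_origin / sum(b * b for b in y)  # uncentered R^2
mean_resid_origin = sum(b - b1_origin * a for a, b in zip(x, y)) / len(x)

print(f"with intercept: B0 = {b0:.2f}, B1 = {b1:.3f}, SSR = {ssr_full:.1f}, R^2 = {r2_full:.3f}")
print(f"through origin: B1 = {b1_origin:.3f}, SSR = {ssr_origin:.1f}, uncentered R^2 = {r2_origin:.3f}")
print(f"mean residual without intercept = {mean_resid_origin:.3f}")
```

In this simulation the no-intercept fit has a larger SSR yet a higher (uncentered) R-squared. This is why the two R-squared values should not be compared. statsmodels, for example, switches to the uncentered R-squared when the model has no constant.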
Question 5: If you had more time, a larger research budget, and a cadre of dedicated and brilliant economics graduate students, where would you take your project? What sorts of new variables and new data would you go after? What other types of specifications would you consider running? What new questions and frontiers of knowledge could such an expansion of your basic project and the models you built open up? Explain.