1. Suppose we want to estimate the effects of a community's air quality on the community's median housing price (price), using data from a large set of communities in New York. Our key variable of interest is the amount of nitrogen oxide in the air (nox). To control for other factors that may affect the housing price, we also include various community characteristics related to housing: rooms is the average number of rooms in houses in the community, and stratio is the average student-teacher ratio of schools in the community. The regression model is:
price = β0 + β1 nox + β2 rooms + β3 stratio + u    (1)
For each of the following cases, discuss whether there could be a threat to the internal validity of the regression analysis. If so, how will the estimate of β1, the coefficient on the key variable of interest (nox), be biased? If you make any assumptions in your discussion, state and justify them.
a. Suppose the data for the amount of nitrogen oxide in the community are collected once a month (i.e. for each month, the level of nitrogen oxide in the community is based on the data collected on the 15th of that month). We then calculate the average of the monthly data over the course of the year and use it as our measurement of nox for that community.
b. In the survey data, the median housing price (price) is rounded to the nearest $1000.
c. In the survey data, the student-teacher ratio (stratio) is rounded off to the nearest integer (i.e. if there are 26.7 students per teacher on average, this is reported as 27).
d. According to previous housing studies, the housing price decreases at a faster rate as air pollution becomes more severe.
e. Suppose we have addressed all the issues above - could there be any other threats? Describe at least one.
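The measurement issues in parts a to c can be explored with a small simulation. The sketch below is only an illustration under stated assumptions, not a model of the New York data: it uses a single regressor, a hypothetical true slope of -2, and classical (mean-zero, independent) measurement error. Rounding is treated here as noise that is roughly unrelated to the regressor, which is itself an assumption. All names (ols_slope, x_noisy, y_rounded) are made up for the example.

```python
import random

def ols_slope(x, y):
    """Slope from a one-regressor OLS fit that includes an intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

random.seed(0)
n = 5000
true_beta = -2.0  # hypothetical effect, chosen only for illustration

# "True" regressor and outcome (synthetic data, not the survey data).
x_true = [random.gauss(0, 1) for _ in range(n)]
y_true = [10 + true_beta * xi + random.gauss(0, 1) for xi in x_true]

# Case A: the regressor is observed with classical measurement error.
x_noisy = [xi + random.gauss(0, 1) for xi in x_true]

# Case B: the dependent variable is rounded to the nearest unit.
y_rounded = [round(yi) for yi in y_true]

slope_clean = ols_slope(x_true, y_true)
slope_noisy_x = ols_slope(x_noisy, y_true)
slope_rounded_y = ols_slope(x_true, y_rounded)

print("no measurement error:    ", round(slope_clean, 3))
print("error in the regressor:  ", round(slope_noisy_x, 3))
print("rounded dependent var.:  ", round(slope_rounded_y, 3))
```

In this setup the noise variance equals the variance of the true regressor, so the slope in Case A should shrink toward zero by roughly half, while the slope in Case B should stay close to the hypothetical true value. Whether the errors in the actual data behave this way is something each answer needs to argue.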
2. An "Empowerment Zone" (EZ) is a designated neighborhood identified as needing economic development based on some negative characteristics. Businesses and individuals residing in EZs are offered particular opportunities by the government (subsidies, etc.) designed to help those areas improve. A researcher was interested in whether designating an EZ in a city improves employment; in particular, she asked the question, "Does having an EZ in a city reduce the number of unemployment claims that are made there?" (For details, see Papke, Leslie (1994) Tax policy and urban development: Evidence from the Indiana enterprise zone program. Journal of Public Economics 54: 37-49.)
a. Use a limited version of her data, called "ez_panel_2yrs," which contains information for 22 cities in 1980 and 1988, to estimate the relationship between unemployment claims (Y) and the existence of an EZ in the city (X, a dummy variable). (Do not do anything to account for year differences - just treat all the data as one big sample...we will modify this later.) Show the table of results here.
b. What do the results suggest about the relationship between an EZ and local unemployment claims?
c. Why might it be difficult to make a causal claim using this analysis? Discuss the anticipated direction of the bias.
d. Now estimate a before-and-after model using the same data. This will require you to create a new dependent variable, "change in unemployment claims," and a new independent variable, "change in EZ status" (which will take values of -1 if EZ is lost over the time period, 0 if it stays the same (whether there is one or not), and 1 if an EZ is added). You may want to look at the textbook example about fatality rates and beer taxes for more information on how this works. Since there may be an overall trend in unemployment claims, include an intercept in your model. Show the table of results here.
e. What do the results of your before-and-after model suggest about the relationship between an EZ and local unemployment claims?
f. What does this new model handle better than the initial model - in other words, why might we be more confident in the validity of this new estimate?
g. Papke's data come from cities in Indiana over the period 1980 - 1988. Discuss the external validity issues that may arise when generalizing her findings (be specific).
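The data construction described in part d can be sketched as follows. This is a minimal illustration in plain Python, assuming a long-format layout with one record per city and year. The records, city labels, and field order (city, year, ez, claims) are hypothetical and are not taken from "ez_panel_2yrs"; the actual variable names in that file may differ, and any statistical package can be used instead.

```python
# Hypothetical long-format records: (city, year, ez, claims).
# These numbers are invented for illustration only.
rows = [
    ("A", 1980, 0, 120.0), ("A", 1988, 1, 90.0),
    ("B", 1980, 0, 100.0), ("B", 1988, 0, 95.0),
    ("C", 1980, 0, 80.0),  ("C", 1988, 1, 50.0),
    ("D", 1980, 0, 60.0),  ("D", 1988, 0, 55.0),
]

def first_differences(rows, start=1980, end=1988):
    """Return (change in EZ status, change in claims), one pair per city."""
    by_city = {}
    for city, year, ez, claims in rows:
        by_city.setdefault(city, {})[year] = (ez, claims)
    d_ez, d_claims = [], []
    for city in sorted(by_city):
        ez0, claims0 = by_city[city][start]
        ez1, claims1 = by_city[city][end]
        d_ez.append(ez1 - ez0)            # -1, 0, or 1
        d_claims.append(claims1 - claims0)
    return d_ez, d_claims

def ols(x, y):
    """Intercept and slope from a one-regressor OLS fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

d_ez, d_claims = first_differences(rows)
intercept, slope = ols(d_ez, d_claims)
print("change in EZ status:", d_ez)
print("change in claims:   ", d_claims)
print("intercept (common trend):", intercept)
print("slope on change in EZ:   ", slope)
```

With the intercept included, the slope compares the change in claims for cities whose EZ status changed against the change for cities whose status did not, and the intercept absorbs the trend common to all cities. The sketch reports point estimates only; the table requested in part d also needs standard errors, which a regression routine in a statistical package would supply.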