Textbook: Practical Econometrics: Data Collection, Analysis and Application.
Chapter 7- Model Selection: Criteria and Tests.
1. What are the reasons for the occurrence of specification errors?
2. What are the attributes of a "good" econometric model?
3. What are different types of specification errors? Can one or more of these errors occur simultaneously?
4. What are the consequences of including irrelevant variables in a model?
5. Omitting a relevant variable(s) from a model is more dangerous than including an irrelevant variable(s). Do you agree? Why or why not?
6. In looking for the simple Keynesian multiplier, you regress the GNP on investment and find that there is some relationship. Now, thinking that it cannot hurt much, you include the "irrelevant" variable "state and local taxes." To your surprise, the investment variable loses its significance. How can an irrelevant variable do this?
Chapter 10- Time Series Analysis.
1. Suppose you estimate the following distributed lag model with the independent variable lagged three times:
$$y_t = \beta_0 + \beta_1 x_{1,t} + \beta_2 x_{1,t-1} + \beta_3 x_{1,t-2} + \beta_4 x_{1,t-3} + \varepsilon_t$$
a. Assume that there is a random shock of size c to $x_{1,t}$ and that the shock is temporary (it affects $x_{1,t}$ and then goes away). Assume that no other shocks occur. What effect will this have on $y_t$ in the current period? Next period? Two periods from now? Three periods from now? Four periods from now? Five periods from now?
b. Now assume that the shock is permanent. How does this affect $y_t$ in the current period? Next period? Two periods from now? Three periods from now? Four periods from now? Five periods from now?
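One way to reason about parts (a) and (b) is to trace the shock through the lag structure period by period. The sketch below does this under stated assumptions: the coefficient values and the shock size c = 1 are hypothetical (the problem gives none), and `lag_response` is an illustrative helper name, not something from the textbook.

```python
def lag_response(betas, shock_path):
    """Change in y_t implied by a path of deviations in x_1.

    betas: [b1, b2, b3, b4], the coefficients on x_{1,t}, ..., x_{1,t-3}.
    shock_path: deviation of x_1 from its baseline in periods 0, 1, 2, ...
    """
    response = []
    for t in range(len(shock_path)):
        # y_t picks up the current deviation and up to three lagged deviations
        dy = sum(b * shock_path[t - k] for k, b in enumerate(betas) if t - k >= 0)
        response.append(dy)
    return response


# Hypothetical coefficients, chosen only for illustration
betas = [0.5, 0.3, 0.2, 0.1]
c = 1.0
periods = 6  # the current period plus five periods ahead

temporary = [c] + [0.0] * (periods - 1)  # shock lasts a single period
permanent = [c] * periods                # shock never goes away

print("temporary:", lag_response(betas, temporary))
print("permanent:", lag_response(betas, permanent))
```

With a temporary shock the response in each period is a single coefficient times c and vanishes once the shock is more than three periods old; with a permanent shock the responses accumulate until they reach the sum of the four coefficients times c.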
2. How do the time-series assumptions differ from the multiple linear-regression assumptions? Why are they different?
3. Suppose you thought your data have a trend.
a. Describe how you would correct for it within the regression equation.
b. What transformation would you apply to the data to account for the trend?
4. Explain how a regression of $y_t$ on $x_t$ could result in a statistically significant relationship between the two variables when they are not actually related. What is this type of relationship called?
5. What is out-of-sample prediction? Say you had 500 time-series observations and would like to determine how well your model is performing. How would you suggest obtaining an out-of-sample prediction?
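A holdout split is one common way to obtain out-of-sample predictions. The sketch below is only an illustration: the data are simulated (a made-up linear trend plus noise), the 400/100 split is an arbitrary choice, and `fit_line` is a hypothetical helper. The point is that the model is estimated on the earlier observations and judged on later ones it never saw.

```python
import math
import random


def fit_line(x, y):
    """OLS intercept and slope for a single regressor."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


rng = random.Random(42)  # arbitrary seed, for reproducibility only
t = list(range(1, 501))  # 500 time-series observations
y = [2.0 + 0.05 * ti + rng.gauss(0.0, 1.0) for ti in t]  # simulated, not real data

split = 400  # assumed cut-off: estimate on the first 400, hold out the last 100
intercept, slope = fit_line(t[:split], y[:split])

# Predict the held-out periods using only the estimation-sample coefficients
forecasts = [intercept + slope * ti for ti in t[split:]]
errors = [actual - f for actual, f in zip(y[split:], forecasts)]
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
print("out-of-sample RMSE:", round(rmse, 3))
```

Because the observations are ordered in time, the holdout block is taken from the end of the sample rather than drawn at random.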
6. Use the runs test to test for autocorrelation in the following cases.
Sample size | Number of + | Number of - | Number of runs | Autocorrelation (?)
18 | 11 | 7 | 2 | -
30 | 15 | 15 | 24 | -
38 | 20 | 18 | 6 | -
15 | 8 | 7 | 4 | -
10 | 5 | 5 | 1 | -
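The runs test compares the observed number of runs with the number expected under independence. The sketch below uses the large-sample normal approximation, with mean 2N1N2/N + 1 and variance 2N1N2(2N1N2 - N) / (N^2 (N - 1)), and a 95% band. This is an assumption made for illustration: with small counts of + and - signs, exact critical values from a runs table are usually preferred. The function name `runs_test` is hypothetical.

```python
import math


def runs_test(n_plus, n_minus, runs, z_crit=1.96):
    """Runs test via the normal approximation (an assumption for small samples)."""
    n = n_plus + n_minus
    mean = 2 * n_plus * n_minus / n + 1
    var = (2 * n_plus * n_minus * (2 * n_plus * n_minus - n)) / (n ** 2 * (n - 1))
    sd = math.sqrt(var)
    lower, upper = mean - z_crit * sd, mean + z_crit * sd
    if runs < lower:
        verdict = "too few runs: positive autocorrelation suggested"
    elif runs > upper:
        verdict = "too many runs: negative autocorrelation suggested"
    else:
        verdict = "runs within the band: no evidence of autocorrelation"
    return mean, sd, verdict


# First row of the table: 11 plus signs, 7 minus signs, 2 runs
mean, sd, verdict = runs_test(11, 7, 2)
print(f"expected runs = {mean:.2f}, sd = {sd:.2f}, {verdict}")
```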
7. Monte Carlo experiment. Consider the following model:
$$Y_t = 1.0 + 0.9 X_t + u_t \qquad (1)$$
where X takes the values 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. Assume that
$$u_t = \rho u_{t-1} + v_t = 0.9 u_{t-1} + v_t \qquad (2)$$
where $v_t \sim N(0, 1)$. Assume that $u_0 = 0$.
a. Generate 10 values of vt and then 10 values of ut per Equation (2).
b. Using the 10 X values and the 10 u values generated in the preceding step, generate 10 values of Y.
c. Regress the Y values generated in part (b) on the 10 X values, obtaining b1 and b2.
d. How do the computed b1 and b2 compare with the true values of 1 and 0.9, respectively?
e. What can you conclude from this experiment?
8. Continue with Problem 7. Now assume that ρ = 0.1 and repeat the exercise. What do you observe? What general conclusion can you draw from Problems 7 and 8?
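The experiment in Problems 7 and 8 can be sketched in a few lines. The problems ask for a single draw of 10 observations; as an assumption beyond the text, the sketch below repeats the draw 2,000 times for each value of ρ so that the sampling spread of the slope estimate becomes visible. The seed, the number of replications, and the function names are illustrative choices.

```python
import random
import statistics

X = list(range(1, 11))  # X takes the values 1, ..., 10


def simulate_ols(rho, rng, beta1=1.0, beta2=0.9):
    """Generate one sample from Equations (1) and (2) and return OLS (b1, b2)."""
    u_prev = 0.0  # u_0 = 0
    y = []
    for x in X:
        u = rho * u_prev + rng.gauss(0.0, 1.0)  # u_t = rho * u_{t-1} + v_t
        y.append(beta1 + beta2 * x + u)
        u_prev = u
    x_bar, y_bar = statistics.mean(X), statistics.mean(y)
    sxy = sum((x - x_bar) * (yi - y_bar) for x, yi in zip(X, y))
    sxx = sum((x - x_bar) ** 2 for x in X)
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    return b1, b2


def slope_spread(rho, reps=2000, seed=0):
    """Mean and standard deviation of b2 across repeated samples."""
    rng = random.Random(seed)
    slopes = [simulate_ols(rho, rng)[1] for _ in range(reps)]
    return statistics.mean(slopes), statistics.stdev(slopes)


for rho in (0.9, 0.1):
    mean_b2, sd_b2 = slope_spread(rho)
    print(f"rho = {rho}: mean of b2 = {mean_b2:.3f}, sd of b2 = {sd_b2:.3f}")
```

Comparing the two spreads gives a feel for how much a single 10-observation regression can miss the true slope when the errors are strongly autocorrelated.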
Chapter 14- Instrumental Variables for Simultaneous Equations, Endogenous Independent Variables, and Measurement Error.
1. The following two equations describe property crime and police expenditures across U.S. cities:
$$\text{Crime}_i = \alpha_0 + \alpha_1 \text{Police}_i + \alpha_2 \text{Income}_i + u_{1,i}$$
$$\text{Police}_i = \beta_0 + \beta_1 \text{Crime}_i + \beta_2 \text{Income}_i + \beta_3 \text{Liberal}_i + u_{2,i}$$
$\text{Crime}_i$ = Number of crimes committed in city i last year
$\text{Police}_i$ = Total spending on the police force in city i last year
$\text{Income}_i$ = Average income in city i last year
$\text{Liberal}_i$ = Share of votes received by the liberal presidential candidate from city i in the last election
a. If you were to estimate these two equations with OLS, would you get consistent estimates? Why or why not?
b. Which of the two equations is fully identified? How do you know this?
c. Describe how you would estimate one or both of the equations using 2SLS. Describe the properties of the estimates in both equations.
d. Come up with another explanatory variable that you could add to one of these equations to make the system fully identified. Justify the inclusion of this variable.
e. Discuss practical and theoretical drawbacks of using 2SLS.
2. A public health researcher is trying to estimate the determinants of fertility rates in developing countries. She proposes the following model:
$$FR_i = \beta_0 + \beta_1 \text{Age}_i + \beta_2 \text{Education}_i + \beta_3 \text{Urban}_i + u_i$$
where $FR_i$ is the fertility rate for individual i (measured as the number of children ever born), regressed on their age, their years of education, and whether or not they live in an urban area.
a. Another researcher argues that the Education variable suffers from a simultaneity bias with fertility rates. Identify the consequences of including an endogenous explanatory variable and briefly outline the steps you would take to consistently estimate this model.
b. This study used two different methods to measure fertility rates: one for rural areas and one for urban areas. Each method generated some measurement error, though the rural methodology was quite a bit less accurate. Given this fact, what are the consequences of measurement error in this sample? What, if anything, can you do to remedy the potential problems associated with measurement error on the fertility variable? Be as specific as possible.
c. Consider your answer to part (b), and assume fertility rates no longer suffer from measurement error, but the education variable does suffer from measurement error. Also assume you have two different measures of education: one self-reported and the other obtained by asking the individual's closest relative. Given this fact, what are the consequences of measurement error in this sample? What, if anything, can you do to remedy the potential problems associated with measurement error on the education variable? Be as specific as possible.
3. The following two equations describe the interactions between fertility rates and average income of women in a cross-section of countries:
$$\text{Fertility}_i = \alpha_0 + \alpha_1 \text{Income}_i + \alpha_2 \text{Education}_i + \alpha_3 \text{Rural}_i + u_{1,i}$$
$$\text{Income}_i = \beta_0 + \beta_1 \text{Fertility}_i + \beta_2 \text{Education}_i + u_{2,i}$$
where the fertility rate (measured by the average number of births per woman in country i) is a function of average female income in that country, the average years of female education, and the fraction of country i's population that lives in rural areas, and female income is a function of the fertility rate and female education.
a. If you were to estimate these two equations separately with OLS, would you get consistent parameter estimates? Explain why or why not.
b. Which of the two equations is fully identified? Explain why.
c. Describe how you would consistently estimate the identified equation.
d. Come up with another explanatory variable, $Z_i$, that you could add to this system so that both equations can be estimated. What two conditions must $Z_i$ satisfy? Explain why the variable you selected meets these two criteria.
4. Use the data in Education.xlsx to run an instrumental variable regression.
a. First estimate the model using OLS. Regress logwage on Experience, Occupation (equals 1 if blue collar occupation and 0 if white collar), Industry (equals 1 if manufacturing industry and 0 if not), Married (equals 1 if married and 0 if not), Union (equals 1 if in a union and 0 if not), Education (years of education), and Black (equals 1 if black and 0 if not).
b. Now use South (equals 1 if the individual lives in the South and 0 if not) as an instrument for Education. Run the first-stage regression by regressing Education on South and all other independent variables, and get the predicted value for Education. Run the second-stage regression by regressing logwage on the predicted value of Education, Experience, Occupation, Industry, Married, Union, and Black.
c. Comment on whether you think the dummy variable South is a valid instrument: that is, is South correlated with years of Education and uncorrelated with the error term in the regression in part (a)?
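The two-stage procedure described in this problem can be illustrated on simulated data. The sketch below is a deliberately stripped-down assumption: one endogenous regressor `x`, one instrument `z`, and no other control variables, whereas the exercise includes the remaining regressors in both stages. The data-generating numbers and the helper `ols_simple` are hypothetical.

```python
import random


def ols_simple(x, y):
    """Intercept and slope from regressing y on a single regressor x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


rng = random.Random(1)  # arbitrary seed
n = 5000
true_beta = 2.0  # made-up true effect of x on y

z, x, y = [], [], []
for _ in range(n):
    zi = rng.gauss(0.0, 1.0)      # instrument: shifts x, excluded from the y equation
    common = rng.gauss(0.0, 1.0)  # unobserved factor in both x and y (source of endogeneity)
    xi = 1.0 * zi + common + rng.gauss(0.0, 1.0)
    yi = 1.0 + true_beta * xi + 3.0 * common + rng.gauss(0.0, 1.0)
    z.append(zi)
    x.append(xi)
    y.append(yi)

# Naive OLS of y on x: inconsistent because x is correlated with the error
_, b_ols = ols_simple(x, y)

# First stage: regress x on z and keep the fitted values
g0, g1 = ols_simple(z, x)
x_hat = [g0 + g1 * zi for zi in z]

# Second stage: regress y on the fitted values from the first stage
_, b_2sls = ols_simple(x_hat, y)

print(f"true beta = {true_beta}, OLS = {b_ols:.3f}, 2SLS = {b_2sls:.3f}")
```

Running the second stage by hand reproduces the 2SLS point estimate, but the standard errors reported by that second regression are not valid, which is one reason to use a dedicated IV routine in practice.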
5. Use the data in Demand.xlsx to run an instrumental variable regression.
a. First estimate the demand equation model using OLS. Regress Quantity on Price and Consumer Confidence.
b. Now use Milk feed price as an instrument for Price. Run the first- and second-stage regressions. Does your regression differ from what you found in part (a)?
c. Draw a picture that shows why it is important to use a supply shifter when estimating a demand equation.
6. Use the data in Measurement_Error.xlsx to correct for measurement error.
a. First estimate the model using OLS. Regress logwage on Experience, Occupation (equals 1 if blue collar occupation and 0 if white collar), Industry (equals 1 if manufacturing industry and 0 if not), Married (equals 1 if married and 0 if not), Union (equals 1 if in a union and 0 if not), Education (years of education), and Black (equals 1 if black and 0 if not).
b. Now use Spouse Experience as an instrument for Experience to correct for the fact that Experience may be measured with error. Run the first-stage regression by regressing Experience on Spouse Experience and all other independent variables, and get the predicted value for Experience. Run the second-stage regression by regressing logwage on the predicted value of Experience, Education, Occupation, Industry, Married, Union, and Black.
Chapter 15- Quantile Regression, Count Data, Sample Selection Bias and Quasi Experimental Methods.
1. Explain how the following situations suffer from sample selection bias.
a. Data on the salaries of people in the labor force.
b. Telephone survey respondents.
c. Number of years of education.
d. Drug trials for a lethal disease.
2. Compare and contrast difference-in-differences estimation to panel data techniques.
3. Given how difficult it is to estimate true marginal effects in economics, why aren't quasi-experimental techniques used more often?
4. Use the file California.xlsx for the following questions.
a. Using OLS, regress Loan Amount on income, black, other race, and male. Comment on the results.
b. Estimate the same model using quantile regression at the 50th percentile. How do your results in parts (a) and (b) differ?
c. Estimate the model at the 10th, 25th, 75th and 90th percentiles. Do these results differ from the results you found in parts (a) and (b)?
d. From a policy perspective, why do you think it is important to not just focus on average values but also consider the marginal values at different points in the distribution?
5. Use the file Hunting_Data.xlsx, which was collected from the 2006 National Survey of Fishing, Hunting and Wildlife-Associated Recreation, for the following questions.
a. Regress Hunt Days on male, married, divorced, somecol, hsonly, ba, somepost, postgrad, Black, other, income and south central. Comment on the results. Why is OLS not the appropriate estimation strategy in this circumstance?
b. Estimate the same model as in part (a) with a Poisson model. Comment on the results.
c. Estimate the model with a negative binomial model. Test for over-dispersion. Do your results from parts (b) and (c) differ?
d. Which model do you think is the most appropriate to use, and why?
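Before choosing between the Poisson and negative binomial models, a rough first look at over-dispersion is to compare the sample mean and variance of the count variable, since a Poisson distribution has equal mean and variance. The sketch below is an informal check only, not the formal test the question asks for, and the counts are made up rather than taken from Hunting_Data.xlsx. It also ignores the regressors, whereas the Poisson assumption concerns the variance conditional on them.

```python
import statistics

# Hypothetical hunting-day counts: many zeros and a few large values
hunt_days = [0, 0, 0, 1, 0, 2, 0, 0, 14, 0, 3, 0, 30, 0, 1]

mean_days = statistics.mean(hunt_days)
var_days = statistics.variance(hunt_days)  # sample variance (n - 1 denominator)
dispersion_ratio = var_days / mean_days

print(f"mean = {mean_days:.2f}, variance = {var_days:.2f}, ratio = {dispersion_ratio:.2f}")
```

A ratio far above 1 is a hint that the Poisson equal-dispersion restriction may be too tight and that the formal over-dispersion test is worth running.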
6. Using the file Suicide.xlsx, which was collected from the National Center for Health Statistics' 1998 mortality data, estimate a difference-in-differences model of the effects of a policy change in the assisted suicide law in Oregon in 1997, in two ways.
a. First, estimate the sample mean of the number of suicides in Oregon in 1996 and the number of suicides in Washington in 1996, and then subtract the Washington mean from the Oregon mean. Second, estimate the sample mean of the number of suicides in Oregon in 1998 and the number of suicides in Washington in 1998, and then subtract the Washington mean from the Oregon mean. Finally, subtract the value from the first step from the value in the second step. This is the difference-in-differences estimator. Comment on the results.
b. Estimate the difference-in-differences estimator using a regression. Regress the number of suicides on Oregon, the year 1998, and Oregon × 1998. Verify that the difference-in-differences estimator is the same as what you found in part (a). Comment on the results.
c. From a policy perspective, why is the method employed in part (b) preferable to the method employed in part (a)?
d. Explain how this method does a better job of estimating the true effects of the policy change in Oregon than if only the 1998 data had been used.
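The equivalence between the two approaches in Problem 6 can be checked on a toy example. In the sketch below the eight observations are invented for illustration and are not from Suicide.xlsx; `ols` is a small hypothetical helper that solves the normal equations directly so the example needs no external libraries.

```python
def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Made-up records: (oregon dummy, 1998 dummy, number of suicides)
records = [
    (1, 0, 10), (1, 0, 12), (1, 1, 15), (1, 1, 17),
    (0, 0, 9), (0, 0, 11), (0, 1, 10), (0, 1, 12),
]


def group_mean(oregon, y1998):
    values = [s for o, t, s in records if o == oregon and t == y1998]
    return sum(values) / len(values)


# Method (a): difference of the Oregon-minus-Washington gaps, 1998 versus 1996
did_means = (group_mean(1, 1) - group_mean(0, 1)) - (group_mean(1, 0) - group_mean(0, 0))

# Method (b): coefficient on the interaction term Oregon x 1998
X = [[1, o, t, o * t] for o, t, _ in records]
y = [s for _, _, s in records]
did_regression = ols(X, y)[3]

print("difference of means:", did_means)
print("interaction coefficient:", round(did_regression, 6))
```

The regression form also delivers a standard error for the interaction coefficient and allows additional control variables, which the comparison of four means does not.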