Question 1:
In a study relating college grade point average to time spent in various activities, students are asked how many hours they spend each week in four activities: studying, sleeping, working, and leisure. Any activity is put into one of the four categories, so that for each student the sum of hours in the four activities must be 168. The regression model is given by
GPA = β0 + β1 study + β2 sleep + β3 work + β4 leisure + u
a) Explain why this model violates the assumption of no perfect collinearity.
b) How could you reformulate the model so that its parameters have a useful interpretation and it satisfies the no perfect collinearity assumption?
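The following is a minimal sketch of the collinearity problem. It uses made-up weekly hours for five students. The numbers are hypothetical; only the 168-hour constraint comes from the question. Because study + sleep + work + leisure = 168 for every student, the four activity columns sum to 168 times the intercept column. The design matrix therefore has five columns but only rank four. Dropping one category restores full column rank. Leisure is dropped here purely as an example.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix via Gaussian elimination in exact rational arithmetic."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # this column is a linear combination of earlier columns
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every other row.
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Hypothetical weekly hours (study, sleep, work, leisure) for five students.
hours = [
    [20, 56, 10, 82],
    [35, 49, 0, 84],
    [15, 60, 25, 68],
    [28, 50, 12, 78],
    [40, 45, 20, 63],
]
assert all(sum(h) == 168 for h in hours)  # the adding-up constraint from the question

full = [[1] + h for h in hours]         # intercept, study, sleep, work, leisure
dropped = [[1] + h[:3] for h in hours]  # leisure omitted as the base category

print(len(full[0]), matrix_rank(full))        # 5 4
print(len(dropped[0]), matrix_rank(dropped))  # 4 4
```

With leisure omitted, β1 is read as the effect on GPA of shifting one hour from leisure to studying, holding sleep and work fixed. Any of the four categories could serve as the omitted base.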
Question 2:
Consider the following multiple regression model, which satisfies the classical linear model assumptions:
y = β0 + β1x1 + β2x2 + β3x3 + u
a) Write the t statistic for testing the null hypothesis H0: β1 - 3β2 = 1.
b) Define θ1 = β1 - 3β2. Write a regression equation in terms of β0, θ1, β2, and β3 that allows you to obtain the estimate θ̂1 and its standard error directly from OLS.
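Both parts can be written out with two tools. Part (a) uses the standard variance formula for a linear combination of estimators. Part (b) uses the substitution β1 = θ1 + 3β2. This is a sketch: the symbols match the model above, and the constructed regressor 3x1 + x2 is introduced here only for illustration.

```latex
% (a) t statistic for H_0: \beta_1 - 3\beta_2 = 1
t = \frac{\hat\beta_1 - 3\hat\beta_2 - 1}{\operatorname{se}(\hat\beta_1 - 3\hat\beta_2)},
\qquad
\operatorname{se}(\hat\beta_1 - 3\hat\beta_2)
  = \sqrt{\operatorname{se}(\hat\beta_1)^2 + 9\,\operatorname{se}(\hat\beta_2)^2
          - 6\,\widehat{\operatorname{Cov}}(\hat\beta_1, \hat\beta_2)}

% (b) substitute \beta_1 = \theta_1 + 3\beta_2
y = \beta_0 + (\theta_1 + 3\beta_2)x_1 + \beta_2 x_2 + \beta_3 x_3 + u
  = \beta_0 + \theta_1 x_1 + \beta_2 (3x_1 + x_2) + \beta_3 x_3 + u
```

Regress y on x1, the constructed variable 3x1 + x2, and x3. The coefficient on x1 is then θ̂1, and its reported standard error is se(θ̂1). The test of H0 becomes t = (θ̂1 - 1)/se(θ̂1), compared with a t distribution with n - 4 degrees of freedom.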
Question 3:
Consider the following model of birth weight, bweight, as a function of the number of prenatal doctor visits, pnvisits:
ln(bweight) = β0 + β1pnvisits + β2pnvisits² + u
Estimation by OLS yields the following estimates, all of which are highly statistically significant: β̂0 = 8.0, β̂1 = 0.02, and β̂2 = -0.0004.
a) What is the effect on birth weight of increasing the number of prenatal visits from 3 to 4?
b) What number of prenatal visits maximizes ln(bweight)?
c) What does the negative coefficient on the quadratic term imply about the relationship between birth weight and prenatal visits? Does the sign make sense? Explain.
d) Suppose we believe the effect of prenatal visits is different for high and low income individuals. Write down a new model incorporating this hypothesis. Clearly define any new variables required.
e) Now suppose that we want to test the hypothesis that both the effect of prenatal visits and the average birth weight differ for high and low income individuals. Write down this model and explain in detail how to test this hypothesis.
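The arithmetic behind parts (a) and (b) can be checked with the reported estimates. The sketch below evaluates the fitted quadratic at 3 and 4 visits and finds the vertex -β̂1/(2β̂2). The function name log_bweight is illustrative. Alongside the usual 100·Δ log approximation, the sketch also reports the exact percentage change exp(Δ) - 1.

```python
import math

# Coefficient estimates reported in the question.
b0, b1, b2 = 8.0, 0.02, -0.0004


def log_bweight(visits):
    """Fitted ln(bweight) from the quadratic model (hypothetical helper name)."""
    return b0 + b1 * visits + b2 * visits ** 2


# (a) Change in fitted ln(bweight) when visits rise from 3 to 4.
delta = log_bweight(4) - log_bweight(3)  # equals b1 + b2 * (4**2 - 3**2)
approx_pct = 100 * delta                 # usual log approximation to the % change
exact_pct = 100 * (math.exp(delta) - 1)  # exact percentage change

# (b) Vertex of the quadratic: b1 + 2 * b2 * visits = 0.
turning_point = -b1 / (2 * b2)

print(f"delta ln(bweight) = {delta:.4f}")  # 0.0172
print(f"approx. % change  = {approx_pct:.2f}%")
print(f"exact % change    = {exact_pct:.2f}%")
print(f"turning point     = {turning_point:.1f} visits")  # 25.0
```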
Question 4:
This problem refers to Dougherty's Educational Attainment and Earnings Functions (EAEF) data set, accessible through the course website. Use subset 5 for this problem. The following is a summary of the variables needed for this problem. Additional details are provided in Appendix B of the text.
Variable | Description
S | Years of schooling (highest grade completed as of 2002)
ASVABC | Composite score on the ASVAB, a standardized test of numerical and verbal ability with a mean of 50 and standard deviation of 10, calculated from ASVAB02, ASVAB03, and ASVAB04
ASVAB02 | Score on the arithmetic reasoning section of the ASVAB test
ASVAB03 | Score on the word knowledge section of the ASVAB test
ASVAB04 | Score on the paragraph comprehension section of the ASVAB test
SM | Years of schooling of respondent's mother
SF | Years of schooling of respondent's father
EXP | Total work experience in years
EARNINGS | Current hourly earnings in $
Using the data set, regress S on SM, SF, ASVAB02, ASVAB03, and ASVAB04 (the three components of the ASVABC composite score).
a) Report your coefficient estimates and standard errors. (You may simply submit your output for this part.)
b) Compare the coefficients and standard errors from part (a) to those from a regression of S on SM, SF, and ASVABC. Explain any differences.
c) Calculate correlation coefficients for the three ASVAB components.
d) What happens if you regress S on SM, SF, ASVAB02, ASVAB03, ASVAB04, and ASVABC and why?
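Part (c) asks for the Pearson correlation between each pair of ASVAB components. Statistical packages do this with a single command; the pure-Python version below only makes the formula explicit. The helper names are hypothetical. The six score vectors are invented for illustration and are not EAEF data, so the printed values say nothing about the real correlations.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Map every (name_a, name_b) pair to its Pearson correlation."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Hypothetical scores for six respondents (NOT taken from the EAEF data set).
scores = {
    "ASVAB02": [45, 52, 60, 38, 55, 49],
    "ASVAB03": [48, 50, 58, 41, 53, 47],
    "ASVAB04": [44, 55, 57, 40, 51, 50],
}

corr = correlation_matrix(scores)
for (a, b), r in corr.items():
    if a < b:  # print each distinct pair once
        print(f"{a} vs {b}: r = {r:.3f}")
```

Strong pairwise correlations among the components are what typically inflate the individual standard errors in a regression like part (a).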
Question 5: This problem uses data on air quality from Verbeek's A Guide to Modern Econometrics. Download the file AIRQ from the course website under Resources. AIRQ contains observations for 30 standard metropolitan statistical areas (SMSAs) in California in 1972. The data set contains the following variables: airq (air quality; a lower number is better), vala (value added of companies, in $1,000 US), rain (amount of rain, in inches), coas (indicator variable equal to 1 if the area is on the coast and 0 otherwise), dens (population density), and medi (average income per person, in US$). Note: t and F tables are located in Appendix A of the text.
a) Estimate a linear regression model that explains air quality from the other variables using OLS and interpret the coefficient estimates.
b) Test the hypothesis that average income does not affect air quality. Write out the form of the test as well as the value of the statistic and its interpretation.
c) Test the joint hypothesis that none of the variables has an effect on air quality. Write out the form of the test as well as the value of the statistic and its interpretation.
d) Test the hypothesis that value added and average income do not help explain air quality after controlling for population density, rainfall and location in a coastal area. Write out the form of the test as well as the value of the statistic and its interpretation.
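The test statistics in parts (b) to (d) take standard forms. The helper functions below compute them from quantities that any OLS output reports; the function names are hypothetical. With 30 SMSAs and five regressors, the residual degrees of freedom are 30 - 5 - 1 = 24. The R-squared form of F applies only to the overall test in part (c), where the restricted model contains just an intercept. The inputs in the usage lines are placeholders, not AIRQ estimates.

```python
def t_stat(estimate, std_error, null_value=0.0):
    """t statistic for H0: coefficient = null_value."""
    return (estimate - null_value) / std_error


def f_stat_ssr(ssr_restricted, ssr_unrestricted, q, n, k):
    """F statistic for q exclusion restrictions; k = regressors in the unrestricted model."""
    df_resid = n - k - 1
    return ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / df_resid)


def f_stat_overall(r_squared, n, k):
    """F statistic for H0: all k slope coefficients are zero (R-squared form)."""
    return (r_squared / k) / ((1 - r_squared) / (n - k - 1))


# AIRQ setup from the question: 30 SMSAs, 5 regressors (vala, rain, coas, dens, medi).
n, k = 30, 5

# Placeholder inputs, NOT results from AIRQ; substitute your own regression output.
print(t_stat(estimate=2.0, std_error=0.5))      # part (b) form: compare with t(24)
print(f_stat_overall(r_squared=0.5, n=n, k=k))  # part (c) form: compare with F(5, 24)
print(f_stat_ssr(120.0, 100.0, q=2, n=n, k=k))  # part (d) form: compare with F(2, 24)
```

For part (d), the restricted model drops vala and medi, so q = 2.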