1. With this assignment you will find an Excel file called HW4.Vacation.xlsx. For reference, the variables in this file include:
miles = miles traveled per year
age = average age of adult members of household
kids = number of children in household
a. Run the following regression: miles = β0 + β1*age
b. Hypothesize the sign of the bias, if any, resulting from excluding kids from the regression. Explain your reasoning.
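The sign argument in b) usually rests on the standard omitted-variable-bias formula, written here in this assignment's notation. α is the shadow-regression slope that part d) refers to. The hat on α1 marks the sample slope, and its sign matches the sign of the sample correlation between age and kids.

```latex
\begin{aligned}
\text{full model:}\quad & miles = \beta_0 + \beta_1\,age + \beta_2\,kids + u \\
\text{shadow regression:}\quad & kids = \alpha_0 + \alpha_1\,age + v \\
\text{short-regression slope:}\quad & E[\tilde{\beta}_1 \mid age, kids] = \beta_1 + \beta_2\,\hat{\alpha}_1 \\
\text{bias:}\quad & E[\tilde{\beta}_1 \mid age, kids] - \beta_1 = \beta_2\,\hat{\alpha}_1,
\qquad \operatorname{sign}(\text{bias}) = \operatorname{sign}(\beta_2)\cdot\operatorname{sign}\big(\operatorname{Corr}(age, kids)\big)
\end{aligned}
```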
c. Use STATA to verify (or refute) your claim from b). Use your STATA output to support your explanation.
d. Break down the bias into its component pieces (as we did in class) using actual numbers. [Hint: you should run the “shadow regression” to determine the value of α.] Highlight any commands that you use in STATA.
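The regressions for parts a), c), and d) can be sketched in one short do-file. To keep it self-contained, it runs on simulated data with made-up coefficients (hypothetical, not the HW4 numbers). For the assignment you would instead load the workbook, for example with `import excel using "HW4.Vacation.xlsx", firstrow clear`, assuming the first row of the sheet holds the variable names. The scalar names b1_short, b1_long, b2_long, and alpha1 are illustrative.

```stata
* Sketch: short, long, and shadow regressions on SIMULATED data
* (hypothetical coefficients, not HW4.Vacation.xlsx).
* For the homework, replace the simulation block with:
*   import excel using "HW4.Vacation.xlsx", firstrow clear
clear
set seed 12345
set obs 200
generate age   = 25 + 30*runiform()
generate kids  = rpoisson(1) + (age < 40)
generate miles = 5000 + 100*age + 1500*kids + rnormal(0, 2000)

* part a): short regression, kids omitted
regress miles age
scalar b1_short = _b[age]

* part c): long regression, kids included
regress miles age kids
scalar b1_long = _b[age]
scalar b2_long = _b[kids]

* part d): shadow regression of the omitted variable on the included one
regress kids age
scalar alpha1 = _b[age]

* decomposition: b1_short = b1_long + b2_long*alpha1
display "short-regression slope:   " b1_short
display "long slope + bias:        " b1_long + b2_long*alpha1
display "bias (b2_long * alpha1):  " b2_long*alpha1
```

Because this decomposition is an exact algebraic identity for OLS on a common sample, the first two displayed numbers should agree up to rounding. If they do not, check that all three regressions use the same observations, since missing values can drop rows from some regressions but not others.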
2. In class, we discussed dummy variables and interaction terms using the regression:
wage = β0 + δ0*female + β1*educ + δ1*female*educ + u
In the graphs below, sketch the scenarios (a) δ0 > 0, δ1 > 0 and (b) δ0 > 0, δ1 < 0. Be sure to label the “male” and “female” lines in each graph:
[Graph (a): δ0 > 0, δ1 > 0]    [Graph (b): δ0 > 0, δ1 < 0]
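Setting female = 0 and female = 1 in the regression (and dropping u) gives the two lines to sketch. δ0 shifts the female intercept and δ1 changes the female slope. In (a) the female line starts higher and is steeper. In (b) it starts higher but is flatter, so the two lines cross at educ = -δ0/δ1, which is positive.

```latex
\begin{aligned}
\text{male } (female = 0):\quad & wage = \beta_0 + \beta_1\,educ \\
\text{female } (female = 1):\quad & wage = (\beta_0 + \delta_0) + (\beta_1 + \delta_1)\,educ
\end{aligned}
```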
3. (9 points total; 3 parts worth 3 points each.) This part of the homework uses one of the datasets posted on ANGEL: the Excel file SAT.xlsx, which comes with this assignment. We are looking at determinants of high school students' SAT scores. For reference, the variables in this file include:
ap = a dummy variable equal to 1 if the ith student has taken AP math and/or AP English, 0 if the ith student has taken neither
gpa = the weighted GPA of the ith student
prep = a yes/no variable indicating whether the ith student attended an SAT preparation course
SAT = the SAT score received by the ith student
The first step is to convert the variable “prep” into a 0/1 dummy variable that we can use in a regression. There are several ways to do this; here is one:
tab prep, gen(prep)
generate preptaken=prep2==1
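This creates a dummy variable, “preptaken”, equal to 1 if the student took the prep course (yes) and 0 if the student did not (no). The tab command creates one indicator per category of prep; assuming prep is stored as the strings “no” and “yes”, the categories are ordered alphabetically, so prep1 corresponds to “no” and prep2 to “yes”.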
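The conversion can be checked on a toy dataset before touching SAT.xlsx. This sketch assumes prep is stored as the lowercase strings "yes" and "no" (check with `tab prep`); the three-row dataset and the variable name preptaken2 are hypothetical. It runs the assignment's method and a one-line alternative, then cross-tabulates them.

```stata
* Sketch: two ways to turn the yes/no string "prep" into a 0/1 dummy.
* HYPOTHETICAL three-row dataset; assumes values are exactly "yes"/"no".
clear
input str3 prep
"yes"
"no"
"yes"
end

* Method from the assignment: creates prep1 ("no") and prep2 ("yes"),
* because tabulate orders string values alphabetically
tab prep, gen(prep)
generate preptaken = prep2 == 1

* Alternative: a logical expression evaluates to 1 (true) or 0 (false)
generate preptaken2 = (prep == "yes")

* Cross-check: every "yes" should map to 1 and every "no" to 0
tab prep preptaken
```

The one-line alternative avoids the extra prep1/prep2 variables, but it silently codes any unexpected value (for example "Yes" or a blank) as 0, so the cross-tabulation is worth running either way.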
a. Run the following regression: SAT = β0 + β1*ap + β2*gpa + β3*preptaken. Attach your STATA output, and in the table below report each coefficient and whether it is significant at the .05 level:
| Variable | Coefficient | Significant? (yes/no) |
|---|---|---|
| AP | | |
| GPA | | |
| Prep_taken | | |
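Part a) amounts to one `regress` call. The loop below recomputes the two-sided p-values shown in the P>|t| column of the output, which is what the table's yes/no column depends on. It runs on simulated data (hypothetical coefficients, and preptaken is drawn directly rather than built from prep) and assumes the variable names match the list above. Stata variable names are case-sensitive, so SAT and sat are different variables.

```stata
* Sketch of part a) on SIMULATED data (hypothetical, not SAT.xlsx).
* For the homework, load the real file instead, e.g.:
*   import excel using "SAT.xlsx", firstrow clear
clear
set seed 2024
set obs 300
generate ap        = runiform() < 0.4
generate gpa       = 2 + 2*runiform()
generate preptaken = runiform() < 0.5
generate SAT       = 700 + 60*ap + 150*gpa + 30*preptaken + rnormal(0, 100)

regress SAT ap gpa preptaken

* Two-sided p-value for H0: coefficient = 0, using the residual df;
* "yes" means significant at the .05 level
foreach v in ap gpa preptaken {
    local p = 2*ttail(e(df_r), abs(_b[`v']/_se[`v']))
    display "`v'" _col(12) %9.3f _b[`v'] _col(24) %6.4f `p' ///
        _col(34) cond(`p' < .05, "yes", "no")
}
```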
b. Someone suggests that the prep course may be effective only for certain kinds of students, since it may amount to “cramming” for an exam. Create an interaction term for students who took AP courses AND the prep course. (See your coursepack for how to do this.)
Using the results from your output (attach these as well), write out the estimated regression equation:
$\widehat{SAT}$ =
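For part b), the interaction term is the product of the two dummies, so it equals 1 only for students who took both AP and the prep course. In an equation of the form $\widehat{SAT}$ = β̂0 + β̂1*ap + β̂2*gpa + β̂3*preptaken + β̂4*(ap*preptaken), the estimated prep-course effect is β̂3 for non-AP students and β̂3 + β̂4 for AP students. The sketch below uses simulated data again; the names ap_prep and prep_effect_ap are hypothetical. It shows the explicit-product version, the equivalent factor-variable syntax, and `lincom` for β̂3 + β̂4.

```stata
* Sketch of part b) on SIMULATED data (hypothetical, not SAT.xlsx).
clear
set seed 2024
set obs 300
generate ap        = runiform() < 0.4
generate gpa       = 2 + 2*runiform()
generate preptaken = runiform() < 0.5
generate SAT       = 700 + 60*ap + 150*gpa + 30*preptaken ///
                     - 20*ap*preptaken + rnormal(0, 100)

* Factor-variable syntax: ## adds both main effects and their interaction
regress SAT i.ap##i.preptaken gpa

* Same model with an explicit product term (1 only if ap = 1 AND preptaken = 1)
generate ap_prep = ap*preptaken
regress SAT ap gpa preptaken ap_prep

* Estimated prep-course effect for AP students, with its standard error
lincom preptaken + ap_prep
scalar prep_effect_ap = r(estimate)
```

The factor-variable form saves creating ap_prep by hand, while the explicit product makes the coefficient names in the output match the written equation.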
c. Which variables are significant at the .05 level now? Write them here and circle them in your output.
Attachment: HW4.Vacation.xlsx
Attachment: SAT.xlsx