1. In data on 1,744 workers, you estimate the following regression, where earnings are weekly, age is in years, and female is a binary indicator:
Dependent variable: ln(earnings)
| Intercept | 3.04 (0.18) |
| Age | 0.147 (0.009) |
| Age squared | -0.0016 (0.0001) |
| Female | -0.421 (0.033) |
a. What is the predicted difference between earnings of females & males? Is this significant?
b. How would you test for the significance of age in the regression? Is the quadratic function of age preferred to a linear function of age?
c. Is the effect of age on earnings positive or negative? Does this depend on age? How does this effect change as age increases?
d. What is the predicted effect of age on earnings for a 25-year-old worker? How would you test whether this effect is significant?
e. Why does the intercept have no practical interpretation? How can you transform some of the regressors so that the intercept would have a practical interpretation?
f. How would you expect the estimated female coefficient to be biased if female workers are younger than male workers? What if, instead, female workers were more educated than male workers?
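The arithmetic behind parts a, c, and d can be sketched in Python as a numerical check. It assumes the values in parentheses are standard errors, and the variable and function names are illustrative rather than part of the problem.

```python
import math

# Point estimates from the question 1 table (dependent variable: ln(earnings)).
b_age, b_age_sq, b_female = 0.147, -0.0016, -0.421
se_female = 0.033  # assumption: the parenthesized value is a standard error

# Part a: a coefficient of b log points implies an exact proportional gap of exp(b) - 1.
female_gap = math.exp(b_female) - 1
t_female = b_female / se_female

# Parts c and d: d ln(earnings) / d age = b_age + 2 * b_age_sq * age.
def age_effect(age):
    return b_age + 2 * b_age_sq * age

# Age at which the estimated effect of age switches from positive to negative.
turning_age = -b_age / (2 * b_age_sq)

print(f"female gap: {female_gap:.3f} (t = {t_female:.2f})")
print(f"effect at age 25: {age_effect(25):.3f}; turning point: {turning_age:.1f} years")
```

Testing whether the effect at age 25 is significant would also need the covariance between the Age and Age squared estimates, which the table does not report.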
2. In data on 200 4th-6th graders, you estimate the following regression, where weight is in pounds, height is in inches above 4 feet (all kids in the data are at least 4 feet tall), and female is a binary indicator:
Dependent variable: weight
| Intercept | 36.27 (5.99) |
| Height (inches above 4 feet) | 5.32 (0.80) |
| Female | 17.33 (7.36) |
| Height x Female | -1.83 (0.90) |
a. Interpret the magnitude & significance of each of the four coefficient estimates, including the intercept.
b. Compare predicted weights for girls & boys who are the same height and are very tall. How does this relate to your interpretation of the coefficient estimate for Female in part a.?
c. How would you test whether girls & boys have significantly different weights? How would you test whether height significantly affects weight?
d. What is the estimated effect of height on weight for girls? How would you test whether this is significant? What are the degrees of freedom for that test statistic? How would you use this test statistic to estimate the standard error for this estimated effect?
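For the interaction model in question 2, a short Python sketch shows how the girl-boy gap and the height slope for girls follow from the point estimates. The function names are hypothetical, and the residual degrees of freedom assume the four reported coefficients are the only ones estimated.

```python
# Point estimates from the question 2 table (dependent variable: weight in pounds).
b0, b_height, b_female, b_interact = 36.27, 5.32, 17.33, -1.83
n_obs, n_coefs = 200, 4

def predicted_weight(height_above_4ft, female):
    """Predicted weight for a child of the given height (inches above 4 feet)."""
    return (b0 + b_height * height_above_4ft
            + b_female * female
            + b_interact * height_above_4ft * female)

def girl_minus_boy(height_above_4ft):
    """Predicted weight gap at a common height: b_female + b_interact * height."""
    return predicted_weight(height_above_4ft, 1) - predicted_weight(height_above_4ft, 0)

girl_slope = b_height + b_interact     # effect of one more inch for girls
crossover = -b_female / b_interact     # height at which the predicted gap is zero
df_resid = n_obs - n_coefs             # degrees of freedom for a t test

print(f"slope for girls: {girl_slope:.2f} lb per inch")
print(f"gap at 0, 6, 12 inches: {[round(girl_minus_boy(h), 2) for h in (0, 6, 12)]}")
print(f"gap changes sign near {crossover:.1f} inches above 4 feet; df = {df_resid}")
```

Once a t statistic for the girls' slope is available (for example, from re-estimating with a male indicator in place of the female one), the standard error can be backed out as the estimate divided by that t statistic.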
3. In data on 1,744 workers, you estimate the following regression, where earnings are weekly, age is in years, and "age < 40" is a binary indicator (= 1 for workers age 39 or younger, = 0 for workers age 40 or older):
Dependent variable: ln(earnings)
| Intercept | 6.92 (38.33) |
| Age | -0.019 (0.004) |
| Age < 40 | -3.13 (0.22) |
| Age x (Age < 40) | 0.085 (0.005) |
a. Interpret the estimated coefficients of Age and the interaction term.
b. What is the estimated intercept, and coefficient of Age, of the regression function for workers (i) younger than age 40, and (ii) age 40 & older?
c. Test whether the slopes in part b. are significantly different. How would you test whether the overall regression functions in part b. are significantly different?
d. How would you test for significance of (i) age, holding constant whether or not the worker is at least 40 years old, and (ii) the coefficient of Age in the regression function from part b.-i?
e. What is the predicted difference in earnings between workers age 20 & age 40? Why might this prediction be not entirely accurate?
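The two regression lines implied by the estimates in question 3, and the prediction asked for in part e, can be traced with a small Python sketch. Names are illustrative, and the parenthesized values are assumed to be standard errors.

```python
# Point estimates from the question 3 table (dependent variable: ln(earnings)).
b0, b_age, b_young, b_age_young = 6.92, -0.019, -3.13, 0.085
se_age_young = 0.005  # assumption: standard error of the interaction term

def regression_line(young):
    """Return (intercept, slope on Age) for a given value of the Age < 40 indicator."""
    return b0 + b_young * young, b_age + b_age_young * young

def predicted_ln_earnings(age):
    intercept, slope = regression_line(1 if age < 40 else 0)
    return intercept + slope * age

# Part c: the interaction coefficient is the difference between the two slopes.
t_slope_difference = b_age_young / se_age_young

# Part e: predicted gap in ln(earnings) between a 40-year-old and a 20-year-old.
ln_gap_40_vs_20 = predicted_ln_earnings(40) - predicted_ln_earnings(20)

for label, young in (("under 40", 1), ("40 and older", 0)):
    intercept, slope = regression_line(young)
    print(f"{label}: intercept {intercept:.2f}, slope {slope:.3f}")
print(f"t for slope difference: {t_slope_difference:.1f}")
print(f"ln(earnings) gap, age 40 vs 20: {ln_gap_40_vs_20:.2f}")
```

The gap is in log points; because it is large, reading it directly as a percentage difference would be a poor approximation.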
4. In 1999 data on 30 major league baseball teams, you estimate the following regression, where winning percentage is a measure of team performance, ERA is a measure of pitching performance (earned run average), OPS is a measure of hitting performance (on base plus slugging percentage), AL is a binary indicator that the team is in the American League (so equals 0 for National League teams), and the percentage variables are actually in proportion terms (but are always called "percentages"):
Dependent variable: Winning Percentage
| | (1) | (2) |
| Intercept | -0.19 (0.08) | -0.29 (0.12) |
| ERA | -0.099 (0.008) | -0.100 (0.008) |
| OPS | 1.490 (0.126) | 1.622 (0.163) |
| AL | | 0.10 (0.24) |
| AL x ERA | | 0.008 (0.018) |
| AL x OPS | | -0.187 (0.160) |
a. How can the intercept estimates be negative when winning percentage must be between 0 & 1? What changes could you make to estimate regressions that were otherwise equivalent but had intercept estimates that were feasible values for winning percentage?
b. Based on (1), is it better for a team to have a low or high (i) ERA and (ii) OPS?
c. In model (2), interpret the estimated effect on winning percentage of OPS for AL teams. How would you test whether this effect is significant?
d. In (2), how would you test for the significance of ERA?
e. In (2), are there any significant differences between AL & NL teams? What does this suggest about whether (1) or (2) is preferred? How would you formally test (1) versus (2)? Interpret what this is testing.
f. Standard deviations are 0.53 for ERA & 0.034 for OPS. In (1), how much is winning percentage predicted to change with a 1 standard deviation increase in (i) ERA, and (ii) OPS? Based on this, which seems more important to winning, pitching or hitting?
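The calculations in parts c, e, and f of question 4 reduce to a few products and ratios, sketched below in Python. The dictionary layout and names are illustrative, and the parenthesized values are assumed to be standard errors.

```python
# Point estimates from the question 4 table (dependent variable: winning percentage).
model_1 = {"ERA": -0.099, "OPS": 1.490}
b_ops_model_2 = 1.622
# (estimate, assumed standard error) for the three AL terms that appear only in model (2).
al_terms = {"AL": (0.10, 0.24), "AL x ERA": (0.008, 0.018), "AL x OPS": (-0.187, 0.160)}
std_dev = {"ERA": 0.53, "OPS": 0.034}
n_teams, n_coefs_model_2 = 30, 6

# Part f: predicted change in winning percentage for a 1 standard deviation increase.
one_sd_change = {name: model_1[name] * std_dev[name] for name in std_dev}

# Part c: effect of OPS for AL teams is the OPS coefficient plus the AL x OPS coefficient.
ops_effect_al = b_ops_model_2 + al_terms["AL x OPS"][0]

# Part e: individual t statistics on the AL terms, plus the dimensions of the joint F test.
t_stats = {name: est / se for name, (est, se) in al_terms.items()}
n_restrictions = len(al_terms)
df_resid = n_teams - n_coefs_model_2

print({name: round(change, 4) for name, change in one_sd_change.items()})
print(f"OPS effect for AL teams: {ops_effect_al:.3f}")
print({name: round(t, 2) for name, t in t_stats.items()})
print(f"F test of (1) vs (2): {n_restrictions} restrictions, {df_resid} residual df")
```

Computing the F statistic itself would require the sum of squared residuals (or R-squared) from both models, which the table does not report.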
5. In data on 455 major league baseball pitchers from the 1998 season, you estimate the following regression, where earnings is annual salary, and the regressors represent major league career totals before the 1998 season:
Dependent variable: ln(earnings)
| Intercept | 12.15 (0.05) |
| Years | 0.160 (0.039) |
| Years squared | -0.0165 (0.0026) |
| Innings | 0.00268 (0.00030) |
| Innings squared | -0.00000045 (0.00000012) |
| ERA | -0.0584 (0.0165) |
| Saves | 0.0063 (0.0010) |
a. Interpret the intercept.
b. How would you test whether the overall regression model is linear in regressors, compared to the specification estimated here?
c. What is the return to the first year played & inning pitched? After how many years, and innings, do the respective returns to additional years & innings turn negative? How does this represent different returns to "experience" if a productive pitcher throws about 200 innings per season?
d. How would you test for the significance of the combined return to previous years & innings for a pitcher with 6 previous years & 1,200 previous innings?
e. What is the predicted effect of a (i) reduction in cumulative ERA of 0.50, and (ii) having 50 saves the previous season?
f. Would you make any changes to the specification based on these estimates? What other test statistics would you need to know to be certain of this conclusion?
g. Bonus: how many total zeroes are there in the innings squared coefficient & SE?
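For the quadratic terms in question 5, the turning points in part c and the effects in part e can be checked with the Python sketch below. The helper names are hypothetical, and `experience_contribution` reflects one reading of "combined return" in part d (the total contribution of the four experience terms), which is an assumption.

```python
# Point estimates from the question 5 table (dependent variable: ln(earnings)).
b_years, b_years_sq = 0.160, -0.0165
b_innings, b_innings_sq = 0.00268, -0.00000045
b_era, b_saves = -0.0584, 0.0063

def turning_point(linear, squared):
    """Value of x at which linear + 2 * squared * x, the marginal return, equals zero."""
    return -linear / (2 * squared)

years_turn = turning_point(b_years, b_years_sq)
innings_turn = turning_point(b_innings, b_innings_sq)
# Part c: convert the innings turning point into seasons at about 200 innings per season.
seasons_to_innings_turn = innings_turn / 200

def experience_contribution(years, innings):
    """Total contribution of the years and innings terms to predicted ln(earnings)."""
    return (b_years * years + b_years_sq * years ** 2
            + b_innings * innings + b_innings_sq * innings ** 2)

combined_6_years_1200_innings = experience_contribution(6, 1200)

# Part e: effects on ln(earnings) of lowering ERA by 0.50 and of 50 saves.
era_effect = b_era * -0.50
saves_effect = b_saves * 50

print(f"returns turn negative after {years_turn:.1f} years "
      f"and {innings_turn:.0f} innings ({seasons_to_innings_turn:.1f} seasons)")
print(f"combined contribution at 6 years, 1,200 innings: {combined_6_years_1200_innings:.3f}")
print(f"ERA effect: {era_effect:.4f}; saves effect: {saves_effect:.3f}")
```

A test of the combined return is a test of a linear combination of four coefficients, so it needs their covariances in addition to the reported standard errors.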
6. In what are apparently quite popular data on 1,744 workers, you estimate the following regression, where age is in years and female is a binary indicator:
Dependent variable: Weekly earnings
| | (1) | (2) | (3) |
| Intercept | -344.88 (51.58) | -683.21 (120.13) | -795.90 (283.11) |
| Female | -163.81 (12.47) | -163.23 (12.45) | -163.19 (12.45) |
| Age | 41.48 (2.64) | 65.83 (9.27) | 82.93 (29.29) |
| Age^2 | -0.45 (0.03) | -1.05 (0.22) | -1.69 (1.06) |
| Age^3 | | 0.005 (0.002) | 0.015 (0.016) |
| Age^4 | | | -0.0005 (0.0009) |
a. Interpret the intercept. How can you transform the regressions to get a meaningful intercept but the same coefficients on the regressors?
b. Interpret the estimated gender earnings differential. Does this appear to suffer from omitted variable bias when higher order powers of age are not included as regressors?
c. Based on (1), what is the main nonlinearity in the relationship between earnings & age?
d. In (2), what is the predicted effect of age for 60-year-olds? How would you test whether this is significantly different from zero? How would you use that test statistic to calculate the standard error of the predicted effect?
e. Which of these three models do you prefer? In your preferred model, how would you test that age is a significant determinant of earnings? What degrees of freedom does your test statistic have?
f. Which regressors appear to be highly correlated with each other, and how can you tell?
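The marginal effect of age in a polynomial specification, as in parts c and d of question 6, is the derivative of the polynomial. A Python sketch under the assumption that the parenthesized values are standard errors, with illustrative names:

```python
# Age coefficients (Age, Age^2, Age^3, Age^4) from the question 6 table, by model.
age_coefs = {
    "(1)": [41.48, -0.45],
    "(2)": [65.83, -1.05, 0.005],
    "(3)": [82.93, -1.69, 0.015, -0.0005],
}
# Assumed standard errors of the Age coefficient in each model.
se_age = {"(1)": 2.64, "(2)": 9.27, "(3)": 29.29}

def age_effect(coefs, age):
    """Derivative of sum_p coefs[p-1] * age**p with respect to age."""
    return sum(power * coef * age ** (power - 1)
               for power, coef in enumerate(coefs, start=1))

# Part c: in model (1), earnings rise with age until the derivative reaches zero.
peak_age_model_1 = -age_coefs["(1)"][0] / (2 * age_coefs["(1)"][1])

# Part d: predicted effect of one more year of age for 60-year-olds in model (2).
effect_60_model_2 = age_effect(age_coefs["(2)"], 60)

# Part f: growth in the standard error of Age as higher powers are added.
se_inflation = se_age["(3)"] / se_age["(1)"]

print(f"model (1) peak age: {peak_age_model_1:.1f}")
print(f"model (2) effect at age 60: {effect_60_model_2:.2f} dollars per week per year")
print(f"SE of Age grows by a factor of {se_inflation:.1f} from (1) to (3)")
```

Given a t statistic for the effect at age 60, the standard error of that effect is the predicted effect divided by the t statistic.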
7. In data on 253 workers, you estimate the following regression, where wage is hourly, education & experience are in years, and female & married are binary indicators:
Dependent variable: ln(wage)
| Intercept | 0.14 (0.16) |
| Education | 0.093 (0.011) |
| Experience | 0.032 (0.006) |
| Experience squared | -0.0005 (0.0001) |
| Female | -0.158 (0.075) |
| Married | 0.173 (0.080) |
| Female x Married | -0.218 (0.097) |
a. Interpret the estimated coefficient of (i) education & (ii) the interaction term, including the significance of each.
b. How would you test whether experience is a significant determinant of wages? What is the return to experience? How does this change as workers gain experience? What is this return for workers with 10 years of experience, and how would you test whether this is significant?
c. How would you test whether predicted wages differ significantly by (i) gender & (ii) marital status?
d. What is the return to marriage for females? How would you test whether this is significant?
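The point estimates needed for parts a, b, and d of question 7 can be computed as in the Python sketch below. It assumes the parenthesized values are standard errors and that the seven reported coefficients are the only ones estimated; names are illustrative.

```python
# Point estimates from the question 7 table (dependent variable: ln(wage)).
b_educ, se_educ = 0.093, 0.011
b_exper, b_exper_sq = 0.032, -0.0005
b_married, b_female_married = 0.173, -0.218
se_female_married = 0.097
n_obs, n_coefs = 253, 7

# Part a: t statistics for education and for the interaction term.
t_educ = b_educ / se_educ
t_female_married = b_female_married / se_female_married

# Part b: return to experience is b_exper + 2 * b_exper_sq * experience.
def experience_return(years):
    return b_exper + 2 * b_exper_sq * years

return_at_10 = experience_return(10)
zero_return_years = -b_exper / (2 * b_exper_sq)

# Part d: return to marriage for females adds the interaction to the Married coefficient.
marriage_return_female = b_married + b_female_married

df_resid = n_obs - n_coefs

print(f"t(education) = {t_educ:.2f}, t(female x married) = {t_female_married:.2f}")
print(f"return to experience at 10 years: {return_at_10:.3f}; "
      f"zero after {zero_return_years:.0f} years")
print(f"return to marriage for females: {marriage_return_female:.3f} (df = {df_resid})")
```

Both the return at 10 years of experience and the return to marriage for females are sums of coefficients, so testing them requires the relevant covariances or a re-parameterized regression.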