Econometrics 718 - Optional Problem Set
Problem 1 - Let's think about the propensity score. What we showed in class is that the distribution of Xi is identical for the treated and the untreated once we condition on the propensity score. Let's verify that in an example. Suppose that this is the distribution of (X1i, X2i, Ti) in the population:
X1i | X2i | Ti | Prob.
0 | 0 | 0 | 0.10
0 | 0 | 1 | 0.10
0 | 1 | 0 | 0.10
0 | 1 | 1 | 0.20
1 | 0 | 0 | 0.05
1 | 0 | 1 | 0.10
1 | 1 | 0 | 0.10
1 | 1 | 1 | 0.25
That is, Prob. is the probability of that particular outcome. Notice that the probabilities add up to one.
1. Calculate the propensity score for each of the four different combinations of X1i and X2i.
2. Calculate:
(a) The distribution of (X1i, X2i) conditional on the propensity score being 2/3 and Ti = 1
(b) The distribution of (X1i, X2i) conditional on the propensity score being 2/3 and Ti = 0
(c) The distribution of (X1i, X2i) conditional on the propensity score being 2/3, without conditioning on Ti
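One way to check hand calculations for Problem 1 is a short script. The sketch below is only an illustration in Python, not part of the assignment: it copies the joint distribution from the table in Problem 1, and the names joint, propensity_scores, and x_given_score are hypothetical helpers introduced here. Exact fractions are used so that a score of 2/3 can be compared without rounding error.

```python
from collections import defaultdict
from fractions import Fraction

# Joint distribution of (X1, X2, T), copied from the table in Problem 1.
joint = {
    (0, 0, 0): Fraction(10, 100),
    (0, 0, 1): Fraction(10, 100),
    (0, 1, 0): Fraction(10, 100),
    (0, 1, 1): Fraction(20, 100),
    (1, 0, 0): Fraction(5, 100),
    (1, 0, 1): Fraction(10, 100),
    (1, 1, 0): Fraction(10, 100),
    (1, 1, 1): Fraction(25, 100),
}


def propensity_scores(joint):
    """Return P(T = 1 | X1, X2) for every (X1, X2) cell."""
    cell_total = defaultdict(Fraction)  # P(X1, X2)
    treated = defaultdict(Fraction)     # P(X1, X2, T = 1)
    for (x1, x2, t), p in joint.items():
        cell_total[(x1, x2)] += p
        if t == 1:
            treated[(x1, x2)] += p
    return {x: treated[x] / cell_total[x] for x in cell_total}


def x_given_score(joint, scores, score, t=None):
    """Distribution of (X1, X2) given the propensity score.

    If t is 0 or 1, also condition on T = t; if t is None, do not
    condition on T.
    """
    mass = defaultdict(Fraction)
    for (x1, x2, ti), p in joint.items():
        if scores[(x1, x2)] == score and (t is None or ti == t):
            mass[(x1, x2)] += p
    total = sum(mass.values())
    return {x: m / total for x, m in mass.items()}


scores = propensity_scores(joint)
```

Here scores holds the four values asked for in part 1, and x_given_score(joint, scores, Fraction(2, 3), t=1), the same call with t=0, and the same call with t=None correspond to parts (a), (b), and (c).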
Problem 2 - Use the Stata data set jtrain2.dta, which you can get from https://www.stata.com/texts/eacsap/. The treatment here is "train", and we will estimate the effect of treatment on the treated (TT) in a number of different ways. The dependent variable will be re78 (real earnings in 1978). Let's first control only for black and hispanic, so there are three categories: black, hispanic, and non-black non-hispanic.
a) Tabulate the 6 cell means of re78 by race and by training status (that is, the mean for black trainees, and so on). Also, what are the proportions of the three groups by training status?
b) Use the 12 numbers from part a) to calculate an estimate of the effect of treatment on the treated.
c) Use regression analysis with dummy variables to estimate the effect of treatment on the treated.
d) Now use propensity score matching. In Stata 13 this can be done with the teffects psmatch command.
e) Estimate the TT parameter using reweighting.
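To make parts b) and e) concrete, here is a minimal sketch in Python of one common reading of each estimator. It assumes that the cell-mean estimate averages the within-group treated-minus-control gaps using the group shares among the treated, and that reweighting gives each control the weight p(x) / (1 - p(x)), where p(x) is the share treated in that control's group. The records below are hypothetical toy numbers, not values from jtrain2.dta, and the function names are made up for this illustration. Every group must contain at least one treated and one control observation.

```python
from collections import defaultdict

# Hypothetical toy records (group, treated, outcome); NOT from jtrain2.dta.
records = [
    ("black", 1, 6.0), ("black", 1, 8.0), ("black", 0, 5.0), ("black", 0, 3.0),
    ("hispanic", 1, 7.0), ("hispanic", 0, 6.0), ("hispanic", 0, 4.0),
    ("other", 1, 9.0), ("other", 0, 7.0),
]


def tt_cell_means(records):
    """Sum over groups of P(group | T = 1) * (treated mean - control mean)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for g, t, y in records:
        sums[(g, t)] += y
        counts[(g, t)] += 1
    n_treated = sum(n for (_, t), n in counts.items() if t == 1)
    tt = 0.0
    for g in {g for g, _, _ in records}:
        gap = sums[(g, 1)] / counts[(g, 1)] - sums[(g, 0)] / counts[(g, 0)]
        tt += counts[(g, 1)] / n_treated * gap
    return tt


def tt_reweighting(records):
    """Treated mean minus control mean weighted by p(x) / (1 - p(x))."""
    n = defaultdict(int)
    n1 = defaultdict(int)
    for g, t, _ in records:
        n[g] += 1
        n1[g] += t
    # With only group dummies, the propensity score is the share treated.
    pscore = {g: n1[g] / n[g] for g in n}
    treated = [y for _, t, y in records if t == 1]
    w_sum = 0.0
    wy_sum = 0.0
    for g, t, y in records:
        if t == 0:
            w = pscore[g] / (1 - pscore[g])
            w_sum += w
            wy_sum += w * y
    return sum(treated) / len(treated) - wy_sum / w_sum
```

Because the propensity score in this sketch is fully saturated in the group dummies, the two functions return the same number on the toy records up to floating-point rounding. With continuous covariates or a parametric propensity score model they would generally differ.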
Problem 3 - Now pick some other data set and choose a treatment variable. As before, estimate the effect of treatment on the treated.
a) Estimate using linear regression.
b) Estimate by propensity score matching in 4 different ways (any variations you choose).
c) Estimate by reweighting.