Economics 706 - Problem Set 2
Problem 1 - Suppose you have the following data:
Xi | Ti | N   | Ȳ
0  | 0  | 100 | 1.5
0  | 1  | 75  | 1.8
1  | 0  | 130 | 1.0
1  | 1  | 100 | 1.2
2  | 0  | 45  | 2.3
2  | 1  | 120 | 2.7
So Xi is the value of the X variable, Ti is the treatment status, N is the number of observations of each type, and Ȳ is the average value of the outcome in each cell. For example, there are 100 observations with Xi = 0 and Ti = 0, and the mean value of Yi for those observations is 1.5. Calculate an estimate of the treatment on the treated and of the average treatment effect under the assumption of selection only on observables (Assumption 1 in the lecture notes). You do not need to do matching to answer this question; the answer can be obtained more directly using the law of iterated expectations.
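Under selection on observables, the treatment on the treated can be written as a weighted average of the within-cell differences Ȳ(x, 1) - Ȳ(x, 0), weighting each cell by its share of the treated, P(X = x | T = 1); the average treatment effect uses the overall shares P(X = x) instead. A minimal Python sketch of that arithmetic, with the table typed in by hand and the sample frequencies used as the weights (the names cells, tot and ate are only illustrative):

```python
# Problem 1 table: cells[x][t] = (N, mean outcome) for X = x and T = t.
cells = {
    0: {0: (100, 1.5), 1: (75, 1.8)},
    1: {0: (130, 1.0), 1: (100, 1.2)},
    2: {0: (45, 2.3), 1: (120, 2.7)},
}

def cell_effect(x):
    """Within-cell contrast: mean Y for treated minus mean Y for untreated at X = x."""
    return cells[x][1][1] - cells[x][0][1]

n_treated = sum(arm[1][0] for arm in cells.values())
n_total = sum(arm[0][0] + arm[1][0] for arm in cells.values())

# Treatment on the treated: weight each cell by its share of the treated, P(X = x | T = 1).
tot = sum(cells[x][1][0] / n_treated * cell_effect(x) for x in cells)

# Average treatment effect: weight each cell by its share of everyone, P(X = x).
ate = sum((cells[x][0][0] + cells[x][1][0]) / n_total * cell_effect(x) for x in cells)

print(f"TOT = {tot:.4f}, ATE = {ate:.4f}")
```

The only difference between the two estimates is the weighting: cells where treatment is common (here Xi = 2) count for more in the treatment on the treated.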
Problem 2 - Now suppose you have the following raw data:
X1i | X2i | Ti | Yi
0   | 1   | 1  | 2
0   | 1   | 1  | 1
1   | 0   | 0  | 3
2   | 1   | 0  | 1
2   | 0   | 1  | 2
2   | 0   | 0  | 0
2   | 1   | 0  | 1
1   | 1   | 0  | 2
1   | 0   | 1  | 1
1   | 1   | 1  | 0
0   | 1   | 0  | 3
2   | 0   | 0  | 2
1   | 0   | 1  | 1
0   | 1   | 0  | 1
2   | 0   | 1  | 0
Use matching to construct an estimate of the treatment on the treated. First you must choose some rule to decide what to do if there is more than one available match. Explain what that rule is as part of your answer.
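One possible rule, chosen here only for illustration: match each treated observation exactly on (X1i, X2i) and, when several untreated observations share those values, use the average of their outcomes as the counterfactual. (Picking one matching control at random, or the first one listed, would be other defensible rules.) A short Python sketch of that rule applied to the table above; the variable names are hypothetical:

```python
from collections import defaultdict

# (X1, X2, T, Y) rows from the Problem 2 table.
data = [
    (0, 1, 1, 2), (0, 1, 1, 1), (1, 0, 0, 3), (2, 1, 0, 1), (2, 0, 1, 2),
    (2, 0, 0, 0), (2, 1, 0, 1), (1, 1, 0, 2), (1, 0, 1, 1), (1, 1, 1, 0),
    (0, 1, 0, 3), (2, 0, 0, 2), (1, 0, 1, 1), (0, 1, 0, 1), (2, 0, 1, 0),
]

# Pool the untreated outcomes by covariate cell (x1, x2).
controls = defaultdict(list)
for x1, x2, t, y in data:
    if t == 0:
        controls[(x1, x2)].append(y)

# Assumed rule: exact match on (X1, X2); with several matching controls,
# the counterfactual is the mean of their outcomes.
gaps = []
for x1, x2, t, y in data:
    if t == 1:
        pool = controls[(x1, x2)]
        if not pool:
            continue  # no exact match: drop this treated unit (itself a modelling choice)
        gaps.append(y - sum(pool) / len(pool))

tot = sum(gaps) / len(gaps)
print(f"matched treated units: {len(gaps)}, TOT estimate: {tot:.3f}")
```

In this table every treated observation has at least one exact control match, so the `continue` branch never fires; with continuous covariates you would need a distance-based rule instead.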
Problem 3 - Let's now think about the propensity score. We showed in class that, conditional on the propensity score, the distribution of Xi is identical for the treated and the untreated. Let's verify that in an example. Suppose that this is the distribution of (X1i, X2i, Ti) in the population:
X1i | X2i | Ti | Prob.
0   | 0   | 0  | 0.10
0   | 0   | 1  | 0.10
0   | 1   | 0  | 0.10
0   | 1   | 1  | 0.20
1   | 0   | 0  | 0.05
1   | 0   | 1  | 0.10
1   | 1   | 0  | 0.10
1   | 1   | 1  | 0.25
That is, Prob. is the probability of that particular combination of (X1i, X2i, Ti). Notice that the probabilities add up to one.
1. Calculate the propensity score for each of the four different combinations of X1i and X2i.
2. Calculate
(a) The distribution of (X1i, X2i) conditional on the propensity score being 2/3 and Ti = 1
(b) The distribution of (X1i, X2i) conditional on the propensity score being 2/3 and Ti = 0
(c) The distribution of (X1i, X2i) conditional on the propensity score being 2/3, without conditioning on Ti
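The propensity score here is p(x) = P(Ti = 1 | X = x) = P(X = x, Ti = 1) / P(X = x), and conditioning on p(X) = 2/3 restricts attention to the (X1i, X2i) cells with that score before renormalizing. A small Python sketch that does this with exact fractions, so the comparison across Ti is not clouded by rounding (the helper name dist_given_pscore is hypothetical):

```python
from fractions import Fraction

# P(X1, X2, T) from the Problem 3 table, stored as exact fractions.
joint = {
    (0, 0, 0): Fraction(10, 100), (0, 0, 1): Fraction(10, 100),
    (0, 1, 0): Fraction(10, 100), (0, 1, 1): Fraction(20, 100),
    (1, 0, 0): Fraction(5, 100),  (1, 0, 1): Fraction(10, 100),
    (1, 1, 0): Fraction(10, 100), (1, 1, 1): Fraction(25, 100),
}
assert sum(joint.values()) == 1  # the probabilities add up to one

cells = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Propensity score p(x) = P(T = 1 | X = x) = P(X = x, T = 1) / P(X = x).
pscore = {x: joint[x + (1,)] / (joint[x + (0,)] + joint[x + (1,)]) for x in cells}

def dist_given_pscore(p, t=None):
    """P(X = x | p(X) = p, T = t); pass t=None to condition on the score only."""
    arms = (0, 1) if t is None else (t,)
    mass = {x: sum(joint[x + (a,)] for a in arms) for x in cells if pscore[x] == p}
    total = sum(mass.values())
    return {x: m / total for x, m in mass.items()}

print(pscore)
for t in (1, 0, None):
    print("T =", t, dist_given_pscore(Fraction(2, 3), t))
```

If the balancing property holds, the three printed distributions should coincide.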
Problem 4 - Here I want you to try propensity score estimation on actual data. You don't have to use Stata, but if you don't, you'll have to either find a canned propensity score module for the program you are using or program it up yourself. I am going to write this assuming you are using Stata.
1. Go to the site https://ideas.repec.org/s/boc/bocins.html and download the data set bwght2.dta into Stata. We want to study the effect of expectant mothers smoking while pregnant on the (log) birthweight of their child.
2. Construct the treatment variable, which will be one if the mother ever smoked (gen smoke = cigs>0 if !missing(cigs)). The if qualifier matters: Stata treats missing values as larger than any number, so cigs>0 on its own would code a missing cigs as a smoker.
3. What is the difference in mean log birthweight between mothers who smoked and those who didn't?
4. Run a regression of log birthweight on smoke conditioning on a number of the background variables in the data set (reg lbwght smoke mage meduc monpre npvis fage feduc fblck magesq npvissq mblck, robust)
5. Next we will try propensity score matching. If you are using Stata 13 or later, try: help teffects psmatch
6. Try propensity score matching for the average treatment effect: teffects psmatch (lbwght) (smoke mage meduc monpre npvis fage feduc fblck magesq npvissq mblck)
7. Now try the treatment on the treated (add , atet to the end of the command from the previous step)
8. Next, let's look at how the propensity score differs between mothers who smoked and those who didn't.
(a) Estimate a logit (logit smoke mage meduc monpre npvis fage feduc fblck magesq npvissq mblck)
(b) Form the propensity score (predict phat)
(c) Graph the distributions (twoway (histogram phat if smoke==1) (histogram phat if smoke==0, fcolor(none) lcolor(red)))
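If you are not using Stata and decide to program the estimator yourself, the pieces involved are roughly: fit a logit of the treatment on the covariates (step 8(a)), form the fitted propensity score (step 8(b)), then match each observation to the nearest observation in the other treatment group on that score (steps 6 and 7). Below is a plain-Python sketch of those pieces under stated assumptions: the function names fit_logit, predict_pscore and psmatch are hypothetical, matching is one nearest neighbour with replacement, ties are broken by data order, no standard errors are computed, and the data are simulated rather than bwght2.dta, so it will not reproduce teffects psmatch output exactly. To use it on the real data you would first need to read bwght2.dta (pandas.read_stata is one option) and drop rows with missing values.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))

def fit_logit(X, t, iters=25):
    """Logit MLE via Newton-Raphson; an intercept is prepended to each row of X."""
    Z = [[1.0] + list(row) for row in X]
    k = len(Z[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k                      # score vector Z'(t - p)
        info = [[0.0] * k for _ in range(k)]  # information matrix Z'WZ, W = p(1 - p)
        for z, ti in zip(Z, t):
            p = logistic(sum(b * v for b, v in zip(beta, z)))
            for a in range(k):
                grad[a] += (ti - p) * z[a]
                for c in range(k):
                    info[a][c] += p * (1.0 - p) * z[a] * z[c]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

def predict_pscore(beta, X):
    """Fitted P(T = 1 | X), the analogue of `predict phat` after a logit."""
    return [logistic(beta[0] + sum(b * v for b, v in zip(beta[1:], row))) for row in X]

def psmatch(y, t, phat):
    """One-nearest-neighbour matching on phat, with replacement; returns (ATE, ATT).
    Ties are broken by data order (an assumption of this sketch)."""
    treated = [i for i, ti in enumerate(t) if ti == 1]
    control = [i for i, ti in enumerate(t) if ti == 0]

    def match_outcome(i, pool):
        return y[min(pool, key=lambda j: abs(phat[j] - phat[i]))]

    effect = [y[i] - match_outcome(i, control) if t[i] == 1
              else match_outcome(i, treated) - y[i] for i in range(len(y))]
    ate = sum(effect) / len(y)
    att = sum(effect[i] for i in treated) / len(treated)
    return ate, att

# Usage on simulated data (hypothetical): the true effect is -0.1 for everyone.
random.seed(0)
X = [[random.gauss(0, 1)] for _ in range(500)]
t = [1 if random.random() < logistic(0.5 * row[0]) else 0 for row in X]
y = [1.0 + 0.3 * row[0] - 0.1 * ti + random.gauss(0, 0.1) for row, ti in zip(X, t)]
beta = fit_logit(X, t)
phat = predict_pscore(beta, X)
print(psmatch(y, t, phat))
```

Because the logit includes an intercept, its first-order conditions force the fitted scores to average to the share treated, which is a quick check that the Newton iterations converged.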
Problem 5 - Now pick two other data sets and try propensity score estimation. You can use any of the data sets on that web site or anything else you want. What did you learn?