Macroeconomic Shocks and Child Survival
Sonia Bhalotra
What you will learn: This project will introduce you to the use of survey data, help you understand panel data (in action) and encourage you to use your knowledge of economics to interpret the results of data analysis.
Reading: You will start by reading three papers concerned with identifying the causal impact of macroeconomic shocks on child health and survival and, in one case, cognitive development too.
(1) Dehejia, R. and A. Lleras-Muney (2004). Booms, busts, and babies' health, Quarterly Journal of Economics, Vol. 119(3), pp. 1091-1130.
This paper uses state level US panel data to estimate the impact of state-level unemployment variation on state-level mortality rates amongst newborns. It uses micro-data for other health outcomes.
(2) Bhalotra, S. (2010). Fatal fluctuations: cyclicality in infant mortality in India. Journal of Development Economics. 93(1), September, pp. 7-19.
This paper uses a mother-level panel nested within a state-level panel for India to investigate the impact of state-level income variation on individual mortality rates amongst newborns.
This is the data set you have. I have given you a random sample of the data I used for my paper because the original is too large. As a result, you will encounter some oddities in the data; for example, within a mother you will find that children of certain birth orders are (randomly) missing.
(3) Hidrobo, M. (2011). The effect of Ecuador's 1998-2000 economic crisis on child health and cognitive development. Mimeograph (uploaded on Blackboard).
This paper uses provincial variation in inflation to estimate the impact of a recent economic crisis in Ecuador on child height and vocabulary test scores.
Unlike the previous two papers it is focused upon a particular dated event (the crisis) rather than regular cyclical economic variation. Also, the other papers exploit variation in economic conditions across state and birth-year while this paper exploits variation in exposure to the crisis by birth month and province.
Data: I have uploaded the following Stata files for you to merge to create the data set used in my paper.
(a) File micro-India.dta contains the microdata that include the main dependent variable, mortality, labelled infant. The mother identifier is "seqid". Children within mother are identified by their birth order ("bord").
(b) File panel-India.dta contains a state level panel data set that includes state level per capita (real) income, which is called income in the file.
The microdata also include variables describing education and demographics for mothers, their children and their partners (the mother is the respondent). The macro-panel also contains time series for each state on rainfall which, in poor countries, is a predictor of mortality.
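As a starting point, the merge of the two files can be sketched as follows. This is only a sketch: the key variables state and year are assumed names (check them with "describe" in both files), and the output file name india-merged.dta is hypothetical. Each child in the microdata is matched to the single state-year row of the macro panel, hence a many-to-one merge.

```stata
* Sketch only: state and year are ASSUMED key names; verify with describe.
use "micro-India.dta", clear
describe

* Many children (rows) map to one state-year row in the macro panel.
* merge m:1 requires Stata 11 or later.
merge m:1 state year using "panel-India.dta"

* Inspect how many rows matched before deciding what to keep.
tabulate _merge

* One possible choice: keep matched births only.
keep if _merge == 3
drop _merge

* Hypothetical output file name.
save "india-merged.dta", replace
```

Look at the unmatched rows before dropping them; state-years with income data but no sampled births are to be expected, given that you have a random sample.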
Panel data structure: Panel (or longitudinal) data consist of repeated observations within a cross-sectional unit. Panel data may be generated by tracking countries (or states), firms or households (or mothers) over time.
The file you are going to work with contains microdata structured as a panel of siblings within mother. So think of mothers as the cross-sectional units: there are many mothers (n) observed at any time. Each mother has births sequenced over time (t). Every birth occurs in a state and a year, so macroeconomic events in the state and year of birth can be modelled as influencing the health and survival of the child.
The microdata were downloaded from www.measuredhs.com. This website contains further information about the data, questionnaires, reports, etc. I selected the India survey conducted in 1998/9 from this website, cleaned the data, selected variables, selected a sample and transposed the data so that every row is now a child rather than a mother. This is what we need because the dependent variable in the analysis (infant mortality or other measures of child health) is defined at the child level.
Project
Your project is a mini-thesis. Write it up as a document that includes an introduction that motivates your research, a theoretical framework, an empirical strategy, a description of the data, a discussion of results, and tables and graphs. What follows is a set of guidelines; you should feel free to innovate around these suggestions.
I. Data Preparation
1. Study the mortality rate and try to identify outliers. Is there any pattern to the outliers?
2. Study the entire data file and consider missing values. Is there any pattern to these?
3. Look at the summary statistics of all variables in the file so as to develop a sense of health and economic conditions in India over the period spanned by the data.
Note: I have transformed the data already: taken the logarithm of income, created rainfall shocks, created dummies for continuous variables like years of education and age at birth. We are using individual data on mortality which is a dummy (0/1). So we clearly would not log this variable.
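One possible way to approach the three data preparation steps is sketched below. It assumes the merged file is in memory and that the variables are called infant, income, state and year (state and year are assumed names). Since individual mortality is a 0/1 dummy, outliers are easier to see in state-year mortality rates, so the sketch collapses to that level first. The new variable names mortrate and nbirths are made up for illustration.

```stata
* Sketch only: state and year are ASSUMED names.
describe
summarize

* Count missing values variable by variable (Stata 11 or later).
misstable summarize

* Build state-year mortality rates and look for extreme values.
preserve
collapse (mean) mortrate = infant (count) nbirths = infant, by(state year)
summarize mortrate, detail

* List cells in the tails; small nbirths often explains extreme rates.
list state year mortrate nbirths if mortrate > r(p99) | mortrate < r(p1)
restore
```

Cells based on very few births will produce rates of 0 or 1 by chance, which is one pattern to look for when you discuss outliers.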
II. Data Description
Conduct the following exercises in Stata to gain some familiarity with the data. Comment on the statistics and graphs that you record.
1. What fraction of people in this sample are Christian, Muslim and Hindu?
2. What percentage are rural?
3. What is the average mortality rate in the sample? What is the average rate by decade?
4. Plot income trends by state.
5. Plot trends in infant mortality by state.
6. Plot the fitted relationship between infant mortality and income by state.
Hint: For the fitted relationship, show the scatter and a linear fit. Also try using the lowess command in Stata to get a nonparametric plot.
7. Collapse the relevant data to get a time series at the all-India level. Plot de-trended mortality rates against de-trended income. Explain why de-trending may be helpful. Note that "collapse" is a command in Stata; type "help collapse" for syntax guidance.
Hint: When you collapse a file to get means of the data across N or T, you are of course changing the data. This is fine because you can revert to your saved file on disk. Alternatively, you can use the Stata commands "preserve" and "restore" before and after any data transformation; look them up using help within Stata.
Hint: To detrend y_t, run the regression y_t = a + b*trend_t, where trend is a linear trend taking values 1, 2, 3, ..., and save the estimated residuals (in Stata: "predict yresid, resid"). I have generated the variable "trend" for you. Note, though, that when micro and macro data are in the same file you have to be careful about how you generate a trend. You can summarise year and note the minimum year (1970 in this file). Then one way to generate a trend is to use the command: gen trend = year - 1969. This gives trend=1 for year=1970, trend=2 for year=1971, etc.
8. Collapse the relevant data to get cross-sectional data by state. Plot the between-state relationship of mortality and income. Explain what one can and cannot learn from this plot.
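The exercises above could be sketched along the following lines. Treat this as a rough guide rather than the required solution. The names infant, income and year come from the text above; state, religion and rural are assumed names that you should check against the variable labels, and decade, infant_dt and income_dt are new variables created here for illustration.

```stata
* Sketch only: state, religion and rural are ASSUMED variable names.
tabulate religion
tabulate rural
summarize infant

* Mortality by decade of birth.
gen decade = 10 * floor(year / 10)
tabstat infant, by(decade) statistics(mean n)

* State-year means for the trend plots and fitted relationships.
preserve
collapse (mean) infant income, by(state year)
twoway line income year, by(state)
twoway line infant year, by(state)
twoway (scatter infant income) (lfit infant income), by(state)
lowess infant income
restore

* All-India time series: detrend both variables, then plot residuals.
preserve
collapse (mean) infant income, by(year)
gen trend = year - 1969
regress infant trend
predict infant_dt, resid
regress income trend
predict income_dt, resid
twoway (scatter infant_dt income_dt) (lfit infant_dt income_dt)
restore

* Between-state relationship: one observation per state.
preserve
collapse (mean) infant income, by(state)
twoway (scatter infant income) (lfit infant income)
restore
```

Each collapse is wrapped in preserve and restore so that the child-level file is back in memory for the next step.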
III. Probit vs the Linear Probability Model
1. Run a simple probit of mortality on income. Report and discuss the marginal effects (ME).
2. Run a simple linear probability model, regressing mortality on income using OLS.
3. Compare the OLS estimates with the ME from the probit.
4. Check what fraction of OLS predictions lie outside of the [0,1] range in which we expect probabilities to lie. What does this tell you?
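A minimal sketch of this comparison follows, assuming infant is the 0/1 mortality dummy and income is the (log) state income variable. The names phat_lpm and outside01 are made up for illustration. The margins command needs Stata 11 or later; older versions have other ways of reporting marginal effects.

```stata
* Sketch only: infant and income as named in the text above.
probit infant income
* Average marginal effect of income on the probability of death.
margins, dydx(income)

* Linear probability model with heteroskedasticity-robust standard errors.
regress infant income, vce(robust)

* Fitted values from the LPM (xb is the default after regress).
predict phat_lpm

* Flag predictions outside [0,1]; the mean of the flag is the fraction.
gen byte outside01 = (phat_lpm < 0 | phat_lpm > 1) if !missing(phat_lpm)
summarize outside01
```

The "if !missing(phat_lpm)" condition matters because Stata treats missing values as larger than any number, so they would otherwise be counted as predictions above 1.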
IV. Panel Data Regressions
Part A: Try to replicate sections of the estimates shown in Bhalotra (2010).
Below I suggest steps you can take in Stata. You need to write a coherent text that (a) motivates the changes you make to the specification and (b) discusses how they change the estimates and why. Naturally, I don't want you to simply reproduce the discussion that is in the paper but to put your own mark on it.
Read the text of the paper where this equation is put down as a baseline, after which various extensions are considered.
(1) $M_{imst} = \beta y_{st} + \alpha_m + \eta_t + \theta_s t_{st} + Z_{imst}\delta + \varepsilon_{imst}$
• $M$ is a dummy that indicates whether index child $i$ of mother $m$ born in year $t$ in state $s$ died by the age of 12 months.
• $y$ is the logarithm of per capita net domestic product in state $s$ and year $t$ deflated by the consumer price index for agricultural workers (henceforth income).
• $\beta$ is the parameter of interest and it measures the change in infant mortality associated with a 100% change in income.
• $\alpha_m$ denotes mother fixed effects. Since, by construction of the sample, mothers do not migrate between states, the mother fixed effect incorporates a state fixed effect.
• Heterogeneity in death risk within mother is allowed for by including the child-specific $Z_{imst}$, specified as dummies for gender, birth-order, birth-month and age of mother at the birth of the index child.
• $\eta_t$ are year (or cohort) dummies that control for aggregate shocks.
• $\theta_s t_{st}$ captures omitted trends that vary by state.
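To see what the mother fixed effects do, it may help to write out the within-mother transformation. The symbols here are chosen for illustration and are an assumption on my part: $\beta$ is the income coefficient, $\alpha_m$ the mother effect, $\eta_t$ the year effects, $\theta_s t_{st}$ the state-specific trend, $\delta$ the coefficients on the controls and $\varepsilon_{imst}$ the error. Bars denote averages over the births of mother $m$. Averaging the equation within mother and subtracting gives:

```latex
\begin{align*}
\bar{M}_{m} &= \beta \bar{y}_{m} + \alpha_m + \bar{\eta}_{m}
  + \theta_s \bar{t}_{m} + \bar{Z}_{m}\delta + \bar{\varepsilon}_{m} \\
M_{imst} - \bar{M}_{m} &= \beta\,(y_{st} - \bar{y}_{m})
  + (\eta_t - \bar{\eta}_{m})
  + \theta_s\,(t_{st} - \bar{t}_{m})
  + (Z_{imst} - \bar{Z}_{m})\delta
  + (\varepsilon_{imst} - \bar{\varepsilon}_{m})
\end{align*}
```

The mother effect $\alpha_m$ drops out, so $\beta$ is identified from siblings who were born in different years and therefore faced different levels of state income. Because the state $s$ is fixed within mother, any fixed state characteristic is removed as well.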
You do not have to follow every detail in the paper. Use it as a guide and estimate a sequence of models that represent a simplified version of results in Table 1 of the paper, as follows.
1. Estimate the correlation of infant mortality risk and state income.
2. Estimate the within-state relationship (use state fixed effects).
3. Estimate the within-state relationship of the de-trended variables.
Hint: you can de-trend by including year dummies. Look up the xi command in Stata, which is useful for including a set of dummies.
4. Correct the standard errors for heteroskedasticity and for clustering within state. Hint: look up the robust and cluster options.
5. Estimate the within-mother relationship (use mother fixed effects).
Hint:
When you are using state fixed effects the cross-sectional unit is the state, and when you are using mother fixed effects it is the mother, so:
xtset, clear /*clear the xt-settings that told Stata that the cross-sectional unit was the state*/
xtset seqid /*tell Stata that the cross-sectional unit is now the mother */
6. You can add the controls Z at any stage in the above sequence. I'll let you select what controls you want to use and in what form. All variables in the data file are labelled and the two published papers provide guidance. Discuss the motivation for the controls you add, and the results of including them.
7. We may expect that the health of children responds to income shocks differently depending upon their families' characteristics. You might like to explore, for example, differences in the impact of income on mortality for mothers with high and low education. (See my paper, which looks at heterogeneity by a number of characteristics including education.)
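The sequence of models above might be sketched as follows. This is an illustration, not a replication of Table 1. It assumes infant, income, year, trend and seqid as described in the text; state is an assumed name for a numeric state identifier, and lowed is a hypothetical 0/1 dummy for mothers with low education that you would need to construct from the education variables in the file.

```stata
* Sketch only: state is an ASSUMED name; lowed is HYPOTHETICAL.

* 1. Pooled correlation of mortality risk and state income.
regress infant income

* 2. Within-state relationship (state fixed effects).
xtset state
xtreg infant income, fe

* 3. Add year dummies to remove common trends and aggregate shocks.
xi: xtreg infant income i.year, fe

* 4. Standard errors robust to heteroskedasticity, clustered by state.
xi: xtreg infant income i.year, fe vce(cluster state)

* 5. Within-mother relationship (mother fixed effects).
xtset, clear
xtset seqid
xi: xtreg infant income i.year, fe vce(cluster state)

* Optional: state-specific linear trends; collinear terms are dropped.
xi: xtreg infant income i.year i.state*trend, fe vce(cluster state)

* 7. Heterogeneity: let the income effect differ by mother's education.
gen inc_x_lowed = income * lowed
xi: xtreg infant income inc_x_lowed i.year, fe vce(cluster state)
```

In the last model the main effect of lowed is absorbed by the mother fixed effect, so only the interaction with income is included. Clustering by state with mother fixed effects works because mothers are nested within states in this sample.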
Part B: Paper 1 uses state-level panel data (for the US). Convert your microdata for India to a state-level panel (note: you did this for the graphs above). Re-estimate the main specification. Do the results change? Comment.
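One way to do this is sketched below, again assuming the variable names infant, income and year, with state as an assumed name for a numeric state identifier. After the collapse, infant is the state-year mortality rate rather than an individual 0/1 outcome, and income is unchanged because it is constant within each state-year cell.

```stata
* Sketch only: state is an ASSUMED name for the state identifier.
preserve

* One row per state and year; infant becomes a mortality rate.
collapse (mean) infant income, by(state year)

* Declare the state-level panel and re-estimate with state and year effects.
xtset state year
xi: xtreg infant income i.year, fe vce(cluster state)

restore
```

Note that this regression weights every state-year cell equally, whereas the child-level regression implicitly gives more weight to cells with more births. That is one thing to bear in mind when you compare the two sets of estimates.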
V. Analysis
Having obtained these estimates, answer the following questions. This should form a substantial fraction of your thesis.
1. Explain what the aim of the first two papers (D-LM and Bhalotra) is.
(a) Outline the essential difference in the strategies used in the two papers for estimation of the equation for (neonatal or infant) mortality.
(b) Outline the main differences in the estimates.
(c) Explain why the results are so different.
2. What was the advantage of having a panel of births within mother? In particular, explain why a selection bias may arise, what direction it may take, and how it is avoided.
3. What was the advantage of having panel data at the state level, rather than time series data at the all-India level? In particular, explain what fixed and time-varying unobservables might "confound" the relationship of interest and how exactly these problems were overcome in the estimation.
4. I now want you to focus on paper 3.
(a) Explain what the hypothesis is that this paper sets out to test.
(b) What is the essence of the identification strategy? This is important. Explain first what the identification problem is and then what the author does to address it.
(c) Can you think of improvements to identification or extensions of the econometric approach? If you were a referee, what would your assessment of the paper be and what constructive suggestions would you make?