Questions -
Part A - Try to complete Part A while reviewing the ACS data files
The six households below contain one mom, one dad, and either one or two children. The table below indicates the Census-assigned household number (for reference purposes; it is not used in this assignment), the number of children in the household, a two-child household indicator variable, and the number of hours the mom works in a typical week.
Household | Number of kids | Two kids | Mom's work hours
16761 | 2 | 1 | 0
24438 | 2 | 1 | 45
41729 | 2 | 1 | 0
42802 | 1 | 0 | 35
63528 | 1 | 0 | 40
148848 | 1 | 0 | 0
Here is your task: Carry out a difference in means test of the null hypothesis that moms with one child work the same number of hours as moms with two kids; in other words, the null hypothesis is that average work hours is the same in both types of households. To do this, you'll have to answer the following questions:
1. What are the average female work hours among one-child households? What are the average female work hours among two-child households? What is the difference in means?
2. What is the estimated standard error of the difference in means? (Hint: you have to calculate the variance of work hours for each group and plug these values, along with the number of observations in each group, into the formula in MM Chapter 1, footnote 17. You then have to take the square root of a large number; if you are doing this by hand, an approximate value is fine, and you can use the following hint to help you: the square root of 361 is 19 and the square root of 400 is 20.)
3. What is the value of the test statistic?
4. Is the difference in means statistically different from zero? In other words, do you reject the null hypothesis of equal work hours? (Hint: if the absolute value of the test statistic is greater than 1.96, we reject the null and say the difference in means is statistically different from zero, or "statistically significant". Angrist and Pischke describe the critical value as "about two", but 1.96 is the precise value at which the standard normal distribution (the "bell-shaped curve") has 2.5% of the area under the curve to its right. Thus 95% of the area under the curve lies between -1.96 and 1.96, and so 1.96 is known as the critical value for a test at the 5% significance level.)
5. (This last question is optional; I don't want to distract you from the main task in this assignment, which is the mechanics of difference in means testing, and it is a pretty open-ended, conceptual question. But think about it and give it a try: even though it's not for credit, answering it can't hurt you.) This question asks about the causal effect of children on female labor supply. The difference in means test you carried out above relates to only a tiny subsample of the full ACS sample for PUMA 06085011; might the difference in means you found in question 1 overstate, understate, or accurately reflect the average causal effect of having an additional child on female labor supply? Why? (Hints: Are there any selection effects present that could mean the difference in means is a biased estimate of the average causal effect of an additional child on labor supply? What kind of women select to have more children? For example, consider comparing a woman who decided to have exactly two children with a woman who decided to have one but unexpectedly had twins. Say something intelligent about correlation versus causation.)
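The mechanics of questions 1 to 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes the standard error takes the unequal-variance form sqrt(V1/n1 + V0/n0) and that each group's variance is the sample variance with n - 1 in the denominator. Check both against the formula in MM Chapter 1, footnote 17 before relying on it. The function name and the dictionary keys are made up for this example.

```python
from math import sqrt
from statistics import mean, variance


def difference_in_means_test(group_a, group_b, critical_value=1.96):
    """Difference-in-means test, assuming the unequal-variance standard error."""
    mean_a = mean(group_a)
    mean_b = mean(group_b)
    diff = mean_a - mean_b

    # statistics.variance divides by n - 1 (sample variance); this choice
    # is an assumption of the sketch, not something the assignment states.
    var_a = variance(group_a)
    var_b = variance(group_b)

    # Assumed formula: SE = sqrt(V_a / n_a + V_b / n_b).
    std_error = sqrt(var_a / len(group_a) + var_b / len(group_b))
    t_stat = diff / std_error

    return {
        "mean_a": mean_a,
        "mean_b": mean_b,
        "diff": diff,
        "std_error": std_error,
        "t_stat": t_stat,
        "reject_null": abs(t_stat) > critical_value,
    }


# Mom's work hours, copied from the table in Part A.
one_child_hours = [35, 40, 0]  # households 42802, 63528, 148848
two_child_hours = [0, 45, 0]   # households 16761, 24438, 41729

result = difference_in_means_test(one_child_hours, two_child_hours)
for name, value in result.items():
    print(name, value)
```

Under these assumptions the group means are 25 and 15 hours, the standard error is the square root of a number between 361 and 400, and the absolute value of the test statistic falls well below 1.96.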
Part B -
1) Select an outcome variable and report the variable name and description from the dictionary (e.g. in last week's assignment this was, "WKHP, Usual hours worked per week past 12 months.").
2) Describe a way of separating the sample into two groups (in last week's question, this was women who have either 1 or 2 children).
3) Finally, indicate the unit of observation (e.g. last week it was a married female with either one or two children who was surveyed in the ACS data and who lives in Public-Use Microdata Area 0608511, an area in Santa Clara County. For other studies with different variables, you might look at all workers, all male workers, retired people, etc.)
To complete this assignment, you don't necessarily have to select particularly interesting outcome variables. Likewise, your method of separating the subsample into two groups doesn't have to be in a way that measures something that affects the outcomes. However, keep in mind that your outline will be due before you know it, and the term paper will be due eventually, and so you probably should use this as an opportunity to find data for your paper. At a minimum, you should definitely complete this assignment to earn the weekly assignment point. But ideally, you will take this opportunity to find some "interesting" variables. Interesting in the sense that they relate to important economic effects. For your term paper, if your analysis is not interesting, it will be hard, maybe impossible to find any previously published articles that relate to it (because if it is not interesting, no one else will have bothered to publish anything on it.)
So for part 1), ideally we're looking for an economically interesting variable; this would be a measure of something that is discussed in scholarly economics journals. For part 2), we're looking for something that could affect the outcome in an economically meaningful way (for example, having two children might make it likely for a woman to work fewer hours outside the home). Of course, your outcome variable could depend on many other factors besides the group selector variable, and it could also depend on the way you select the units of observation (i.e. do you look at all married women with children, only married white women with children, or only married white women with a graduate degree and children?).
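The three choices in Part B (outcome variable, group selector, and unit of observation) map directly onto a small data-splitting step. The sketch below uses the six households from Part A as stand-in data. The dictionary keys (`num_kids`, `mom_hours`) and the helper `split_by_group` are hypothetical names chosen for this illustration, not ACS variable names.

```python
# Rows copied from the Part A table; the key names are illustrative only.
households = [
    {"household": 16761, "num_kids": 2, "mom_hours": 0},
    {"household": 24438, "num_kids": 2, "mom_hours": 45},
    {"household": 41729, "num_kids": 2, "mom_hours": 0},
    {"household": 42802, "num_kids": 1, "mom_hours": 35},
    {"household": 63528, "num_kids": 1, "mom_hours": 40},
    {"household": 148848, "num_kids": 1, "mom_hours": 0},
]


def split_by_group(rows, outcome, in_sample, in_first_group):
    """Return the outcome values of the two groups, keeping only sampled rows."""
    first, second = [], []
    for row in rows:
        if not in_sample(row):
            continue  # row is outside the chosen unit of observation
        if in_first_group(row):
            first.append(row[outcome])
        else:
            second.append(row[outcome])
    return first, second


two_kid_hours, one_kid_hours = split_by_group(
    households,
    outcome="mom_hours",                              # part 1: outcome variable
    in_first_group=lambda r: r["num_kids"] == 2,      # part 2: group selector
    in_sample=lambda r: r["num_kids"] in (1, 2),      # part 3: unit of observation
)
print(two_kid_hours, one_kid_hours)
```

Narrowing or widening the `in_sample` rule changes which rows enter the comparison, which is why the choice of unit of observation can change the result.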
Following up on part B
1. Carry out a difference in means test. Specifically, test the null hypothesis that the average value of your outcome variable is equal across the two groups (in other words, that the difference in means is zero.)
2. To do this, you'll need to calculate 1) the average value for each group and the difference in means, 2) the appropriate standard error (which requires calculating the variance of each group, as well as the number of observations in each group), and 3) the test statistic.
3. Do you reject the null hypothesis? Present the data you use for easy reference, and describe the steps you took in calculating the test statistic.
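For reference, the quantities in step 2 can be written compactly. This is a sketch that assumes the unequal-variance form of the standard error and the n - 1 sample variance; the subscripts 1 and 0 are labels chosen here for the two groups, and the formula in your textbook is the one to follow if it differs.

```latex
\[
\bar{Y}_g = \frac{1}{n_g} \sum_{i=1}^{n_g} Y_{ig},
\qquad
S_g^2 = \frac{1}{n_g - 1} \sum_{i=1}^{n_g} \left( Y_{ig} - \bar{Y}_g \right)^2,
\qquad g \in \{0, 1\}
\]
\[
\widehat{SE}\left( \bar{Y}_1 - \bar{Y}_0 \right)
  = \sqrt{ \frac{S_1^2}{n_1} + \frac{S_0^2}{n_0} },
\qquad
t = \frac{ \bar{Y}_1 - \bar{Y}_0 }{ \widehat{SE}\left( \bar{Y}_1 - \bar{Y}_0 \right) }
\]
```

Reject the null hypothesis of equal means at the 5% significance level when |t| > 1.96.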
Attachment:- Assignment Files.rar