Assignment:
Note: For the R exercises that involve regression, be sure to use heteroskedasticity-robust standard errors. Here's the short version:
# Need libraries sandwich and lmtest
library(sandwich)
library(lmtest)
# Example
cps<-read.csv(file="cps8.csv",header=TRUE)
# Fit
fit<-lm(ahe~yrseduc,data=cps)
# Summary provides non-robust SE (along with t stat and p-value)
summary(fit)
# Replace non-robust SE with robust SE (i.e. use these results instead of the above).
coeftest(fit,vcovHC(fit,type="HC1"))
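For intuition about what vcovHC(fit, type="HC1") returns, the sketch below rebuilds the HC1 standard errors by hand and compares them with the package result. It runs on simulated data; the variable names x and y, the seed, and the data-generating process are illustrative assumptions, not part of the assignment files.

```r
library(sandwich)

# Simulated heteroskedastic data (illustrative only; this is not cps8.csv)
set.seed(1)
n <- 500
x <- runif(n, 8, 20)
y <- 2 + 1.5 * x + rnorm(n, sd = 0.5 * x)  # error spread grows with x
fit <- lm(y ~ x)

# HC1 by hand: (X'X)^-1 X' diag(e^2) X (X'X)^-1, scaled by n / (n - k)
X <- model.matrix(fit)
e <- residuals(fit)
k <- ncol(X)
bread <- solve(crossprod(X))
meat <- crossprod(X * e)  # equals X' diag(e^2) X
vc_manual <- n / (n - k) * bread %*% meat %*% bread
se_manual <- sqrt(diag(vc_manual))

# Same quantity from the sandwich package
se_pkg <- sqrt(diag(vcovHC(fit, type = "HC1")))
print(cbind(manual = se_manual, sandwich = se_pkg))
```

The two columns should agree up to floating-point error, which is a quick way to confirm that the robust standard errors differ from the summary(fit) output only through the variance formula, not through the coefficient estimates.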
1. Optimal estimator of a population mean
Consider a simple setup where we take a random sample of size 2. Both selections, Y1 and Y2, come from a population where the mean is µ (i.e. E(Y1) = µ and E(Y2) = µ).
(a) Find the BLUE estimator of µ in the case where it is known that Y1 and Y2 have the same variance (are homoskedastic), say σ_Y^2. Show that it is, in fact, BLUE. Report the variance of this estimator.
(b) Now suppose it is known that their variances are not the same (heteroskedastic); in fact, it's known that the variance of Y1 is 3 times higher than the variance of Y2. Is the estimator you found in (a) still BLUE? Why or why not? If the estimator from (a) is not BLUE, construct a BLUE estimator (showing that it is in fact BLUE) for this case of unequal variance and report its variance.
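As a way to check an analytical answer numerically, the sketch below treats the linear unbiased estimators of µ as the family w*Y1 + (1 - w)*Y2, whose variance for independent draws is w^2*v1 + (1 - w)^2*v2. It searches for the variance-minimizing weight and confirms the variance formula by simulation. The variances v1 = 2 and v2 = 5, the mean of 10, and the function name var_weighted are hypothetical choices made for illustration; they are deliberately not the values in the exercise.

```r
# Variance of w*Y1 + (1 - w)*Y2 for independent draws with variances v1 and v2
var_weighted <- function(w, v1, v2) w^2 * v1 + (1 - w)^2 * v2

# Illustrative variances (hypothetical values, not those in the exercise)
v1 <- 2
v2 <- 5

# Numerical search for the variance-minimizing weight on Y1
opt <- optimize(var_weighted, interval = c(0, 1), v1 = v1, v2 = v2)
w_star <- opt$minimum

# Monte Carlo check of the variance formula at the chosen weight
set.seed(42)
reps <- 100000
y1 <- rnorm(reps, mean = 10, sd = sqrt(v1))
y2 <- rnorm(reps, mean = 10, sd = sqrt(v2))
est <- w_star * y1 + (1 - w_star) * y2

print(c(weight = w_star, formula_var = opt$objective, simulated_var = var(est)))
```

A numerical search of this kind is only a check; the exercise still asks for an argument that no other linear unbiased estimator has smaller variance.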
2. Earnings and height II
Use the Earnings_and_Height.csv data from assignment 3 to carry out the following exercises.
(a) Run a regression of earnings on height.
i. Is the estimated slope statistically significant?
ii. Construct a 95% confidence interval for the slope coefficient.
(b) Repeat (a) for women.
(c) Repeat (a) for men.
(d) Test the null hypothesis that the effect of height on earnings is the same for men and women. To do this you'll need the standard error for the difference, like what we had for sample means in Chapter 3. The standard error for the difference in slopes is
$$SE(\hat{\beta}_{a,1} - \hat{\beta}_{b,1}) = \sqrt{[SE(\hat{\beta}_{a,1})]^2 + [SE(\hat{\beta}_{b,1})]^2}$$
where a and b are the two different groups (e.g. men and women).
(e) One explanation for the effect of height on earnings is that some professions require strength, which is correlated with height. Does the effect of height on earnings disappear when the sample is restricted to occupations in which strength is unlikely to be important?
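The mechanics of parts (a) through (d) can be sketched as follows: a robust slope estimate and confidence interval for one group, then a test of the difference in slopes between two independent groups. The data are simulated, and the column names earnings, height and sex, along with the helper slope_robust, are hypothetical stand-ins; check the actual variable names in Earnings_and_Height.csv before adapting this.

```r
library(sandwich)

# Simulated stand-in for the assignment data; column names are hypothetical
set.seed(7)
n <- 2000
sex <- rep(c("female", "male"), each = n / 2)
height <- rnorm(n, mean = ifelse(sex == "male", 70, 65), sd = 3)
earnings <- 10000 + 600 * height + rnorm(n, sd = 150 * height)
dat <- data.frame(earnings, height, sex)

# Robust (HC1) slope estimate and standard error for one subsample
slope_robust <- function(d) {
  fit <- lm(earnings ~ height, data = d)
  se <- sqrt(diag(vcovHC(fit, type = "HC1")))
  c(est = unname(coef(fit)["height"]), se = unname(se["height"]))
}

a <- slope_robust(subset(dat, sex == "male"))
b <- slope_robust(subset(dat, sex == "female"))

# 95% confidence interval for one slope (large-sample normal critical value)
ci_a <- unname(a["est"]) + c(-1, 1) * qnorm(0.975) * unname(a["se"])

# Difference in slopes, its standard error, t statistic and two-sided p-value
slope_diff <- unname(a["est"] - b["est"])
se_diff <- unname(sqrt(a["se"]^2 + b["se"]^2))
t_stat <- slope_diff / se_diff
p_value <- 2 * pnorm(-abs(t_stat))

print(c(diff = slope_diff, se = se_diff, t = t_stat, p = p_value))
```

The square-root formula for the standard error of the difference relies on the two subsamples being independent, which is why the regressions are fit separately on each group.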
3. Smoking and birthweight II
The dataset Birthweight_Smoking.csv (Canvas) contains data for a random sample of babies born in Pennsylvania in 1989. The data include the baby's birth weight together with various characteristics of the mother, including whether she smoked during the pregnancy. A detailed description is given in Birthweight_Smoking_Description.pdf (Canvas). In this exercise you will investigate the relationship between birth weight and smoking during pregnancy.
(a) In the sample:
i. What is the average value of birthweight for all mothers?
ii. For mothers who smoke?
iii. For mothers who do not smoke?
(b)
i. Use the data in the sample to estimate the difference in average birth weight for smoking and nonsmoking mothers.
ii. What is the standard error for the estimated difference in (i)?
iii. Construct a 95% confidence interval for the difference in the average birth weight for smoking and nonsmoking mothers.
(c) Run a regression of birthweight on the binary/Bernoulli variable smoker.
i. Explain how the estimated slope and intercept are related to your answers in (a) and (b).
ii. Explain how $SE(\hat{\beta}_1)$ is related to your answer in (b)(ii).
iii. Construct a 95% confidence interval for the effect of smoking on birth weight.
(d) Do you think smoking is uncorrelated with other factors that cause low birth weight? That is, do you think that the regression error term, say u_i, has a conditional mean of zero, given smoking (X_i)? (This is a crucial issue that we will investigate further in assignment 5).
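To compare a regression on a binary regressor with a two-group comparison of means, the sketch below computes both on simulated data. The variable names birthweight and smoker mirror the wording of the exercise but are assumptions here, as are the simulated effect size and error spreads; none of the printed numbers relate to the real dataset.

```r
library(sandwich)

# Simulated stand-in data; names and parameter values are hypothetical
set.seed(3)
n <- 1000
smoker <- rbinom(n, 1, 0.2)
birthweight <- 3400 - 250 * smoker + rnorm(n, sd = ifelse(smoker == 1, 600, 550))

# Group means and the unequal-variance standard error of their difference
y1 <- birthweight[smoker == 1]
y0 <- birthweight[smoker == 0]
mean_diff <- mean(y1) - mean(y0)
se_means <- sqrt(var(y1) / length(y1) + var(y0) / length(y0))

# Regression on the binary regressor with HC1 standard errors
fit <- lm(birthweight ~ smoker)
se_hc1 <- unname(sqrt(diag(vcovHC(fit, type = "HC1")))["smoker"])

print(c(intercept = unname(coef(fit)[1]), mean_nonsmokers = mean(y0)))
print(c(slope = unname(coef(fit)["smoker"]), mean_diff = mean_diff))
print(c(se_hc1 = se_hc1, se_diff_means = se_means))
```

In this sketch the coefficient comparisons hold exactly, while the two standard errors are close but not identical because they apply different degrees-of-freedom adjustments.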
Documentation for Earnings_and_Height
These data are taken from the US National Health Interview Survey for 1994. They are a subset of the data used in Anne Case and Christina Paxson's paper "Stature and Status: Height, Ability, and Labor Market Outcomes," Journal of Political Economy, 2008, 116(3): 499-532, and were graciously supplied by the authors for empirical exercises in the Stock-Watson textbook.
The dataset contains information on 17,870 workers. The table on the next page describes the variables.