Regression Assignment
Instructions
Background
A problem of interest to health officials (and others) is to determine the effects of smoking during pregnancy on infant health. One measure of infant health is birth weight; a birth weight that is too low can put an infant at risk for contracting various illnesses. Since factors other than cigarette smoking that affect birth weight are likely to be correlated with smoking, we should take those factors into account.
1. Open the Excel file from Moodle. You should see a worksheet that prompts you for your name. Fill these items in before proceeding. On the second worksheet, named Raw Data, you will find the data for the assignment. Here are the variable names and their descriptions:
Variable Name    Description
ssn              mother's social security number
bweight          birth weight in ounces
cigs             cigarettes smoked per day while pregnant
faminc           family income in $1000s
parity           birth order of child
motheduc         mother's years of education
gender           gender of child
white            ="White" if child is white
married          ="Yes" if mother is married
nutrition        ="Yes" if mother took nutrition class
moth_hgt         mother's height in centimeters
gest_age         gestational age in months
2. The data you see on the Raw Data worksheet is locked and can't be modified. Copy all of the data and paste it into the worksheet named Modified Data. If you mess up your data at some point, you can retrieve it from the Raw Data worksheet.
3. In the Modified Data worksheet create four dummy variables.
(a) Create a dummy variable based on the variable gender. The dummy should equal 1 for male children and 0 otherwise. Name this variable gender_dum.
(b) Create a dummy variable based on the variable white. The dummy should equal 1 for white children and 0 otherwise. Name this variable white_dum.
(c) Create a dummy variable based on the variable nutrition. The dummy should equal 1 for mothers who took a nutrition class and 0 otherwise. Name this variable nut_dum.
(d) Create a dummy variable based on the variable married. The dummy should equal 1 for mothers who are married and 0 otherwise. Name this variable married_dum.
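The dummy-variable logic in step 3 can be sketched outside Excel as follows. This is only an illustration in Python: the two rows are made-up values, not assignment data, and the label "Male" for the gender column is an assumption (the handout only documents the labels "White" and "Yes").

```python
# Toy rows standing in for the Modified Data worksheet (made-up values).
rows = [
    {"gender": "Male", "white": "White", "nutrition": "Yes", "married": "No"},
    {"gender": "Female", "white": "Other", "nutrition": "No", "married": "Yes"},
]


def dummy(value, target):
    """Return 1 when value matches target (ignoring case and outer spaces), else 0."""
    return 1 if str(value).strip().lower() == target.lower() else 0


for row in rows:
    row["gender_dum"] = dummy(row["gender"], "Male")  # "Male" is an assumed label
    row["white_dum"] = dummy(row["white"], "White")
    row["nut_dum"] = dummy(row["nutrition"], "Yes")
    row["married_dum"] = dummy(row["married"], "Yes")
```

In Excel the same test is an IF formula of the form =IF(A2="White",1,0), where A2 is a hypothetical cell reference that depends on your column layout.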
4. The values for family income are missing from the main data set but can be found on the Family Income worksheet. Match the family income data to the other data using the social security numbers on both worksheets. There are almost 1,400 observations, so you obviously can't match them one-by-one. But you can do the matching easily in Excel using techniques we learned in the computer lab.
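The matching in step 4 is a keyed lookup: each row in the main data set pulls its income from the other sheet using ssn as the key. The sketch below shows the idea in Python with made-up social security numbers and incomes. It is not the lab technique itself, which the handout does not name.

```python
# Toy stand-ins for the two worksheets (all values are made up).
main_rows = [
    {"ssn": "111-11-1111", "cigs": 0},
    {"ssn": "222-22-2222", "cigs": 10},
    {"ssn": "333-33-3333", "cigs": 5},
]
income_rows = [
    {"ssn": "222-22-2222", "faminc": 27.5},
    {"ssn": "111-11-1111", "faminc": 13.5},
]

# Build the lookup once, then match every main row by its ssn.
income_by_ssn = {r["ssn"]: r["faminc"] for r in income_rows}

unmatched = []
for row in main_rows:
    row["faminc"] = income_by_ssn.get(row["ssn"])  # None when there is no match
    if row["faminc"] is None:
        unmatched.append(row["ssn"])
```

Collecting the unmatched keys is a cheap sanity check: before running any regression, confirm that every observation actually received an income value. In Excel, a lookup function such as VLOOKUP does the same job.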
5. Perform a simple regression using the following model:
bweight = α + β · cigs + ε
Name the worksheet with the regression output regression 1. Expand the columns as needed to make the results look nice.
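For reference, the least-squares estimates in a simple regression have closed forms: the slope is the sum of cross-deviations of cigs and bweight divided by the sum of squared deviations of cigs, and the intercept makes the line pass through the two sample means. A minimal Python sketch follows, using four made-up observations rather than the assignment data.

```python
def simple_ols(x, y):
    """Return (alpha, beta) for the least-squares line y = alpha + beta * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx               # slope: change in y per one-unit change in x
    alpha = y_bar - beta * x_bar   # intercept: fitted y when x equals 0
    return alpha, beta


# Made-up toy observations (cigarettes per day, birth weight in ounces).
toy_cigs = [0, 0, 10, 20]
toy_bweight = [120, 118, 112, 106]
alpha_hat, beta_hat = simple_ols(toy_cigs, toy_bweight)
```

The coefficients Excel reports for this model should agree with these formulas when applied to the same data.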
6. Fill in the values for the estimated coefficients and other statistics on the Answers worksheet. You will need to copy and paste your results from regression 1 into the appropriate cells. Do not round your answers.
7. Fill in the answers to the following questions on the Answers worksheet.
(a) What is the meaning of the slope coefficient and the intercept?
(b) Explain the estimated effect of cigarette smoking on birth weight.
(c) Do the coefficients have the signs you would expect?
(d) Are the coefficients statistically significant at the 5% significance level (95% confidence)?
(e) What does the R-squared value tell you?
(f) Predict the weight of a newborn whose mother smoked 20 cigarettes per day.
Be sure to put your explanations of slope coefficients in terms of the original units of measure.
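The quantities behind questions (d) through (f) can be sketched as follows: the t statistic for the slope is the estimate divided by its standard error, R-squared is one minus the ratio of residual to total variation, and a prediction plugs a value of cigs into the fitted line. The Python below uses four made-up observations, so its numbers say nothing about the assignment data. The function name fit_summary is purely illustrative.

```python
import math


def fit_summary(x, y):
    """Fit y = alpha + beta * x and return the usual summary statistics."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    ss_res = sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    # Residual variance uses n - 2 degrees of freedom (two estimated parameters).
    se_beta = math.sqrt(ss_res / (n - 2) / sxx)
    return {
        "alpha": alpha,
        "beta": beta,
        "r2": 1 - ss_res / ss_tot,
        "se_beta": se_beta,
        "t_beta": beta / se_beta,
    }


# Made-up toy observations (cigarettes per day, birth weight in ounces).
toy_cigs = [0, 0, 10, 20]
toy_bweight = [120, 118, 112, 106]
fit = fit_summary(toy_cigs, toy_bweight)

# Predicted birth weight (ounces) for a mother smoking 20 cigarettes per day.
predicted_at_20 = fit["alpha"] + fit["beta"] * 20
```

With roughly 1,400 observations, a coefficient is significant at the 5% level when the absolute value of its t statistic exceeds about 1.96, or equivalently when its p-value is below 0.05.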
8. Now examine the relationship between cigarette smoking and birth weight visually. Create a new worksheet tab named chart. On the new tab, create a scatter plot with a trend line showing the linear relationship. Birth weight (bweight) should be on the y-axis and the number of cigarettes smoked (cigs) should be on the x-axis. Make the chart look pretty by removing the gridlines and labeling each axis.
9. Now perform a multiple regression, using the explanatory variables
• cigs
• faminc
• parity
• motheduc
• gender_dum
• white_dum
• married_dum
• nut_dum
• moth_hgt
• gest_age
Expand the columns to make the results look nice. Name the worksheet with the new regression output regression 2. On the Answers worksheet, fill in the results from the regression 2 worksheet and respond to the questions below as in item 7 above.
(a) What is the meaning of the slope coefficient for the family income variable?
(b) Explain the estimated effect of taking a nutrition class on birth weight.
(c) Do the coefficients have the signs you would expect?
(d) Are the coefficients statistically significant at the 5% significance level (95% confidence)?
(e) What does the R-squared value tell you?
(f) Compare your results from this regression to the previous one. Has the coefficient for cigs changed? If so, explain why.
(g) What can you say about the goodness-of-fit for regressions 1 and 2?
(h) Does the second regression model violate the basic assumption that the explanatory variables must be uncorrelated with the error term? Explain. If the assumption is violated, provide a potential solution.
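As background for questions (f) and (g), the sketch below fits a multiple regression by solving the normal equations and reports R-squared alongside adjusted R-squared, which penalizes extra regressors. It is a minimal pure-Python illustration, not the Excel procedure. The toy outcome is generated exactly from known coefficients (100, -0.5, 0.2) so that the recovery can be checked. All values and the function names are made up for illustration.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: regressors are perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def multiple_ols(columns, y):
    """Return (coefficients with intercept first, R-squared, adjusted R-squared)."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])  # number of estimated parameters, intercept included
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)]
           for p in range(k)]
    xty = [sum(row[p] * y[i] for i, row in enumerate(design)) for p in range(k)]
    coefs = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coefs, row)) for row in design]
    y_bar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)
    return coefs, r2, adj_r2


# Made-up toy data; the outcome is built exactly as 100 - 0.5*cigs + 0.2*faminc.
toy_cigs = [0, 5, 10, 20, 0, 15]
toy_faminc = [10, 20, 15, 30, 40, 25]
toy_y = [100 - 0.5 * c + 0.2 * f for c, f in zip(toy_cigs, toy_faminc)]

long_coefs, long_r2, long_adj_r2 = multiple_ols([toy_cigs, toy_faminc], toy_y)
short_coefs, short_r2, short_adj_r2 = multiple_ols([toy_cigs], toy_y)
```

In this toy setup the regression on cigs alone gives a slope that differs from -0.5, because the omitted faminc column is correlated with cigs and its effect is partly attributed to smoking. Adding regressors never lowers plain R-squared, so adjusted R-squared is the fairer measure when comparing models with different numbers of explanatory variables.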
Attachment: Excel_Assignment.rar