Problem Set: The Impact of Education on Smoking
This problem set uses the dataset smoking.dta.
A group of researchers is working for an organization that aims to reduce smoking among the population. They hypothesize that increasing education is associated with a decline in smoking. The underlying theory is that as people gain more education, they are better able to acquire and process information on the dangers of smoking.
The dependent variable is cigs, defined as the number of cigarettes an individual smokes per day. The independent variable of interest is education.
Table 1: Variable Definitions
Variable | Description
educ | years of schooling
cigpric | state cigarette price, cents per pack
white | =1 if white
age | age in years
income | annual income, $
cigs | cigarettes smoked per day
restaurn | =1 if state restaurant smoking restrictions
Part 1: Descriptive Statistics
Post all answers below each question. Do not delete the questions!
1. Complete the following table of descriptive statistics. Note: You do not need to run a regression for this question.
Table 2: Descriptive Statistics
Variable | Minimum | Maximum | Mean | Standard Deviation | Number of Observations
cigs | | | | |
educ | | | | |
What is the mean number of years of education that individuals participating in this study reported?
What is the average number of cigarettes participants smoked per day?
Were there people in the study who did not smoke at all?
What was the highest number of cigarettes smoked per day?
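The descriptive statistics requested above can be obtained without a regression. The following is a minimal sketch, assuming smoking.dta sits in Stata's current working directory; the final count command is an optional extra and is not something the problem set asks for.

```stata
* Load the problem-set data (assumes smoking.dta is in the working directory)
use smoking.dta, clear

* Reports observations, mean, standard deviation, minimum, and maximum
summarize cigs educ

* Optional check: number of respondents reporting zero cigarettes per day
count if cigs == 0
```

The summarize output lists the columns in the order Obs, Mean, Std. dev., Min, Max, so the values need to be reordered when they are copied into the table.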
Part 2: Regression Analysis
2. Using Stata, run a regression of cigs on educ with robust standard errors. Copy and paste the Stata output into your write-up. *Note: paste the output either as a picture or as text in Consolas font, size 10 or 11. Make sure what you post is legible to your reader.
3. Report the sample regression function (with robust standard errors in parentheses beneath the coefficients).
4. Interpret the coefficient on education ($\hat{\beta}_{Educ}$).
5. Is $\hat{\beta}_{Educ}$ statistically significant? Explain how you arrived at your conclusion.
6. Report the 95% confidence interval for $\hat{\beta}_{Educ}$.
7. What is the estimated effect on the number of cigarettes smoked per day for a person with 16 years of education?
8. What is the estimated effect on the number of cigarettes smoked per day for a person with 10 years of education?
9. Based on these findings, one of the researchers wants to publish the conclusion that increased education is needed to reduce smoking. Why might you advise against this?
10. Using the dataset smoking.dta, create a different bivariate regression model that aims to explain the causes behind smoking. For your model, provide the following:
- The null and alternative hypotheses.
- An interpretation of the coefficient on the independent variable, covering its direction, magnitude, and statistical significance.
- An explanation of whether the model allows you to conclude satisfactorily that the chosen independent variable explains the number of cigarettes smoked per day.
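For the regression questions in Part 2, the Stata workflow might look like the sketch below. It assumes smoking.dta is in the working directory and that the variable names match the variable definitions table. The hand-built confidence interval and the use of margins are one possible approach, not a required one. margins reports the predicted number of cigarettes per day at each education level, which is one reading of questions 7 and 8.

```stata
* Load the problem-set data (assumes smoking.dta is in the working directory)
use smoking.dta, clear

* Question 2: bivariate OLS with heteroskedasticity-robust standard errors
regress cigs educ, vce(robust)

* Question 6: the 95% confidence interval is printed in the regress output;
* it can also be rebuilt from the stored coefficient and standard error
display _b[educ] - invttail(e(df_r), 0.025) * _se[educ]
display _b[educ] + invttail(e(df_r), 0.025) * _se[educ]

* Questions 7 and 8: predicted cigarettes per day at 16 and 10 years of schooling
margins, at(educ=(16 10))
```

The same pattern applies to question 10: replace educ with the chosen explanatory variable from the dataset and rerun regress with vce(robust).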