Project
Test whether a worker's alcohol consumption determines his or her income. Include up to 5 other regressors that might be relevant.
In the paper itself, present your estimated model along with descriptive statistics (mean, standard deviation, maximum, minimum) for every variable used in the model. Look at these statistics to check that they make sense! When working on your project, you will find it useful to use a separate EViews workfile.
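As a rough illustration of the four statistics requested above, the following Python sketch computes them for a handful of variables. It assumes the observations have already been exported from the workfile into a list of records; the rows shown are made-up values for illustration, not numbers from alcohol.wf1, and the helper name describe is hypothetical.

```python
import statistics

# Hypothetical rows standing in for a few exported observations.
# These values are invented for illustration and are NOT from alcohol.wf1.
rows = [
    {"wgsal": 18000, "age": 25, "higrad": 12},
    {"wgsal": 32000, "age": 29, "higrad": 16},
    {"wgsal": 0,     "age": 31, "higrad": 10},
    {"wgsal": 24000, "age": 27, "higrad": 14},
]

def describe(rows, name):
    """Return mean, sample standard deviation, maximum, and minimum of one variable."""
    values = [r[name] for r in rows]
    return {
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample (n - 1) standard deviation
        "max": max(values),
        "min": min(values),
    }

summary = {name: describe(rows, name) for name in ("wgsal", "age", "higrad")}
for name, stats in summary.items():
    print(name, stats)
```

A table like this makes the sanity check concrete: for example, a minimum or maximum of age outside the 24 to 32 range described for this sample would signal a problem worth investigating before estimation.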
Not all of the variables are in the most appropriate form for regression analysis, so you will definitely have to create some new variables! Additionally, not all of the available variables will make sense in the model. Be especially careful with variables that make sense individually but not when included together. You will have to decide what to include, keeping in mind the ceteris paribus interpretation of multiple regression analysis.
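One common case of creating new variables is turning a coded categorical variable into 0/1 dummies. The Python sketch below does this for sex and race, using the codes given in the data description that follows. The new names female, hispanic, and black are hypothetical, the rows are invented for illustration, and the choice of base group is only one possibility.

```python
# Codes from the variable list: sex (1 = man, 2 = woman);
# race (1 = Hispanic, 2 = Black, 3 = otherwise).
# The rows are hypothetical and for illustration only.
rows = [
    {"sex": 1, "race": 3},
    {"sex": 2, "race": 1},
    {"sex": 2, "race": 2},
    {"sex": 1, "race": 3},
]

for r in rows:
    # 0/1 dummy instead of the 1/2 coding, so the coefficient reads as the gap for women
    r["female"] = 1 if r["sex"] == 2 else 0
    # Two dummies for three race categories; code 3 ("otherwise") is the omitted base group
    r["hispanic"] = 1 if r["race"] == 1 else 0
    r["black"] = 1 if r["race"] == 2 else 0

for r in rows:
    print(r)
```

Leaving one category out matters: a full set of dummies for every category of the same variable, entered together with a constant, is perfectly collinear. That is one example of variables that are fine individually but cannot all be included together.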
The Data
The data set alcohol.wf1 comes from the National Longitudinal Survey of Youth (NLSY) and includes information on labor market outcomes, alcohol consumption, and assorted demographics for individuals in 1989. The data are restricted to young adults who are between the ages of 24 and 32 in 1989. Each individual has a unique identifier (variable named id) and the year is indicated by the variable named year.
The following labor market variables are available:
wgsal - total wage and salary income in the past calendar year, in dollars
hrswrk - total number of hours worked in the past calendar year
wkswrk - total number of weeks worked in the past calendar year
wksue - total number of weeks spent unemployed in the past calendar year
wksolf - total number of weeks spent out of the labor force in the past calendar year
empst - a categorical variable indicating the individual's current employment status. It is defined as follows:
1 = Employed
2 = Unemployed
3 = Out Of Labor Force
4 = In Active Armed Forces
numjob - total number of jobs the individual has ever held in their lifetime
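The labor market variables above can be combined into new measures. As one possible example (a sketch, not a required specification), the Python snippet below builds an hourly wage from wgsal and hrswrk and takes its log. The names hrwage and lnwage are hypothetical, the rows are invented for illustration, and treating non-workers as missing is an assumption you would need to justify in your own paper.

```python
import math

# Hypothetical values, not taken from the data set.
rows = [
    {"wgsal": 20000, "hrswrk": 2000},
    {"wgsal": 15000, "hrswrk": 1000},
    {"wgsal": 0,     "hrswrk": 0},  # did not work: hourly wage is undefined
]

for r in rows:
    if r["hrswrk"] > 0 and r["wgsal"] > 0:
        r["hrwage"] = r["wgsal"] / r["hrswrk"]  # dollars per hour worked
        r["lnwage"] = math.log(r["hrwage"])      # natural log of the hourly wage
    else:
        r["hrwage"] = None                       # treated as missing (an assumption)
        r["lnwage"] = None

for r in rows:
    print(r)
```

Whether to model annual income, an hourly wage, or a logged version is exactly the kind of decision the project asks you to make and defend.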
The following alcohol consumption variables are available:
drinkev - a dummy variable = 1 if the individual has ever had a drink, = 0 otherwise
drnkmo - a dummy variable = 1 if the individual has had a drink in the last month, 0 otherwise
drnk6m - a categorical variable indicating the number of times in the past month the individual has had 6 or more drinks in one sitting. It is defined as follows:
0 = Never
1 = Once
2 = 2 Or 3 Times
3 = 4 Or 5 Times
4 = 6 Or 7 Times
5 = 8 Or 9 Times
6 = 10 Or More Times
days - the number of days in the last month the individual has had at least 1 drink
perday - the average number of drinks per day on a day when the individual drinks (this is 0 if the individual doesn't drink)
gtint - a categorical variable that answers the question of whether the individual has ever drunk more than intended. It is defined as follows:
0 = Don't Drink
1 = Happened 3+ Times In Past Year
2 = Happened 2 Times In Past Year
3 = Happened 1 Time In Past Year
4 = Happened In Lifetime Other Than Past Year
5 = Never Happened
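Several of the alcohol variables above are not directly usable as regressors. For instance, the gtint codes are not ordered by intensity (0 means the individual doesn't drink, while 5 means it never happened). The Python sketch below shows a few possible recodings. The names drinksmo, binge, and gtint_yr are hypothetical, the rows are invented for illustration, and these are only some of many reasonable constructions.

```python
# Hypothetical rows; the codes follow the definitions listed above.
rows = [
    {"days": 0,  "perday": 0, "drnk6m": 0, "gtint": 0},
    {"days": 8,  "perday": 2, "drnk6m": 1, "gtint": 3},
    {"days": 20, "perday": 5, "drnk6m": 6, "gtint": 1},
]

for r in rows:
    # Approximate number of drinks in the last month: drinking days x drinks per drinking day
    r["drinksmo"] = r["days"] * r["perday"]
    # 1 if the individual had 6 or more drinks in one sitting at least once in the past month
    r["binge"] = 1 if r["drnk6m"] >= 1 else 0
    # gtint codes 1, 2, and 3 all mean "drank more than intended in the past year"
    r["gtint_yr"] = 1 if r["gtint"] in (1, 2, 3) else 0

for r in rows:
    print(r)
```

Entering drnk6m or gtint as a single numeric regressor would treat the category codes as if they were evenly spaced quantities, which the definitions above do not support.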
The following demographic variables (which may or may not be useful for the analysis) are available:
age - age of the individual in years
sex - a categorical variable = 1 if the individual is a man and = 2 if a woman
race - a categorical variable = 1 if the individual is Hispanic, = 2 if the individual is Black, and = 3 otherwise
south14 - a dummy variable = 1 if the individual lived in the south when they were 14 years old
wdad14 - a dummy variable = 1 if the individual lived with their father when they were 14
wmom14 - a dummy variable = 1 if the individual lived with their mother when they were 14
dadwork - a dummy variable = 1 if the individual's father worked when they were 14. This is set to 0 if they didn't know, which often happens if they didn't live with dad, so this variable should always be used along with wdad14
momwork - a dummy variable = 1 if the individual's mother worked when they were 14. This is set to 0 if they didn't know, which often happens if they didn't live with mom, so this variable should always be used along with wmom14
dadhgc - the number of years of education the individual's father has. This is set to 0 if they didn't know, which often happens if they didn't live with dad, so this variable should always be used along with wdad14
momhgc - the number of years of education the individual's mother has. This is set to 0 if they didn't know, which often happens if they didn't live with mom, so this variable should always be used along with wmom14
numsib - the number of siblings the individual has
hvsib - a dummy variable = 1 if the individual has a sibling in the data set
sibid1 - the value of the variable id for the individual's sibling in the data set. This is missing if there is no sibling in the data set
religkid - a categorical variable reporting what religion the individual was at age 14. It is defined as follows:
0 = None, No Religion
1 = Protestant, unspecified
2 = Baptist
3 = Episcopalian
4 = Lutheran
5 = Methodist
6 = Presbyterian
7 = Roman Catholic
8 = Jewish
9 = Other
relignow - a categorical variable reporting what religion the individual is now. It is defined the same as religkid
afqtrev - the percentile in which the individual scored on an intelligence test given in 1979
height - the individual's height, measured in inches
weight - the individual's weight, measured in pounds
health - a dummy variable = 1 if the individual has a health problem that limits the amount or kind of work that can be done
higrad - the number of years of education the individual has completed
numkid - the number of children the individual has
urbrur - a dummy variable = 1 if the individual lives in an urban area
famsz - the number of people in the individual's family (i.e. self, plus spouse, plus dependent children)
faminc - net income for the family in the past year, measured in dollars
povst - a dummy variable = 1 if the individual's family was below the poverty line last year
region - a categorical variable for the region the individual lives in. It is defined as follows:
1 = Northeast
2 = North Central
3 = South
4 = West
urate - the unemployment rate for the individual's local labor market. (Note: this variable might look a little odd because it was created as the midpoint of a range, so that the place where the individual lives cannot be identified.)
marst - a categorical variable for the individual's marital status. It is defined as follows:
1 = never married
2 = married with a spouse present
3 = other
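The notes on dadwork, momwork, dadhgc, and momhgc say that a value of 0 can mean "didn't know" and that this often coincides with not living with that parent at 14. A quick tabulation can show how closely the two line up. The Python sketch below does this for dadhgc and wdad14; the rows are invented for illustration and are not from the data set.

```python
from collections import Counter

# Hypothetical rows for illustration only.
rows = [
    {"wdad14": 1, "dadhgc": 12},
    {"wdad14": 1, "dadhgc": 16},
    {"wdad14": 0, "dadhgc": 0},
    {"wdad14": 0, "dadhgc": 0},
    {"wdad14": 0, "dadhgc": 10},
    {"wdad14": 1, "dadhgc": 0},
]

# Count observations by (lived with father at 14, father's education recorded as 0)
counts = Counter((r["wdad14"], r["dadhgc"] == 0) for r in rows)
for (wdad14, is_zero), n in sorted(counts.items()):
    print(f"wdad14={wdad14}  dadhgc==0: {is_zero}  n={n}")
```

If most of the zeros fall where wdad14 = 0, a regression that includes dadhgc without wdad14 would read those placeholder zeros as fathers with no education, which is the reason the variable list asks for the two to be used together.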