Assignment
This assignment consists of two sections: 1) a quiz with fill-in-the-blank questions and 2) an SPSS data project.
SECTION 1: QUIZ
Regression with a Dummy Independent Variable:
1. Consider data on personal income (PI) of married and unmarried women. Suppose you find that the average PI is $50,000 for married women and $40,000 for unmarried women. Let Y = PI and X = a dummy variable for marital status (1 = married, 0 = unmarried).
1) On average, how much more do married women earn than unmarried women? __________________
2) Write down the estimated regression model Y = a + b*X (all the information you need is given above):
3) Interpret the intercept term: ____________________________________________
4) Interpret the slope term: _______________________________________________
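When the only regressor is a 0/1 dummy, ordinary least squares sets the intercept equal to the mean of the X = 0 group and the slope equal to the difference between the two group means. The minimal Python sketch below checks this numerically. The incomes are made up for illustration and are not the assignment's figures; `ols_simple` is a hypothetical helper name.

```python
from statistics import mean

def ols_simple(x, y):
    """Least-squares intercept a and slope b for y = a + b*x (one regressor)."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b

# Hypothetical incomes, NOT the assignment's data: x = 1 married, 0 unmarried
x = [0, 0, 0, 1, 1]
y = [30_000, 35_000, 40_000, 45_000, 55_000]

a, b = ols_simple(x, y)
mean_unmarried = mean(yi for xi, yi in zip(x, y) if xi == 0)
mean_married = mean(yi for xi, yi in zip(x, y) if xi == 1)

# With a 0/1 regressor, a matches the mean of the X = 0 group and
# b matches the difference in group means (up to floating-point rounding).
print(a, mean_unmarried)
print(b, mean_married - mean_unmarried)
```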
Multiple Linear Regression:
2. Consider the following model of income with three independent variables from the lecture:
Y = a + b1*X1 + b2*X2 + b3*X3 = -9,239 - 4,195*X1 + 141*X2 + 3,020*X3
where X1 is a dummy variable (0 = male, 1 = female), X2 is years of work experience, and X3 is years of education.
1) How much more do men earn compared to women on average?
2) How much do women with 10 years of work experience and 16 years of education earn on average?
3) How much do men with 10 years of work experience and 12 years of education earn on average?
4) Which variables that could (further) mediate the effect of gender on earnings are omitted from this model?
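Because the model is linear, each prediction is simply the lecture coefficients plugged into the equation. The minimal Python sketch below is one way to check the arithmetic in questions 1) to 3); `predict_income` is a hypothetical helper name, and the coefficients are copied from the model above.

```python
# Coefficients of the lecture model Y = a + b1*X1 + b2*X2 + b3*X3 (from the assignment)
A, B1, B2, B3 = -9_239, -4_195, 141, 3_020

def predict_income(female: int, experience: float, education: float) -> float:
    """Predicted income. female is the X1 dummy (0 = male, 1 = female),
    experience is X2 (years of work experience), education is X3 (years of education)."""
    return A + B1 * female + B2 * experience + B3 * education

# Male prediction minus female prediction at the same experience and education.
# Because X1 enters additively, this gap equals -B1 whatever the other two inputs are.
gap = predict_income(0, 10, 12) - predict_income(1, 10, 12)

# Inputs from questions 2) and 3)
women_10_16 = predict_income(female=1, experience=10, education=16)
men_10_12 = predict_income(female=0, experience=10, education=12)
```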
Two-Way Table: Marginal Distribution and Conditional Distribution
3. Consider the two-way table below, based on a four-year study of the relationship between anger and heart disease in a random sample of individuals. The subjects (i.e., the participants in the study) were free of heart disease at the beginning of the study, when they took a test that measured how prone they were to sudden anger. Their heart health was then monitored over a four-year period, and it was recorded whether they developed coronary heart disease (CHD). In short, the study examines whether anger levels are associated with the likelihood of developing coronary heart disease. Now please answer questions 1) to 3):
1) In the two-way table below, report the marginal distributions and the total sample size, in counts and percent.
2) In the two-way table below, report the conditional distributions of Coronary Heart Disease in percent. Note that the conditional distribution of Coronary Heart Disease refers to the distribution of Coronary Heart Disease given a certain Anger Level.
3) With reference to your calculations above, discuss whether there is a potential association between anger and coronary heart disease.
Number of individuals | Coronary Heart Disease | No Coronary Heart Disease | Total
Low Anger             | 530                    | 3,057                     |
Moderate Anger        | 1,100                  | 4,621                     |
High Anger            | 270                    | 606                       |
Total                 |                        |                           |
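To check hand calculations for questions 1) and 2), here is a minimal Python sketch that computes the marginal distributions, the total sample size, and the conditional distribution of CHD given anger level from the table counts. The variable names are illustrative.

```python
# Counts from the two-way table: anger level -> (CHD, no CHD)
counts = {
    "Low Anger": (530, 3_057),
    "Moderate Anger": (1_100, 4_621),
    "High Anger": (270, 606),
}

# Marginal distribution of anger level (row totals)
row_totals = {level: chd + no_chd for level, (chd, no_chd) in counts.items()}
# Marginal distribution of CHD status (column totals)
chd_total = sum(chd for chd, _ in counts.values())
no_chd_total = sum(no_chd for _, no_chd in counts.values())
n = chd_total + no_chd_total  # total sample size

for level, total in row_totals.items():
    print(f"{level}: {total} ({100 * total / n:.1f}% of sample)")
print(f"CHD: {chd_total} ({100 * chd_total / n:.1f}%)")
print(f"No CHD: {no_chd_total} ({100 * no_chd_total / n:.1f}%)")
print(f"Total: {n}")

# Conditional distribution of CHD given anger level (each row sums to 100%)
conditional = {
    level: (100 * chd / row_totals[level], 100 * no_chd / row_totals[level])
    for level, (chd, no_chd) in counts.items()
}
for level, (pct_chd, pct_no) in conditional.items():
    print(f"{level}: CHD {pct_chd:.1f}%, no CHD {pct_no:.1f}%")
```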
SECTION 2: SPSS PROJECT
1. Regression with One Independent Variable vs. Regression with Multiple Independent Variables
Use the dataset from Assignment #2 (StateData_hw2.sav) to estimate the following two models, and then answer questions 1) to 4):
Model 1: Estimate and write down a regression model predicting the heart disease death rate based on the percent of smokers. [You may have done this already in Assignment #2. If so, just repeat the estimation.]
Model 2: Estimate and write down a regression model predicting the heart disease death rate based on the percent of smokers (X1) and state median household income (X2).
[Hint: Multiple regression was covered in Lecture Note #6.2. To estimate the second model with two variables, X1 and X2, in SPSS, go to Analyze > Regression > Linear, select the heart disease death rate as the Dependent variable, select both X1 and X2 as Independent variables, and click "OK".]
1) Provide the regression equations for both models and the corresponding values of R².
2) For the second model, provide interpretations of the constant term and the two slopes.
3) Explain intuitively why the effect of % smokers changed the way it did when median income was accounted for.
[No points will be deducted for this question; just give it a try. I hope to encourage you to think harder about the effect of each independent variable, as well as how the effects interact, in the multiple regression model. Formal discussion of such problems may come in 9172.]
4) Use the two models to predict the HDDR (heart disease death rate) for New York State, and then compare the two predicted values to the actual value of HDDR for New York State.
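SPSS performs the estimation for you, but it can help to see what Analyze > Regression > Linear computes. The minimal sketch below, assuming plain Python with no SPSS, fits ordinary least squares by solving the normal equations, computes R² as 1 - SS_residual/SS_total, and compares a prediction with an observed value in the spirit of question 4). The data are hypothetical (constructed so that y is an exact linear function of x1 and x2) and are not from StateData_hw2.sav; `ols`, `predict`, and `r_squared` are illustrative names.

```python
def ols(x_cols, y):
    """Fit y = a + b1*x1 + ... + bk*xk by ordinary least squares.

    x_cols is a list of regressor columns; an intercept column of 1s is added.
    Solves the normal equations (X'X) beta = X'y by Gaussian elimination
    and returns [a, b1, ..., bk].
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in x_cols] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))]
         for i in range(k)]
    # Forward elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            for j in range(c, k + 1):
                A[r][j] -= f * A[c][j]
    # Back substitution
    beta = [0.0] * k
    for i in reversed(range(k)):
        rest = sum(A[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (A[i][k] - rest) / A[i][i]
    return beta


def predict(beta, xs):
    """Predicted y for one observation with regressor values xs."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], xs))


def r_squared(x_cols, y, beta):
    """R^2 = 1 - SS_residual / SS_total."""
    y_bar = sum(y) / len(y)
    fitted = [predict(beta, obs) for obs in zip(*x_cols)]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical data (NOT StateData_hw2.sav): x1 = % smokers, x2 = median income ($1,000s)
x1 = [15, 18, 20, 22, 25, 17]
x2 = [70, 55, 60, 50, 45, 65]
y = [100 + 10 * s - 2 * m for s, m in zip(x1, x2)]  # made-up exact linear relationship

model1 = ols([x1], y)       # like Model 1: y on x1 only
model2 = ols([x1, x2], y)   # like Model 2: y on x1 and x2
print("Model 1:", model1, "R^2 =", r_squared([x1], y, model1))
print("Model 2:", model2, "R^2 =", r_squared([x1, x2], y, model2))

# Prediction vs. actual for one hypothetical observation
obs_x1, obs_x2, obs_y = 19, 58, 175
for name, beta, xs in [("Model 1", model1, [obs_x1]), ("Model 2", model2, [obs_x1, obs_x2])]:
    yhat = predict(beta, xs)
    print(f"{name}: predicted {yhat:.1f}, actual {obs_y}, residual {obs_y - yhat:.1f}")
```

Because x1 and x2 are negatively correlated in this made-up data, the slope on x1 in the one-regressor fit differs from its slope in the two-regressor fit.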