Question 1- Suppose you are interested in estimating the ceteris paribus relationship between y and x1. For this purpose, you can collect data on two control variables, x2 and x3. Let β1˜ be the simple linear regression estimate from y on x1, and let β1^ be the multiple regression estimate from y on x1, x2, and x3.
a. Assume that x1 is not correlated with x2 but that x2 and x3 have large partial effects on y. Would you expect β1˜ and β1^ to be similar or very different? Explain, and draw a Venn diagram to illustrate your answer.
Question 2- I know that y depends linearly on x, but I am not sure whether or not it also depends on another variable z. A friend suggests that I should regress y on x first, calculate the residuals, and then see whether they are correlated with z. What advice would you offer your friend?
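The two procedures in Question 2 can be compared numerically. The sketch below is only an illustration under assumed conditions: the data-generating process (y = 1 + 2x + 3z + noise, with z correlated with x), the sample size, and all variable names are hypothetical and do not come from the question. It computes the slope from regressing the residuals of y on x against z, and the coefficient on z from the multiple regression of y on x and z.

```python
import random

random.seed(0)
n = 500

# Hypothetical data: z is correlated with x, and y depends on both.
x = [random.gauss(0, 1) for _ in range(n)]
z = [0.8 * xi + random.gauss(0, 0.6) for xi in x]
y = [1 + 2 * xi + 3 * zi + random.gauss(0, 1) for xi, zi in zip(x, z)]


def mean(v):
    return sum(v) / len(v)


def cross(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


sxx, szz, sxz = cross(x, x), cross(z, z), cross(x, z)
sxy, szy = cross(x, y), cross(z, y)

# Friend's two-step procedure: regress y on x, then relate residuals to z.
b_x_simple = sxy / sxx
a_simple = mean(y) - b_x_simple * mean(x)
resid = [yi - a_simple - b_x_simple * xi for xi, yi in zip(x, y)]
slope_resid_on_z = cross(z, resid) / szz

# Multiple regression of y on x and z (closed form for two regressors).
det = sxx * szz - sxz ** 2
b_z_multiple = (sxx * szy - sxz * sxy) / det

# Squared sample correlation between x and z.
r2_xz = sxz ** 2 / (sxx * szz)

print("slope of residuals on z:", round(slope_resid_on_z, 3))
print("multiple regression coefficient on z:", round(b_z_multiple, 3))
print("squared correlation of x and z:", round(r2_xz, 3))
```

In any sample the first slope equals the second multiplied by (1 - r2_xz), so the two-step slope is shrunk toward zero whenever x and z are correlated and matches the multiple regression coefficient only when they are uncorrelated.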
Question 3- Suppose you are interested in predicting the price of a laptop computer based on its various features.
Price: the price of the laptop in dollars
Speed: speed of the laptop in megahertz
Charge: time (in minutes) the battery takes to charge
The partial Excel results are shown here:
SSUnexplained = 1698387.21
MSExplained = 554446.062
ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | | | | | 0.00479 |
| Residual | | | | | |
| Total | 21 | | | | |

| | Coefficients | Standard Error | t Stat | p-value |
|---|---|---|---|---|
| Intercept | 1500.6007 | 1085.6889 | | |
| Speed | 10.1506 | 14.5019 | | |
| Charge | 1.6284 | 4.4568 | | |
a. Complete the ANOVA table.
b. Find R². Explain what the number tells you about this regression model.
c. Test whether price is related to any of the X's (the overall significance of the model). Use a significance level of 0.05.
d. Test whether price is related to speed. Use a significance level of 0.05. What does this result imply?
e. Explain in the context of this problem what the numbers for the following coefficients mean: speed and charge.
f. Find a 95 percent confidence interval for speed. What parameter does this confidence interval bracket?
g. What is the standard error of the regression?
h. Write out the regression equation.
i. Estimate the price of a laptop with a speed of 33 megahertz and a charge time of 305 minutes.
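The arithmetic behind parts a through i can be sketched in a few lines of Python. This is a minimal sketch, not a model answer: it assumes the model has two regressors (Speed and Charge), so the regression has 2 degrees of freedom, and it takes the two-sided 5 percent critical value of t with 19 degrees of freedom (2.093) from a standard t-table. The variable names are illustrative.

```python
import math

# Values given in the partial Excel output.
ss_residual = 1698387.21     # SSUnexplained
ms_regression = 554446.062   # MSExplained
df_total = 21
k = 2                        # assumption: regressors are Speed and Charge

# Fill in the ANOVA table.
df_regression = k
df_residual = df_total - df_regression
ss_regression = ms_regression * df_regression
ss_total = ss_regression + ss_residual
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual

# Goodness of fit and standard error of the regression.
r_squared = ss_regression / ss_total
std_error_regression = math.sqrt(ms_residual)

# Coefficient table: (estimate, standard error).
coef = {
    "Intercept": (1500.6007, 1085.6889),
    "Speed": (10.1506, 14.5019),
    "Charge": (1.6284, 4.4568),
}
t_stats = {name: b / se for name, (b, se) in coef.items()}

# Assumption: t critical value for 19 df, two-sided 5 percent, from a t-table.
T_CRIT_19 = 2.093
b_speed, se_speed = coef["Speed"]
ci_speed = (b_speed - T_CRIT_19 * se_speed, b_speed + T_CRIT_19 * se_speed)

# Point prediction for Speed = 33 and Charge = 305.
predicted_price = (
    coef["Intercept"][0] + coef["Speed"][0] * 33 + coef["Charge"][0] * 305
)

print("df:", df_regression, df_residual, df_total)
print("SS:", round(ss_regression, 3), ss_residual, round(ss_total, 3))
print("MS residual:", round(ms_residual, 3), "F:", round(f_stat, 4))
print("R squared:", round(r_squared, 4))
print("standard error of regression:", round(std_error_regression, 2))
print("t stats:", {name: round(t, 3) for name, t in t_stats.items()})
print("95% CI for Speed:", tuple(round(v, 4) for v in ci_speed))
print("predicted price:", round(predicted_price, 2))
```

The hypothesis tests in parts c and d then come down to comparing f_stat with the Significance F reported by Excel and the Speed t statistic with T_CRIT_19.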
Question 4- A 10-year study conducted by the American Heart Association provided data on how age, blood pressure, and smoking relate to the risk of strokes. Assume that the regression results pertain to data on the following.
Risk: the probability that the patient will have a stroke over the next 10-year period
Age: age of the person
Pressure: blood pressure
Family: number of strokes the person's parents have had
The partial Excel results are shown here:
ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 3 | 3660.74 | 1220.25 | 36.82 | 2.06E-07 |
| Residual | 16 | 530.21 | 33.14 | | |
| Total | 19 | 4190.95 | | | |

| | Coefficients | Standard Error | t Stat | p-value |
|---|---|---|---|---|
| Intercept | -91.76 | 15.22 | | |
| Age | 1.08 | 0.17 | | |
| Pressure | 0.25 | 0.05 | | |
| Family | 8.74 | 3 | | |
a. What is the R² of the regression model? What does it mean?
b. What is the standard error of the regression? What does it mean?
c. Explain in the context of this problem what the numbers for the following coefficients mean: Age, Pressure, and Family.
d. Test the overall significance of the regression model at a 5 percent significance level.
e. Is pressure statistically significant in explaining risk? Test this hypothesis at a 10 percent significance level.
f. Find a 95 percent confidence interval for age. What does this interval mean, and what parameter does it bracket?
g. Estimate the risk of a stroke for a person who is 70 years old, with a blood pressure of 165, and whose parents have had one stroke.
h. Can you see a potential problem with predicting using this regression model?
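The quantities asked for in Question 4 follow directly from the Excel output, and a short Python sketch makes the arithmetic explicit. It is an illustration rather than a model answer: the critical value 2.120 (t with 16 degrees of freedom, two-sided 5 percent) is taken from a standard t-table, and the helper predict_risk is a hypothetical name introduced here.

```python
import math

# ANOVA values from the Excel output.
ss_regression, df_regression = 3660.74, 3
ss_residual, df_residual = 530.21, 16
ss_total = 4190.95

r_squared = ss_regression / ss_total
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual
std_error_regression = math.sqrt(ms_residual)

# Coefficient table: (estimate, standard error).
coef = {
    "Intercept": (-91.76, 15.22),
    "Age": (1.08, 0.17),
    "Pressure": (0.25, 0.05),
    "Family": (8.74, 3.0),
}
t_stats = {name: b / se for name, (b, se) in coef.items()}

# Assumption: t critical value for 16 df, two-sided 5 percent, from a t-table.
T_CRIT_16 = 2.120
b_age, se_age = coef["Age"]
ci_age = (b_age - T_CRIT_16 * se_age, b_age + T_CRIT_16 * se_age)


def predict_risk(age, pressure, family):
    """Point prediction from the fitted equation (hypothetical helper)."""
    return (
        coef["Intercept"][0]
        + coef["Age"][0] * age
        + coef["Pressure"][0] * pressure
        + coef["Family"][0] * family
    )


risk_part_g = predict_risk(age=70, pressure=165, family=1)

print("R squared:", round(r_squared, 4))
print("F:", round(f_stat, 2))
print("standard error of regression:", round(std_error_regression, 3))
print("t stats:", {name: round(t, 3) for name, t in t_stats.items()})
print("95% CI for Age:", tuple(round(v, 4) for v in ci_age))
print("predicted risk for part g:", round(risk_part_g, 2))
```

Calling predict_risk with other ages and pressures is one way to explore part h, since nothing in a linear equation constrains the predicted value to a sensible range for a probability.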