Problem 1:
Students in a management science class have just received their grades on the first test. The instructor
has provided information about the first test grades in some previous classes as well as the final average for the same students. Some of these grades have been sampled and are as follows:
STUDENT           1   2   3   4   5   6   7   8   9
1st test grade   98  77  88  80  96  61  66  95  69
Final average    93  78  84  73  84  64  64  95  76
(a) Develop a regression model that could be used to predict the final average in the course based on the first test grade.
(b) Predict the final average of a student who made an 83 on the first test.
(c) Give the values of r and r^2 for this model. Interpret the value of r^2 in the context of this problem.
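As a cross-check on the spreadsheet work for parts (a) through (c), the least-squares formulas can be sketched in a few lines of Python. This is a minimal sketch, not the required Excel solution; the variable names are illustrative assumptions, and only the grades from the table above are used.

```python
# Grades copied from the table in Problem 1.
first_test = [98, 77, 88, 80, 96, 61, 66, 95, 69]
final_avg = [93, 78, 84, 73, 84, 64, 64, 95, 76]

n = len(first_test)
mean_x = sum(first_test) / n
mean_y = sum(final_avg) / n

# Sums of squares and cross-products about the means.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(first_test, final_avg))
sxx = sum((x - mean_x) ** 2 for x in first_test)
syy = sum((y - mean_y) ** 2 for y in final_avg)

slope = sxy / sxx                      # b1
intercept = mean_y - slope * mean_x    # b0
r = sxy / (sxx * syy) ** 0.5           # correlation coefficient
r2 = r ** 2                            # coefficient of determination

predicted_83 = intercept + slope * 83  # part (b)

print(f"Y = {intercept:.2f} + {slope:.2f} X")
print(f"Prediction for X = 83: {predicted_83:.1f}")
print(f"r = {r:.2f}, r^2 = {r2:.2f}")
```

With these nine data points the fitted line should come out close to Y = 18.99 + 0.74X, with r^2 near 0.85, which can be compared against the Excel output.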
Problem 2:
Using the data in Problem 1, test to see if there is a statistically significant relationship between the
grade on the first test and the final average at the 0.05 level of significance. Use the formulas in this
chapter and Appendix D.
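The significance test can be sketched as an F test on the simple regression of final average on first test grade, using the grades listed earlier. This is a rough sketch under two assumptions: that the chapter's test is the usual F = MSR/MSE comparison, and that the critical value 5.59 (alpha = 0.05, 1 and 7 degrees of freedom) matches the F table in Appendix D. Check both against the text.

```python
# Grades from the first-test / final-average table.
first_test = [98, 77, 88, 80, 96, 61, 66, 95, 69]
final_avg = [93, 78, 84, 73, 84, 64, 64, 95, 76]

n = len(first_test)
k = 1  # number of independent variables
mean_x = sum(first_test) / n
mean_y = sum(final_avg) / n

sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(first_test, final_avg))
sxx = sum((x - mean_x) ** 2 for x in first_test)
sst = sum((y - mean_y) ** 2 for y in final_avg)  # total sum of squares

ssr = sxy ** 2 / sxx  # sum of squares explained by the regression
sse = sst - ssr       # sum of squared errors

msr = ssr / k
mse = sse / (n - k - 1)
f_stat = msr / mse

# Assumed table value: F(0.05; df1 = 1, df2 = 7). Verify in Appendix D.
F_CRITICAL = 5.59
significant = f_stat > F_CRITICAL

print(f"F = {f_stat:.2f}, critical value = {F_CRITICAL}, significant: {significant}")
```

If the calculated F exceeds the table value, the null hypothesis of no linear relationship is rejected at the 0.05 level.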
Problem 3:
Bus and subway ridership in Washington, D.C., during the summer months is believed to be heavily tied to the number of tourists visiting the city. During the past 12 years, the following data have been obtained:
        NUMBER OF TOURISTS   RIDERSHIP
YEAR    (1,000,000s)         (100,000s)
 1        7                   15
 2        2                   10
 3        6                   13
 4        4                   15
 5       14                   25
 6       15                   27
 7       16                   24
 8       12                   20
 9       14                   27
10       20                   44
11       15                   34
12        7                   17
(a) Plot these data and determine whether a linear model is reasonable.
(b) Develop a regression model.
(c) What is expected ridership if 10 million tourists visit the city?
(d) If there are no tourists at all, explain the predicted ridership.
Instructions 3:
Complete Problem 7
Complete Problem 8
Complete Problem 9
Complete Problem 10
Problem 4:
The following data give the selling price, square footage, number of bedrooms, and age of houses
that have sold in a neighborhood in the past 6 months. Develop three regression models to predict
the selling price based upon each of the other factors individually. Which of these is best?
SELLING      SQUARE                AGE
PRICE ($)    FOOTAGE    BEDROOMS   (YEARS)
 64,000      1,670      2          30
 59,000      1,339      2          25
 61,500      1,712      3          30
 79,000      1,840      3          40
 87,500      2,300      3          18
 92,500      2,234      3          30
 95,000      2,311      3          19
113,000      2,377      3           7
115,000      2,736      4          10
138,000      2,500      3           1
142,500      2,500      4           3
144,000      2,479      3           3
145,000      2,400      3           1
147,500      3,124      4           0
144,000      2,500      3           2
155,500      4,062      4          10
165,000      2,854      3           3
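One way to compare the three single-variable models is to fit each one and rank them by r^2. The sketch below does that in plain Python as a check on the Excel results; the helper `fit_simple` and the dictionary names are hypothetical, and ranking by r^2 is only one reasonable reading of "best".

```python
# (selling price $, square footage, bedrooms, age in years) from the table above.
houses = [
    (64000, 1670, 2, 30), (59000, 1339, 2, 25), (61500, 1712, 3, 30),
    (79000, 1840, 3, 40), (87500, 2300, 3, 18), (92500, 2234, 3, 30),
    (95000, 2311, 3, 19), (113000, 2377, 3, 7), (115000, 2736, 4, 10),
    (138000, 2500, 3, 1), (142500, 2500, 4, 3), (144000, 2479, 3, 3),
    (145000, 2400, 3, 1), (147500, 3124, 4, 0), (144000, 2500, 3, 2),
    (155500, 4062, 4, 10), (165000, 2854, 3, 3),
]


def fit_simple(xs, ys):
    """Least-squares fit of y = b0 + b1*x; returns (slope, intercept, r2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy ** 2 / (sxx * syy)


prices = [h[0] for h in houses]
predictors = {
    "square footage": [h[1] for h in houses],
    "bedrooms": [h[2] for h in houses],
    "age": [h[3] for h in houses],
}

# One simple regression per predictor.
models = {name: fit_simple(xs, prices) for name, xs in predictors.items()}
best = max(models, key=lambda name: models[name][2])

for name, (slope, intercept, r2) in models.items():
    print(f"{name:15s} Y = {intercept:,.2f} + {slope:,.2f} X   r^2 = {r2:.2f}")
print("Highest r^2:", best)
```

On this data the age model should show the highest r^2 (roughly 0.78, against roughly 0.65 for square footage and 0.36 for bedrooms).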
Problem 5:
Use the data in Problem 4 and develop a regression model to predict selling price based on the
square footage and number of bedrooms. Use this to predict the selling price of a 2,000-square-foot house with 3 bedrooms. Compare this model with the models in Problem 4. Should the number of bedrooms be included in the model? Why or why not?
Problem 6:
Use the data in Problem 4 and develop a regression model to predict selling price based on the
square footage, number of bedrooms, and age. Use this to predict the selling price of a 10-year-old,
2,000-square-foot house with 3 bedrooms.
In addition to the questions in this problem, respond to the following:
a) State the linear equation.
b) Explain the overall statistical significance of the model.
c) Explain the statistical significance of each independent variable in the model.
d) Interpret the adjusted R^2.
e) Is this a good predictive equation? Which variables, if any, should be excluded, and why? Explain.
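The three-variable model, its R^2, and its adjusted R^2 can be sketched without Excel by solving the normal equations (X'X)b = X'y directly. This is a minimal illustration that uses the house data from the selling-price table; the helper `solve` and all variable names are hypothetical. It does not produce the p-values needed for parts (b) and (c), which still come from the Excel regression output.

```python
# (selling price $, square footage, bedrooms, age in years)
houses = [
    (64000, 1670, 2, 30), (59000, 1339, 2, 25), (61500, 1712, 3, 30),
    (79000, 1840, 3, 40), (87500, 2300, 3, 18), (92500, 2234, 3, 30),
    (95000, 2311, 3, 19), (113000, 2377, 3, 7), (115000, 2736, 4, 10),
    (138000, 2500, 3, 1), (142500, 2500, 4, 3), (144000, 2479, 3, 3),
    (145000, 2400, 3, 1), (147500, 3124, 4, 0), (144000, 2500, 3, 2),
    (155500, 4062, 4, 10), (165000, 2854, 3, 3),
]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


prices = [float(h[0]) for h in houses]
rows = [[1.0, float(h[1]), float(h[2]), float(h[3])] for h in houses]  # leading 1 = intercept
p = len(rows[0])  # number of coefficients, including the intercept
k = p - 1         # number of independent variables

xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
xty = [sum(r[i] * y for r, y in zip(rows, prices)) for i in range(p)]
coef = solve(xtx, xty)  # [b0, b_sqft, b_bedrooms, b_age]


def predict(sqft, bedrooms, age):
    return coef[0] + coef[1] * sqft + coef[2] * bedrooms + coef[3] * age


n = len(prices)
mean_y = sum(prices) / n
sst = sum((y - mean_y) ** 2 for y in prices)
sse = sum((y - predict(*r[1:])) ** 2 for r, y in zip(rows, prices))
r2 = 1 - sse / sst
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)  # penalizes extra variables

print("Coefficients (b0, sqft, bedrooms, age):", [round(c, 2) for c in coef])
print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")
print(f"Predicted price (2,000 sq ft, 3 bedrooms, 10 years): {predict(2000, 3, 10):,.0f}")
```

Comparing the adjusted R^2 here with the adjusted R^2 of smaller models is one way to judge whether a variable such as bedrooms earns its place in the equation.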
Problem 7 Instructions:
Use Excel's regression option to perform the regression. Use one Excel spreadsheet file for the calculations and explanations, with one worksheet per problem. Use the problem number for each worksheet name. Cells should contain the formulas (i.e., if a formula was used to calculate the entry in that cell, the formula should remain in the cell).
Problem 10:
In 2009, the New York Yankees won 103 baseball games during the regular season. The table below lists the number of victories (W), the earned run average (ERA), and the batting average (AVG) of each team in the American League. The ERA is one measure of the effectiveness of the pitching staff, and a lower number is better. The batting average is one measure of the effectiveness of the hitters, and a higher number is better.
(a) Develop a regression model that could be used to predict the number of victories based on the ERA.
(b) Develop a regression model that could be used to predict the number of victories based on the batting average.
TEAM                   W     ERA    AVG
New York Yankees      103    4.26   0.283
Los Angeles Angels     97    4.45   0.285
Boston Red Sox         95    4.35   0.270
Minnesota Twins        87    4.50   0.274
Texas Rangers          87    4.38   0.260
Detroit Tigers         86    4.29   0.260
Seattle Mariners       85    3.87   0.258
Tampa Bay Rays         84    4.33   0.263
Chicago White Sox      79    4.14   0.258
Toronto Blue Jays      75    4.47   0.266
Oakland Athletics      75    4.26   0.262
Cleveland Indians      65    5.06   0.264
Kansas City Royals     65    4.83   0.259
Baltimore Orioles      64    5.15   0.268
(c) Which of the two models is better for predicting the number of victories?
(d) Develop a multiple regression model that includes both ERA and batting average. How does this compare to the previous models?