Question 1: Medical researchers at the Cleveland Clinic and St. Vincent's randomly assigned 227 individuals with alcohol problems to one of three alcohol-treatment groups: a 28-day hospitalization only, Alcoholics Anonymous (AA) meetings only, or individual counseling sessions only. Two years later, the researchers identified the patients who had remained sober after completing the program. The data is summarized in the table below.

                      Hospital   AA Only   Individual Counseling   Total
Stayed sober                28        13                      12      53
Did not stay sober          48        63                      63     174
Total                       76        76                      75     227
Using α = .01, test whether the type of treatment program entered is independent of a patient's status following treatment. If and only if it is appropriate to do so, offer an informal comment as to where the single largest change is. If it is not appropriate to conduct this informal analysis, state "N/A".
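A minimal Python sketch of the chi-square test of independence for the table above, using only the standard library. It assumes the usual expected count (row total × column total) / n for each cell. The critical value uses the fact that a chi-square distribution with 2 degrees of freedom has the closed-form upper quantile −2 ln(α); that shortcut is valid only for df = 2, which is what a 2 × 3 table gives. Variable and function names are illustrative.

```python
import math

# Observed counts from the table: rows are outcomes, columns are programs.
programs = ["Hospital", "AA Only", "Individual Counseling"]
outcomes = ["Stayed sober", "Did not stay sober"]
observed = [
    [28, 13, 12],
    [48, 63, 63],
]

def chi_square_independence(table):
    """Return (statistic, degrees of freedom, per-cell contributions)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    contributions = []
    for i, row in enumerate(table):
        contrib_row = []
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            contrib_row.append((obs - expected) ** 2 / expected)
        contributions.append(contrib_row)
    statistic = sum(sum(r) for r in contributions)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, contributions

alpha = 0.01
chi2, df, contributions = chi_square_independence(observed)
# For df = 2 only, the upper-alpha chi-square critical value is -2 ln(alpha).
critical = -2 * math.log(alpha)
reject = chi2 > critical

# Cell with the largest contribution (worth reporting only if H0 is rejected).
i, j = max(
    ((r, c) for r in range(len(outcomes)) for c in range(len(programs))),
    key=lambda rc: contributions[rc[0]][rc[1]],
)
largest_cell = (outcomes[i], programs[j])

print(f"chi-square = {chi2:.3f}, df = {df}, critical = {critical:.3f}, reject H0: {reject}")
if reject:
    print("Largest contribution:", largest_cell)
```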
Question 2: People in the aerospace industry believe the cost of a space project is a function of the weight of the major object being sent into space. The following partial data represents a random sample of seven recent space projects. This data was used to develop a regression model to predict the cost of a space project by the weight of the space object. Use the information provided to answer the questions that follow.
Observation (Project)   Object Weight (tons)   Project Cost ($ millions)
1                       1.897                  53.6
2                       3.019                  184.9
3                       .                      .
4                       .                      .
5                       .                      .
6                       2.100                  110.4
7                       2.387                  104.6
Summary statistics for all seven projects:
$SS_X = 4.88465542857$, $SS_Y = 23744.4542857$, $SS_{XY} = 324.159242857$
$\bar{X} = 1.70028571429$, $\bar{Y} = 73.8285714286$
Calculate the least squares regression equation for predicting the cost of a space project as a function of the weight of the major object being sent into space. Report the resulting least squares regression equation in the proper format.
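The least squares estimates follow directly from the summary statistics: $b_1 = SS_{XY} / SS_X$ and $b_0 = \bar{Y} - b_1 \bar{X}$. A minimal Python sketch with the given values typed in (variable names are illustrative):

```python
# Summary statistics given in the problem (n = 7 projects).
ss_x = 4.88465542857      # SS_X: sum of squared deviations of weight (tons)
ss_xy = 324.159242857     # SS_XY: sum of cross-products of deviations
x_bar = 1.70028571429     # mean object weight (tons)
y_bar = 73.8285714286     # mean project cost ($ millions)

# Least squares slope and intercept.
b1 = ss_xy / ss_x
b0 = y_bar - b1 * x_bar

print(f"y-hat = {b0:.3f} + {b1:.3f} x")
```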
a. Interpret the practical meaning of the slope of the least squares regression line (i.e., in the context of the problem).
b. Identify the independent and dependent variables in this regression analysis.
c. Using a significance level of α = .05 and a t-test, is there sufficient evidence to conclude that the weight of the major object being sent into space is useful in predicting the cost of a space project?
d. What percentage of the total variability in the cost of a space project can be explained by knowing the weight of the major object being sent into space?
e. Calculate the coefficient of correlation between the independent and dependent variables. Comment on what the magnitude and direction of this correlation coefficient says about the linear relationship between the independent and dependent variable.
f. Construct a 95% confidence interval estimate of the mean cost of all space projects when the weight of the major object being sent into space is 1.5 tons. Interpret the practical meaning of this interval estimate, in plain English (i.e., in the context of the problem).
g. Construct a 95% prediction interval for the cost of a single space project when the weight of the object being sent into space is 1.5 tons. Interpret the practical meaning of this interval estimate, in plain English.
h. Construct a 95% confidence interval estimate of the true population slope for this least squares regression line. Interpret the practical meaning of your interval estimate, in plain English.
i. Calculate the estimated variance of the random errors for this regression analysis.
j. Calculate the residual for the 2nd observation in the data set.
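Parts (c) through (j) all reuse the same few quantities: SSE = SS_Y − b1·SS_XY, s² = SSE/(n − 2), and the standard errors built from s. A minimal Python sketch of those formulas, assuming the two-tailed critical value t(0.025, df = 5) = 2.5706 read from a standard t table; function and variable names are illustrative:

```python
import math

# Summary statistics given in the problem (n = 7 projects).
n = 7
ss_x, ss_y, ss_xy = 4.88465542857, 23744.4542857, 324.159242857
x_bar, y_bar = 1.70028571429, 73.8285714286
t_crit = 2.5706  # t_{0.025, df = 5} from a t table (alpha = .05 two-tailed / 95%)

b1 = ss_xy / ss_x
b0 = y_bar - b1 * x_bar

sse = ss_y - b1 * ss_xy          # error sum of squares
mse = sse / (n - 2)              # (i) estimated variance of the random errors, s^2
s = math.sqrt(mse)

se_b1 = s / math.sqrt(ss_x)
t_stat = b1 / se_b1              # (c) test statistic for H0: beta1 = 0
r_squared = 1 - sse / ss_y       # (d) coefficient of determination
r = math.copysign(math.sqrt(r_squared), b1)   # (e) r takes the sign of b1

def interval_at(x0, prediction):
    """95% CI for the mean response (prediction=False) or PI for one project."""
    y_hat = b0 + b1 * x0
    h = 1 / n + (x0 - x_bar) ** 2 / ss_x
    se = s * math.sqrt(h + (1 if prediction else 0))
    return y_hat - t_crit * se, y_hat + t_crit * se

ci_mean = interval_at(1.5, prediction=False)           # (f)
pi_single = interval_at(1.5, prediction=True)          # (g)
slope_ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)  # (h)
residual_2 = 184.9 - (b0 + b1 * 3.019)                 # (j) observation 2

print(f"s^2 = {mse:.3f}, t = {t_stat:.3f}, R^2 = {r_squared:.4f}, r = {r:.4f}")
print("CI:", ci_mean, "PI:", pi_single, "slope CI:", slope_ci, "e2:", residual_2)
```

The prediction interval differs from the confidence interval only by the extra 1 under the square root, which accounts for the scatter of a single project around the mean.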
Question 3: Analysts at a company that produces small appliances are looking at sales of their blender in a medium-size city in the Midwest. They have noticed that sales in this city have not been meeting forecast values for several months and want to look at the problem in more detail. They have collected data on monthly sales ($), advertising expenditure ($), the number of competing products available, and warranty period of the item from 9 retail outlets to which they sell the blender. Their intent was to develop a multiple regression model that will predict average monthly sales of the blender using the 3 independent variables noted. The data for this study and partial Excel output are provided below. Use this information to answer the questions that follow.

Observation   Sales ($)   Advertising ($)   Number of Competitors   Warranty (years)
1             4565        459               1                       2.00
2             4896        545               0                       0.25
3             4480        472               2                       1.00
4             4300        482               3                       2.00
5             3502        435               3                       0.25
6             4413        499               3                       1.00
7             5868        604               0                       1.00
8             4527        501               1                       1.00
9             3849        370               3                       1.00
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.943776
R Square
Adjusted R Square
Standard Error
Observations
ANOVA
             SS          df    MS         F
Regression   3117368.0
Residual                       76497.82
Total        3499857.0

              Coefficients   Standard Error   t Stat
Intercept     1456.867       1219.677         1.194469
Advertising   6.432963       2.171256
Competition   -188.249       110.2911
Warranty      232.3621       158.6014
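The blank cells in the output follow from the cells that are filled in. A minimal Python sketch, assuming the 76497.82 printed in the Residual row is the residual mean square: it equals (3499857.0 − 3117368.0)/5, and the resulting R² reproduces the printed Multiple R of 0.943776. Names are illustrative.

```python
import math

# Values read from the partial Excel output: n = 9 outlets, k = 3 predictors.
n, k = 9, 3
ss_regression = 3117368.0
ss_total = 3499857.0
ms_residual_shown = 76497.82   # value printed in the Residual row

# name: (coefficient, standard error)
coefficients = {
    "Advertising": (6.432963, 2.171256),
    "Competition": (-188.249, 110.2911),
    "Warranty": (232.3621, 158.6014),
}

df_regression = k
df_residual = n - k - 1
ss_residual = ss_total - ss_regression          # SSE = SST - SSR
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
# Sanity check: the printed Residual value matches SSE / df_residual.
assert abs(ms_residual - ms_residual_shown) < 0.1

r_square = ss_regression / ss_total
adj_r_square = 1 - (1 - r_square) * (n - 1) / df_residual
standard_error = math.sqrt(ms_residual)
f_stat = ms_regression / ms_residual
t_stats = {name: b / se for name, (b, se) in coefficients.items()}

print(f"R^2 = {r_square:.4f}, adjusted R^2 = {adj_r_square:.4f}, s = {standard_error:.2f}")
print(f"df = ({df_regression}, {df_residual}), SSE = {ss_residual:.1f}, F = {f_stat:.3f}")
print({name: round(t, 4) for name, t in t_stats.items()})
```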
a. What is the least squares regression model that will predict monthly sales of the blender from advertising expenditure, the number of competing products available, and warranty period of the item?
b. Interpret the practical meaning (i.e., in the context of the problem) of each slope in the regression model.
c. Construct a 95% confidence interval estimate of the true population slope for the advertising variable. Interpret the practical meaning of the resulting interval, in plain English (i.e., in the context of the problem).
d. Construct an approximate 98% prediction interval to predict monthly sales of the blender for a store that spends $500 on advertising, has 2 competitors, and offers an 18-month warranty on the item. Interpret the practical meaning of this interval estimate, in plain English (i.e., in the context of the problem).
e. At the 0.05 level of significance, is the overall regression model statistically significant? Do a complete and appropriate hypothesis test.
f. Is the variable "number of competitors" useful in the prediction of monthly sales? Test this hypothesis using α = .05.
g. After taking into account the sample size and number of independent variables in the model, what percent of the total variability in monthly sales can be explained by this regression model?
h. Calculate the residual for the first observation in the data set.
i. Examine the correlation matrix below.
Correlation Matrix
                        Sales ($)      Advertising ($)   Number of Competitors   Warranty (yrs.)
Sales ($)               1
Advertising ($)         0.887588573    1
Number of Competitors   -0.809863839   -0.730337266      1
Warranty (yrs.)         0.097818155    -0.111529512      0.132896747             1
Is there any evidence of multicollinearity in this multiple regression model? Fully explain your answer, including the names of the variables exhibiting multicollinearity. If no multicollinearity exists, state "N/A".
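Parts (c), (d), (h), and (i) can be checked with the printed coefficients alone. A minimal Python sketch under these assumptions: the t critical values t(0.025, 5) = 2.5706 and t(0.01, 5) = 3.3649 come from a standard t table; the "approximate" prediction interval is taken to be ŷ ± t·s, ignoring the leverage term; and |r| > 0.7 between two predictors is used as a rule-of-thumb multicollinearity flag (courses use different cutoffs). The warranty variable is measured in years, so an 18-month warranty enters as 1.5. The helper `predict` is hypothetical.

```python
import math

# Coefficients from the Excel output.
b = {"Intercept": 1456.867, "Advertising": 6.432963,
     "Competition": -188.249, "Warranty": 232.3621}
se_advertising = 2.171256
s = math.sqrt(76497.82)   # standard error of the estimate = sqrt(MS residual)

# t critical values for df = n - k - 1 = 5, read from a t table.
t_025_5 = 2.5706   # 95% two-sided
t_010_5 = 3.3649   # 98% two-sided

def predict(advertising, competitors, warranty_years):
    """Hypothetical helper: point prediction from the fitted model."""
    return (b["Intercept"] + b["Advertising"] * advertising
            + b["Competition"] * competitors + b["Warranty"] * warranty_years)

# (c) 95% confidence interval for the advertising slope.
adv_ci = (b["Advertising"] - t_025_5 * se_advertising,
          b["Advertising"] + t_025_5 * se_advertising)

# (d) Approximate 98% prediction interval: y-hat +/- t * s (assumption).
y_hat = predict(500, 2, 1.5)          # 18 months = 1.5 years
approx_pi = (y_hat - t_010_5 * s, y_hat + t_010_5 * s)

# (h) Residual for observation 1: sales 4565, advertising 459, 1 competitor, 2.00 years.
residual_1 = 4565 - predict(459, 1, 2.00)

# (i) Pairwise correlations among the predictors only (not with Sales).
predictor_corr = {("Advertising", "Competitors"): -0.730337266,
                  ("Advertising", "Warranty"): -0.111529512,
                  ("Competitors", "Warranty"): 0.132896747}
threshold = 0.7   # rule-of-thumb cutoff, an assumption
flagged = [pair for pair, r in predictor_corr.items() if abs(r) > threshold]

print("Advertising CI:", adv_ci)
print("Approx. 98% PI:", approx_pi)
print("Residual 1:", round(residual_1, 2), "Flagged pairs:", flagged)
```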
Question 4: An accounting firm collected the data in the table below in an attempt to explain variation in client profitability:
Net Profit   Hours with Client   Type of Client
2345         45                  1
4200         56                  2
278          26                  3
1211         56                  2
1406         24                  2
500          23                  3
-700         34                  3
3457         45                  1
2478         47                  1
1975         24                  2
206          32                  3
where:
Y = Net Profit (net profit earned from the client)
X1 = Hours with Client (number of hours spent working with the client)
X2 = Type of Client: 1 if manufacturing
                     2 if service
                     3 if governmental
The accounting firm wants you to develop a multiple regression model in order to predict net profit using the 2 independent variables described above. Enter the appropriate dummy variable coding that will incorporate "Type of Client" into the regression model. You must use GOVERNMENTAL as the base level for the dummy variable(s).
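A categorical variable with three levels needs two 0/1 dummy variables; the base level (governmental) is the one coded 0 on both. A minimal Python sketch of that coding, using the Dummy 1 = manufacturing and Dummy 2 = service convention that also appears in Question 5; the function name `dummy_code` is illustrative:

```python
# Data from the table: (net profit, hours with client, type of client).
clients = [
    (2345, 45, 1), (4200, 56, 2), (278, 26, 3), (1211, 56, 2),
    (1406, 24, 2), (500, 23, 3), (-700, 34, 3), (3457, 45, 1),
    (2478, 47, 1), (1975, 24, 2), (206, 32, 3),
]

def dummy_code(client_type):
    """Return (Dummy 1, Dummy 2) with governmental (type 3) as the base level.

    Dummy 1 = 1 for manufacturing (type 1); Dummy 2 = 1 for service (type 2);
    governmental clients are coded (0, 0).
    """
    return (1 if client_type == 1 else 0, 1 if client_type == 2 else 0)

# Each row becomes (net profit, hours, Dummy 1, Dummy 2).
rows = [(profit, hours, *dummy_code(t)) for profit, hours, t in clients]
for row in rows:
    print(row)
```

The original "Type of Client" column (1, 2, 3) is dropped from the model and replaced by the two dummy columns. Entering it directly would wrongly treat the category codes as numeric.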
Question 5: An accounting firm collected the data in the table below in an attempt to explain variation in client profitability:
Net Profit   Hours with Client   Type of Client
2345         45                  1
4200         56                  2
278          26                  3
1211         56                  2
1406         24                  2
500          23                  3
-700         34                  3
3457         45                  1
2478         47                  1
1975         24                  2
206          32                  3
where:
Y = Net Profit (net profit earned from the client)
X1 = Hours with Client (number of hours spent working with the client)
X2 = Type of Client: 1 if manufacturing
                     2 if service
                     3 if governmental
The accounting firm wants you to develop a multiple regression model in order to predict net profit using the 2 independent variables described above. Assume that the multiple regression analysis was conducted, and that the variable "Dummy 1" was coded as 1 if the client was a manufacturing client. Governmental clients were the base-level category for the dummy variables. The Excel output from this regression analysis appears below:
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.835474124
R Square 0.698017012
Adjusted R Square 0.568595732
Standard Error 975.3064045
Observations 11
ANOVA
             df   SS            MS            F             Significance F
Regression   3    15390889.56   5130296.519   5.393371239   0.030801269
Residual     7    6658558.078   951222.5826
Total        10   22049447.64

                          Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept                 -586.2555597   974.2029083      -0.601779727   0.566292865   -2889.879382   1717.368263
Hours with Client         22.86106295    29.33445824      0.779324532    0.461318736   -46.5039084    92.2260343
Dummy 1 (Manufacturing)   2302.267018    895.0615733      2.572188425    0.036889988   185.7827162    4418.751321
Dummy 2 (Service)         1869.813042    764.538844       2.445674352    0.044387958   61.96595064    3677.660133
Interpret the practical meaning (i.e., in the context of the problem) of the slope for the variable "Dummy 1 (Manufacturing)."
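Because the dummies enter the model additively, the Dummy 1 coefficient is the difference in predicted net profit between a manufacturing client and a governmental client (the base level) that receive the same number of hours. A small Python sketch using the coefficients from the output; the hours values 24, 40, and 56 are arbitrary illustrations chosen within the observed range:

```python
# Coefficients copied from the Excel output (governmental clients are the base level).
intercept = -586.2555597
b_hours = 22.86106295
b_manufacturing = 2302.267018   # Dummy 1 (Manufacturing)
b_service = 1869.813042         # Dummy 2 (Service)

def predicted_profit(hours, manufacturing=0, service=0):
    """Predicted net profit from the fitted model; dummies are 0/1 indicators."""
    return (intercept + b_hours * hours
            + b_manufacturing * manufacturing + b_service * service)

# The manufacturing-vs-governmental gap is the same at every number of hours.
for hours in (24, 40, 56):
    gap = predicted_profit(hours, manufacturing=1) - predicted_profit(hours)
    print(hours, round(gap, 2))
```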
Question 6: Before the Christmas shopping rush began, a department store had noted that the percentages of its customers who use the store's credit card, who use a major credit card, and who pay cash were the same. In a sample of 150 shoppers during the Christmas rush, 46 used the store's credit card, 43 used a major credit card, and 61 paid cash. Using a significance level of α = 0.05, test whether the methods of payment have changed during the Christmas rush. If and only if it is appropriate to do so, offer an informal comment as to where the single largest change is. If it is not appropriate to conduct this informal analysis, state "N/A".
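Under the null hypothesis that the three payment methods are still equally likely, each expected count is 150/3 = 50, and the goodness-of-fit statistic has 3 − 1 = 2 degrees of freedom. A minimal Python sketch; as in a 2-df chi-square test, the critical value is computed from the closed form −2 ln(α), which holds only for df = 2. Names are illustrative.

```python
import math

observed = {"Store card": 46, "Major credit card": 43, "Cash": 61}
n = sum(observed.values())
expected_each = n / len(observed)    # H0: each method has proportion 1/3

chi2 = sum((o - expected_each) ** 2 / expected_each for o in observed.values())
df = len(observed) - 1
alpha = 0.05
critical = -2 * math.log(alpha)      # closed form valid only for df = 2
reject = chi2 > critical
print(f"chi-square = {chi2:.3f}, df = {df}, critical = {critical:.3f}, reject H0: {reject}")
```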