Overview
In this module, many powerful data analysis strategies will be presented. In many situations, decision makers need to conduct a hypothesis test that consists of more than two samples. In this case, a single or two-sample hypothesis test is not adequate for this investigation. ANOVA, short for analysis of variance, is a technique that allows a researcher to analyze means of two or more samples at a time. Though a series of two-sample hypothesis could be conducted in situations where there are more than two samples, ANOVA is better suited because it reduces the likelihood of a researcher making a Type I error. In fact, ANOVA allows researchers to conduct their analysis more accurately by accounting for other sources of variation.
Another extremely useful data analysis technique that will be presented is regression analysis. For this analysis, causal relationships between dependent and independent attributes can be investigated for their significance. With this tool, decision makers can make forecasts and determine if independent attributes are significant in explaining a dependent attribute that is being modeled. For example, monthly sales volume (i.e. demand) might be forecasted with various sources of advertising (i.e. TV, Internet, Newspaper, etc.). In this example, a decision maker could model demand and decide if a particular type of advertising was statistically significant in increasing demand.
Finally, the module concludes by reviewing techniques that decision makers use when events that might influence their decision are mostly unknown. For example, perhaps you are making a decision on whether or not to expand your current facility or build a new one. This decision might be related to the anticipated demand for a certain product over the next few years, which can be estimated within reason, but is certainly not know. Techniques will be presented to help decision makers with problems where events, or states of nature, are unknown based on a preference of being optimistic, pessimistic, or realist.
Data Analysis ToolPak
The Data Analysis ToolPak is a Microsoft Office Excel add-in program that is available when you install Microsoft Office or Excel. The learning activities in this module will utilize the Data Analysis ToolPak, so it is necessary to have this add-in installed. To install the Data Analysis ToolPak:
Click the File tab, click Options, and then click the Add-Ins category.
In the Manage box, select Excel Add-ins and then click Go.
In the Add-Ins available box, select the Analysis ToolPak check box, and then click OK.
NOTE: There are two options for the Data Analysis ToolPak. One provides extra support for Visual Basic for Application (VBA) and the other one does not. You can select both options if you'd like, but the only option you need for this course is the Data Analysis ToolPak (without VBA).
Learning Outcomes
Upon completion of this module, you will be able to:
ANOVA
- Perform a test of hypothesis to determine whether the variances of two populations are equal.
- Describe the ANOVA approach for testing difference in sample means.
- Conduct a test of hypothesis among three or more treatment means and describe the results.
- Carry out a test of hypothesis among treatment means using a blocking variable and understand the results.
Correlation and Linear Regression
- Define the terms dependent and independent variable.
- Calculate, test, and interpret the relationship between two variables using the correlation coefficient.
- Apply regression analysis to estimate the linear relationship between two variables
- Interpret the regression analysis.
- Evaluate a regression equation to predict the dependent variable.
Multiple Regression Analysis
- Describe the relationship between several independent variables and a dependent variable using multiple regression analysis.
Decision Making
- Compute and interpret the expected values for a payoff table.
- Explain and interpret opportunity loss.
- Describe three strategies for decision making.
- Organize possible outcomes into a decision tree and interpret the result
Content Delivery
Start with the Excel workbook (spreadsheet) Content Delivery - start.xlsx.
Debates have been ongoing if a traditional classroom settings provide a more effective way to deliver educational content over more modern alternatives such as online or hybrid online course offerings. In order to shed light on this debate an experiment was conducted over a fifteen week period. In total, 90 students were randomly separated into 3 different courses. In other words, 30 students were in a traditional classroom for a semester, 30 were asked to take the course in a hybrid online model, and the remaining 30 were asked to take the course online. The experiment was controlled as much as possible. For example, the course content was created by a single Professor (i.e. lectures, recorded videos, etc.). At the end of the 15 week investigation, all students were asked to take the same final exam which contained 50 questions. The scores (i.e. number of questions answered correctly) from the 90 students are shown below in the table. Using an alpha value of 0.05, is it reasonable to conclude that there are differences in student scores based on the content delivery method used?
Classroom
|
Hybrid
|
Online
|
21
|
30
|
40
|
26
|
22
|
33
|
22
|
43
|
33
|
28
|
34
|
42
|
30
|
28
|
38
|
28
|
33
|
39
|
42
|
30
|
17
|
32
|
28
|
38
|
21
|
17
|
17
|
45
|
44
|
25
|
36
|
30
|
26
|
38
|
28
|
34
|
21
|
34
|
17
|
32
|
22
|
35
|
18
|
47
|
31
|
39
|
34
|
46
|
50
|
48
|
37
|
27
|
23
|
42
|
37
|
18
|
30
|
26
|
36
|
17
|
30
|
18
|
22
|
21
|
47
|
23
|
21
|
40
|
23
|
33
|
26
|
46
|
21
|
44
|
33
|
31
|
28
|
50
|
49
|
23
|
45
|
50
|
44
|
17
|
43
|
27
|
38
|
23
|
42
|
45 |
a. What is the null hypothesis statement for this problem?
b. What is the alternative hypothesis statement for this problem?
c. What is alpha for this analysis?
d. What is the most appropriate test for this problem? (choose one of the following)
• F-Test Two-Sample for Variance
• Anova: Single Factor
• Anova: Two-Factor with Replication
• Anova: Two-Factor without Replication
e. What is the value of the test statistic for the most appropriate analysis?
f. What is the value of the critical value for the most appropriate analysis?
g. Is it reasonable to conclude that there is a difference in the mean scores based on the content delivery method used in this experiment? (choose one of the following)
• Yes
• No
h. What is the p-value for this analysis?
Real Estate Tax
The city of Tucson, Arizona, employs people to assess the value of homes for the purpose of establishing real estate tax. The city manager sends each assessor to the same five homes and ten compares the results. The information is given below, in thousands of dollars. Can we conclude that there is a difference in the assessors, at a significance level of 0.05?
Home
|
Smith
|
Norman
|
Thomas
|
Holiday
|
A
|
53
|
55
|
49
|
45
|
B
|
50
|
51
|
52
|
53
|
C
|
48
|
52
|
47
|
53
|
D
|
70
|
68
|
65
|
64
|
E
|
84
|
89
|
92
|
86
|
Part A: Assessor
a. What is the null hypothesis statement for this problem?
b. What is the alternative hypothesis statement for this problem?
c. What is alpha for this analysis?
d. What is the most appropriate test for this problem? (choose one of the following)
- F-Test Two-Sample for Variance
- Anova: Single Factor
- Anova: Two-Factor with Replication
- Anova: Two-Factor without Replication
e. What is the value of the test statistic for the most appropriate analysis?
f. What is the value of the critical value for the most appropriate analysis?
g. Is it reasonable to conclude that there is a difference in the mean home value based on the assessor? (choose one of the following)
• Yes
• No
h. What is the p-value for this analysis?
Part B: Home
a. What is the null hypothesis statement for this problem?
b. What is the alternative hypothesis statement for this problem?
c. What is alpha for this analysis?
d. What is the most appropriate test for this problem? (choose one of the following)
• F-Test Two-Sample for Variance
• Anova: Single Factor
• Anova: Two-Factor with Replication
• Anova: Two-Factor without Replication
e. What is the value of the test statistic for the most appropriate analysis?
f. What is the value of the critical value for the most appropriate analysis?
g. Is it reasonable to conclude that there is a difference in the value based on the home? (choose one of the following)
• Yes
• No
h. What is the p-value for this analysis?
NOTE: In order to receive credit for this problem, you must show your work and you must use the appropriate formulas in Excel to answer the questions. Format all percentages as a percentage with 2 decimal places (i.e. 78.29%). Show all other numbers with 3 decimal places.
Stadium Instructions
A professional baseball team is in a bad slump (i.e. losing streak). Ticket sales are plummeting and the Director of Sports Operations is looking for ways to generate more revenue at the gate. Data from the last 30 weeks has been collected in terms of the number of fans in attendance (i.e. gate receipts). In addition to this attendance, data has been collected in terms of various sources of advertising (i.e. television, radio, newspapers, magazines, mail, and internet) and other forms promotion.
|
|
|
Baseball Data
|
|
|
|
Television Advertising
|
Radio Advertising
|
Newspaper Advertising
|
Magazine Advertising
|
Mail Postal Advertising
|
Internet Advertising
|
Sales Promotions
|
Gate Receipts
|
60,000
|
30,000
|
68,000
|
60,000
|
30,000
|
20,000
|
60,000
|
620,000
|
75,000
|
35,000
|
99,000
|
70,000
|
10,000
|
30,000
|
70,000
|
750,000
|
98,000
|
87,000
|
98,000
|
90,000
|
40,000
|
80,000
|
90,000
|
950,000
|
80,000
|
80,000
|
90,000
|
80,000
|
30,000
|
70 000
|
70,000
|
780,000
|
90,000
|
100,000
|
99,000
|
90,000
|
40,000
|
88,000
|
90,000
|
950,000
|
90,000
|
90,000
|
97,000
|
90,000
|
10,000
|
89 000
|
90,000
|
910,000
|
70,000
|
75,000
|
79,000
|
70,000
|
40,000
|
80 000
|
70,000
|
720,000
|
90,000
|
98,000
|
95,000
|
90,000
|
50,000
|
100,000
|
90,000
|
950,000
|
90,000
|
90,000
|
85,000
|
90,000
|
55 000
|
100,000
|
80,000
|
880,000
|
100,000
|
95,000
|
82,000
|
90,000
|
50,000
|
100,000
|
90,000
|
970,000
|
50,000
|
50,000
|
81,000
|
50,000
|
56,000
|
60 000
|
40,000
|
490,000
|
71,000
|
75,000
|
87,000
|
70,000
|
60,000
|
70 000
|
70,000
|
700,000
|
75,000
|
80,000
|
92,000
|
70,000
|
58,000
|
80 000
|
75,000
|
750,000
|
80,000
|
85,000
|
89,000
|
70,000
|
50,000
|
80 000
|
70,000
|
780,000
|
90,000
|
88,000
|
97,000
|
90,000
|
80,000
|
85,000
|
80,000
|
890,000
|
88,000
|
90,000
|
81,000
|
90,000
|
82,000
|
90,000
|
80,000
|
870,000
|
80,000
|
85,000
|
91,000
|
80,000
|
75,000
|
60,000
|
70,000
|
790,000
|
78,000
|
85,000
|
97,000
|
85,000
|
76,000
|
70 000
|
80,000
|
850,000
|
60,000
|
60,000
|
46,000
|
60,000
|
77,000
|
89,000
|
50,000
|
570,000
|
80,000
|
80,000
|
80,000
|
80,000
|
80,000
|
75,000
|
80,000
|
810,000
|
75,000
|
80,000
|
80,000
|
76,000
|
85 000
|
80 000
|
70,000
|
760,000
|
88,000
|
90,000
|
89,000
|
90,000
|
86,000
|
90,000
|
80,000
|
880,000
|
80,000
|
85,000
|
94,000
|
80,000
|
90,000
|
90,000
|
80,000
|
830,000
|
100,000
|
95,000
|
71,000
|
100,000
|
88,000
|
100,000
|
90,000
|
970,000
|
58,000
|
65,000
|
97,000
|
60,000
|
95,000
|
70,000
|
60,000
|
600,000
|
86,000
|
80,000
|
85,000
|
80,000
|
102,000
|
80 000
|
80,000
|
840,000
|
91,000
|
90,000
|
50,000
|
90,000
|
100,000
|
90 000
|
80,000
|
870,000
|
99,000
|
90,000
|
90,000
|
100,000
|
90,000
|
90,000
|
90,000
|
950,000
|
70,000
|
70,000
|
94,000
|
60,000
|
99 000
|
70,000
|
60,000
|
640,000
|
79,000
|
75,000
|
81,000
|
80,000
|
95 000
|
85,000
|
80,000
|
800,000
|
|
The director would like to know how the forms of advertising influences gate receipts. Use the most appropriate analysis to answer the following questions.
Part A: Test
a. What is the most appropriate test to answer the following questions? (choose one below)
• Anova: Single Factor
• Anova: Two-Factor with Replication
• Anova: Two-Factor Without Replication
• Correlation
• F-Test Two-Sample for Variance
• Regression
• t-Test: Paired Two Sample for Means
• t-Test: Two Sample Assuming Equal Variances
• t-Test: Two-Sample Assuming Unequal Variances
• z-Test: Two Sample for Means
b. What is the dependent attribute for this question? (choose one below)
• Television Advertising
• Radio Advertising
• Newspapers Advertising
• Magazines Advertising
• Mail Postal Advertising
• Internet Advertising
• Sales Promotion
• Gate Receipts
Part B: Global Test
Carry out a global test of hypothesis to verify if any of the regression coefficients are different than zero (i.e. significant).
c. If the null hypothesis is that all of the regression coefficients are equal to zero and the alternative hypothesis is that at least one regression coefficients is not equal to one, should the null hypothesis be rejected?
• Yes
• No
Part C: Individual Test
Carry out an individual test of hypothesis to verify if any of the regression coefficients are different than zero. From the individual test of the regression coefficients:
d. Is the Intercept statistically significant (i.e. not zero)?
e. Is Television Advertising statistically significant (i.e. not zero)?
f. Is Radio Advertising statistically significant (i.e. not zero)?
g. Is Newspaper Advertising statistically significant (i.e. not zero)?
h. Is Magazine Advertising statistically significant (i.e. not zero)?
i. Is Mail Postal Advertising statistically significant (i.e. not zero)?
j. Is Internet Advertising statistically significant (i.e. not zero)?
k. Is Sales Promotions statistically significant (i.e. not zero)?
Part D: Estimate
Based on your initial analysis, re-run the analysis if any regression coefficient was determined to be insignificant (i.e. equal to zero) in the previous part. In other words, remove all insignificant coefficients from your study (i.e. all of them at once, not individual as in step-wise regression). If you need to rearrange or modify the original dataset to conduct further analysis, please make a copy of the data and paste the data to the right of the original data. Do not disturb the locations of the "answer cells." If all coefficients were found to be statistically significant, there is no need to re-run the analysis.
Television Advertising
|
Radio Advertising
|
Newspaper Advertising
|
Magazine Advertising
|
Mail Postal Advertising
|
Internet Advertising
|
Sales Promotion
|
$92,000
|
$42,000
|
$80,000
|
$73,000
|
$27,000
|
$78,000
|
$60,000
|
$61,000
|
$90,000
|
$83,000
|
$88,000
|
$91,000
|
$56,000
|
$51,000
|
$74,000
|
$70,000
|
$60,000
|
$61,000
|
$15,000
|
$85,000
|
$53,000
|
$68,000
|
$44,000
|
$55,000
|
$96,000
|
$73,000
|
$78,000
|
$84,000
|
1. Based on the attribute values in cells C24:124, make a forecast of Gate Receipts in cell J24.
2. Based on the attribute values in cells C25:125, make a forecast of Gate Receipts in cell J25.
3. Based on the attribute values in cells C26:126, make a forecast of Gate Receipts in cell J26.
4. Based on the attribute values in cells C27:127, make a forecast of Gate Receipts in cell J27.
NOTE: In order to receive credit for this problem, you must show your work and you must use the appropriate formulas in Excel to answer the questions. Format all percentages as a percentage with 2 decimal places (i.e. 78.29%). Show all other numbers with 3 decimal places.
The Fish House instructions
The Fish House ("TFH") sells fresh fish and seafood. TFH receives daily shipments of farm-raised trout from a local supplier. Each trout cost $2.45, which are sold for $3.95. To maintain its reputation for freshness, at the end of each day, TFH sells any leftover trout to a local pet food manufacturer for $1.25 each. Historically, the daily demand is summarized in the table below:
Demand
|
10
|
11
|
12
|
13
|
14
|
15
|
16
|
17
|
18
|
19
|
20
|
Probabilities
|
2%
|
6%
|
9%
|
11%
|
13%
|
15%
|
18%
|
11%
|
7%
|
5%
|
3%
|
The owner of TFH is, of course, interested in his margin, i.e., the sales to customers + sales to pet food manufacturer - purchase cost. Use a Payoff matrix to answer the following questions.
a. How many trout should be ordered each day by maximin?
b. How many trout should be ordered each day by maximax?
c. How many trout should be ordered each day by EMV?
d. How many trout should be ordered each day by minimax regret?
e. How many trout should be ordered each day by EOL?
f. What is the maximum TFH should be willing to pay for a forecast (i.e. EVPI)?
HINT: You will need to calculate a payoff matrix with 11 states of nature and 11 alternatives (it doesn't make sense to order fewer than 10 or more than 20). USE A FORMULA TO CALCULATE THESE 121 PAYOFF AMOUNTS (YOU WILL NEED THE MIN AND MAX FUNCTIONS). NOTE THAT SIX VALUES ARE PROVIDED TO CHECK. The number sold to customers paying $3.95 is the minimum of (1) the quantity ordered and (2) the demand. The number sold to the pet-food manufacturer is the maximum of (1) the quantity ordered minus the demand and (2) zero.
Facility Constr. Instructions
A company needs to decide whether their next manufacturing facility is to be SMALL, MEDIUM, or LARGE. Demand for the products manufactured at this new plant may subsequently turn out to be LOW, MODERATE, or HIGH. The estimated probabilities and the profit (in $10,000) for the various combinations of plant size and demand are given in the following tables:
DEMAND
|
LOW MODERATE HIGH
Probabilities 0.20 0.40 0.40
|
|
|
PROFIT
|
|
PLANT SIZE
|
LOW
|
MODERATE
|
HIGH
|
SMALL
|
150
|
160
|
170
|
MEDIUM
|
100
|
200
|
220
|
LARGE
|
-60
|
80
|
320
|
Once demand is known, the company may attempt to improve profitability by advertising in situations where the size of the plant is larger than appropriate for demand. The parameters describing the various combinations possible are given in the following table:
Plant Size Demand Possible Results
|
Probability
|
Profit
|
|
|
|
Poor
|
0.40
|
90
|
|
Medium
|
Low
|
|
Good
|
0.60
|
130
|
|
|
|
|
|
|
Poor
|
0.20
|
-90
|
|
Large
|
Low
|
Good
|
0.50
|
60
|
|
|
|
Excellent
|
0.30
|
90
|
|
Large
|
Moderate
|
Poor
|
0.40
|
40
|
|
Good
|
0.60
|
100
|
|
|
|
|
Solve this problem by using a decision tree provided in the starting file.
HINT: On the spreadsheet you will find the structure of the decision tree provided. DO NOT RESIZE ROWS OR COLUMNS. Begin by labeling (place text in an appropriate cell) all branches with alternative names and event names and probabilities. Then specify the payoffs at the end of each final branch by entering the correct value in the cell to which to arrow is pointing. Then solve the decision tree by working backwards and labeling each alternative and event node with the correct values and indicate alternative branches not taken with double arrows (There are several double arrow objects in columns x, y and z which you can click and drag to appropriate location). Be sure to label the initial alternative node with the correct value.
Attachment:- HW-2.xlsx