Studies have shown that the frequency with which shoppers browse Internet retailers is related to the frequency with which they actually purchase products and/or services online.
The following data show each respondent's age and answer to the question "How many minutes do you browse online retailers per week?"
Age (X) | Time (Y)
23 | 513
51 | 207
45 | 201
33 | 405
56 | 141
61 | 141
39 | 297
23 | 501
22 | 531
46 | 273
53 | 147
34 | 381
20 | 591
18 | 609
22 | 519
1 Use Data > Data Analysis > Correlation to compute the correlation, checking the Labels checkbox.
2 Use the Excel function =CORREL to compute the correlation. If the answers for #1 and #2 do not agree, there is an error.
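As a cross-check outside Excel, the same correlation can be sketched in plain Python. This is a minimal illustration, not part of the worksheet: the helper name `pearson_r` is hypothetical, and the formula used is the standard Pearson definition, which is what =CORREL is understood to compute.

```python
import math

# Age (X) and weekly browsing minutes (Y), copied from the table above.
age = [23, 51, 45, 33, 56, 61, 39, 23, 22, 46, 53, 34, 20, 18, 22]
minutes = [513, 207, 201, 405, 141, 141, 297, 501, 531, 273, 147, 381, 591, 609, 519]


def pearson_r(xs, ys):
    """Pearson correlation: Sxy / sqrt(Sxx * Syy). Hypothetical helper name."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(age, minutes)
print(f"r = {r:.4f}")
```

If the printed value disagrees with both Excel results, the data were most likely copied incorrectly.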
3 The strength of the correlation motivates further examination.
a) Insert Scatter (X, Y) plot linked to the data on this sheet with Age on the horizontal (X) axis.
b) Add to your chart: the chart name, vertical axis label, and horizontal axis label.
c) Complete the chart by adding Trendline and checking boxes
4 Read directly from the chart:
a) Intercept =
b) Slope =
c) R² =
5 Perform Data > Data Analysis > Regression.
6 Read the standard error from the regression output.
7 Based on the regression output, what is the equation of the regression line?
8 Use Excel to predict the number of minutes spent by a 40-year-old shopper. Enter = followed by the regression formula.
Enter the intercept and slope into the formula by clicking on the cells in the regression output with the results.
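Steps 5 through 8 can also be checked with a short least-squares sketch in Python. The function name `fit_line` is hypothetical, and the standard error is assumed here to be the residual standard error, sqrt(SSE / (n - 2)), which is the usual definition of the "Standard Error" line in a simple regression summary.

```python
import math

# Age (X) and weekly browsing minutes (Y), copied from the table above.
age = [23, 51, 45, 33, 56, 61, 39, 23, 22, 46, 53, 34, 20, 18, 22]
minutes = [513, 207, 201, 405, 141, 141, 297, 501, 531, 273, 147, 381, 591, 609, 519]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Sum of squared residuals and total sum of squares.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - sse / sst
    std_error = math.sqrt(sse / (n - 2))  # residual standard error
    return slope, intercept, r_squared, std_error


slope, intercept, r_squared, std_error = fit_line(age, minutes)

# Same idea as the Excel formula in step 8: intercept + slope * 40.
predicted_40 = intercept + slope * 40

print(f"Time = {intercept:.2f} + ({slope:.3f}) * Age")
print(f"R^2 = {r_squared:.4f}, standard error = {std_error:.2f}")
print(f"Predicted minutes at age 40: {predicted_40:.1f}")
```

The printed values should agree with the chart trendline and the regression output up to rounding.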
9 On this worksheet, make an XY scatter plot linked to the following data:
X | Y
6.981 | 12.266
3.982 | 8.455
2.084 | 5.951
9.113 | 14.395
2.280 | 7.435
6.567 | 11.332
1.897 | 7.011
7.186 | 12.716
4.094 | 9.214
1.257 | 7.499
7.199 | 10.473
2.136 | 6.124
3.032 | 8.832
3.735 | 9.295
8.612 | 35.000
0.338 | 7.155
5.348 | 10.475
9.208 | 13.650
7.570 | 13.910
8.646 | 11.895
1.953 | 7.387
3.475 | 7.871
3.962 | 10.482
8.084 | 11.727
4.866 | 8.688
10 Add a trendline and the regression equation to the plot.
11 The scatterplot reveals a point outside the point pattern. Copy the data to a new location in the worksheet so that you have two sets of data. Data that are more than 1.5 × IQR below Q1 or more than 1.5 × IQR above Q3 are considered outliers and must be investigated. It was determined that the outlying point resulted from a data entry error. Remove the outlier from the copy of the data.
12 Make a new scatterplot linked to the cleaned data without the outlier, and add a trendline and regression equation label.
13 Compare the regression equations of the two plots. How did removal of the outlier affect the slope and R²?
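The outlier check and the before/after comparison in steps 11 through 13 can be sketched in Python as follows. Two assumptions are made here that the worksheet does not state: the 1.5 × IQR rule is applied to the Y values, and quartiles use the "inclusive" interpolation method, which is believed to match Excel's QUARTILE.INC. The helper `fit_line` is a hypothetical name.

```python
import statistics

# X and Y, copied from the table in step 9.
x = [6.981, 3.982, 2.084, 9.113, 2.280, 6.567, 1.897, 7.186, 4.094, 1.257,
     7.199, 2.136, 3.032, 3.735, 8.612, 0.338, 5.348, 9.208, 7.570, 8.646,
     1.953, 3.475, 3.962, 8.084, 4.866]
y = [12.266, 8.455, 5.951, 14.395, 7.435, 11.332, 7.011, 12.716, 9.214, 7.499,
     10.473, 6.124, 8.832, 9.295, 35.000, 7.155, 10.475, 13.650, 13.910, 11.895,
     7.387, 7.871, 10.482, 11.727, 8.688]


def fit_line(xs, ys):
    """Least-squares slope, intercept and R^2 (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((a - mean_x) ** 2 for a in xs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(xs, ys))
    sst = sum((b - mean_y) ** 2 for b in ys)
    return slope, intercept, 1 - sse / sst


# Quartiles of Y and the 1.5 x IQR fences (assumption: rule applied to Y).
q1, _median, q3 = statistics.quantiles(y, n=4, method="inclusive")
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr

outliers = [(a, b) for a, b in zip(x, y) if b < low_fence or b > high_fence]
clean = [(a, b) for a, b in zip(x, y) if low_fence <= b <= high_fence]
clean_x = [a for a, _ in clean]
clean_y = [b for _, b in clean]

slope_all, intercept_all, r2_all = fit_line(x, y)
slope_clean, intercept_clean, r2_clean = fit_line(clean_x, clean_y)

print("Flagged points:", outliers)
print(f"All data:   slope = {slope_all:.3f}, R^2 = {r2_all:.3f}")
print(f"Clean data: slope = {slope_clean:.3f}, R^2 = {r2_clean:.3f}")
```

Working on a filtered copy, as the worksheet instructs, keeps the original data intact for comparison.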
Highlight the correct answer (or answers, for #17 and #19) to each of the following questions:
14 The correlation R measures the strength of the linear association of variables Y and X, and does not have a unit of measure, e.g. feet, acres, pounds, seconds.
- True
- False
15 Based on the correlation computed in tab "Excel Competencies", does Time tend to increase with Age?
- True
- False
16 The strength of the linear relationship between Age and Time is
- Weak
- Moderate
- Strong
17 Highlight the 4 correct statements. Try not to mix up explanatory and response with dependent and independent.
- X denotes the independent or response variable
- X denotes the independent or explanatory variable
- Y denotes the dependent or explanatory variable
- Y denotes the dependent or response variable
- x denotes an observed value of the independent variable
- x denotes an observed value of the dependent variable
- ȳ denotes the mean value of observations of the response variable
18 The best fitting line minimizes the sum of the squared vertical distances from the points to the line. Hence,
the Y coordinate of a point on the best fitting line provides an estimate or prediction of Y at the value of the corresponding X coordinate.
This process is called regression (to move backward) because
- The estimate of Y will be closer to the mean in standard deviations than X is.
- The estimate of Y will be farther from the mean in standard deviations than X is.
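The relationship between the standardized X value and the standardized prediction can be explored numerically. The sketch below uses the Age and Time data from the first table and relies on the standard identity slope = r × (s_y / s_x); the age `x0 = 20` is an arbitrary illustrative choice, not a value taken from the worksheet.

```python
import math
import statistics

# Age (X) and weekly browsing minutes (Y), copied from the first table.
age = [23, 51, 45, 33, 56, 61, 39, 23, 22, 46, 53, 34, 20, 18, 22]
minutes = [513, 207, 201, 405, 141, 141, 297, 501, 531, 273, 147, 381, 591, 609, 519]

mean_x = statistics.mean(age)
mean_y = statistics.mean(minutes)
sd_x = statistics.stdev(age)      # sample standard deviation
sd_y = statistics.stdev(minutes)

# Pearson correlation from the centered sums.
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(age, minutes))
sxx = sum((a - mean_x) ** 2 for a in age)
syy = sum((b - mean_y) ** 2 for b in minutes)
r = sxy / math.sqrt(sxx * syy)

# Least-squares line written in terms of r and the standard deviations.
slope = r * sd_y / sd_x
intercept = mean_y - slope * mean_x

x0 = 20  # arbitrary illustrative age
z_x = (x0 - mean_x) / sd_x                  # X in standard deviations from its mean
y_hat = intercept + slope * x0
z_y_hat = (y_hat - mean_y) / sd_y           # prediction in standard deviations

print(f"z_x = {z_x:.3f}, z_y_hat = {z_y_hat:.3f}, r * z_x = {r * z_x:.3f}")
```

Comparing the magnitudes of `z_x` and `z_y_hat` for several choices of `x0` shows how the correlation scales the prediction.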
19 Highlight 4 assumptions pertaining to regression:
- Scatter plot pattern is reasonably straight (Linearity)
- No points lie far enough away to pull the line of best fit away from the main point pattern (Influence).
- The plot does not fan out as x increases or decreases (Equal Spread)
- Predict Y at a value of X within the range of the X data (Interpolation)
- Predict Y at a value of X outside the range of the X data (Extrapolation)
- The observations are independent (Independence)
20 Based on the data in "Excel Competencies", can Y be predicted for a person who is 80?
- No
- Yes
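One way to examine a prediction at age 80 is to compare that age with the range of observed ages and to evaluate the fitted line there. This is a rough sketch using the data from the first table; the function name `predict` is hypothetical.

```python
# Age (X) and weekly browsing minutes (Y), copied from the first table.
age = [23, 51, 45, 33, 56, 61, 39, 23, 22, 46, 53, 34, 20, 18, 22]
minutes = [513, 207, 201, 405, 141, 141, 297, 501, 531, 273, 147, 381, 591, 609, 519]

n = len(age)
mean_x = sum(age) / n
mean_y = sum(minutes) / n
sxx = sum((a - mean_x) ** 2 for a in age)
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(age, minutes))
slope = sxy / sxx
intercept = mean_y - slope * mean_x


def predict(age_value):
    """Evaluate the fitted line at a given age (hypothetical helper)."""
    return intercept + slope * age_value


target_age = 80
in_range = min(age) <= target_age <= max(age)
pred_80 = predict(target_age)

print(f"Observed ages run from {min(age)} to {max(age)}")
print(f"Age {target_age} within observed range: {in_range}")
print(f"Line evaluated at age {target_age}: {pred_80:.1f} minutes")
```

Whether the resulting number is a plausible count of minutes is worth considering alongside the assumptions listed in question 19.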
21 The "intercept" and "slope" completely define the best fitting line.
The intercept is the vertical distance from the origin (where the X and Y axes intersect) up or down to the line. It sets the elevation of the line.
- TRUE
- FALSE
22 As a positively signed slope increases
- The whole line moves up without rotating
- The whole line moves down without rotating
- The line rotates clockwise becoming less steep
- The line rotates counterclockwise becoming more steep
23 Based on the regression output in "Excel Competencies", when Age increases by 1 year, Time decreases by
- 0.97
- 32.10
- 750.02
- 11.503
24 R² measures the fit of the line to the points. As R² increases
- The scatter about the line increases and the amount of the variation of Y explained by X decreases
- The scatter about the line decreases and the amount of the variation of Y explained by X increases
- The scatter about the line increases and the amount of the variation of Y explained by X increases
- The scatter about the line decreases and the amount of the variation of Y explained by X decreases
25 The Standard Error is a standard deviation measuring the scatter of the points about the regression line.
- True
- False
Attachment: Homework.xlsx