Please have Excel pick a random number between 0 and 9 (with the function =Randbetween (0, 9).
By dragging and copying, create 5 rows of two columns of random numbers. It should look like this, for example (of course everyone's numbers, being random, will be different):
X Y
4 6
2 0
2 9
1 4
6 5
Then ask Excel to run the linear regression between these two columns and generate a p-value for testing the significance of the linear relationship between X and Y. Use a significance level (alpha) of 10%.
Since these are pairs of random numbers, there really is no relationship between X and Y. So if a p-value is less 0.10 that is because a Type I error has occurred. The Probability(of making the Type I error)=alpha=α=0.10.
P(Type I error)=P(rej H0 |H0 is true)=alpha=0.10.
Part 1: Repeat this process 20 times and send me a table summarizing your results. It should look like this:
Trial # slope correlation p-value of slope significant?
1 -.23 -.13 0.43 No
2 .05 .02 0.55 No
...
20 .11 .12 0.25 No
Part 2: Then, do one last regression simulation, this time using 2000 pairs of random numbers (by dragging the first line down to row 2001). Use Excel to make a graph of the scatter diagram and the straight line that best fits the last set of 2000 pairs. Have Excel calculate the slope, intercept, correlation, p-value.
Report on all your findings and interpret what you saw (among the twenty (20) repeats). The report should answer the following eight questions:
Part 3: How does the number (%) of significant p-values correspond to the theory of a Type I error?
Part 4: What was the average slope?
Part 5: What was the average intercept?
Part 6: Explain why the numbers from Parts 4 and 5 make sense.
Part 7: How do you explain the results you got for the 2000 pairs? (Discuss the slope, intercept, correlation, p-value.)
Part 8: If there are 100 students doing this assignment (using 2000 pairs) using an alpha (significance level) of 10% how many of the students are expected to report a significant intercept?
Part 9: Calculate the 95% prediction interval of y when x=5 using the 2000 pairs.
Part 10: Explain the result you got in Part 9. Even though the sample size is so large (2000 pairs) why is the 95% prediction interval so wide and therefore practically useless?