Question 1.
During the lectures, we discussed three things that can be called a "mean". Explain carefully what the differences are between $\mu$, $\bar{X}$, and $\bar{x}$.
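A small simulation may make the three objects easier to tell apart. It is a sketch under hypothetical assumptions: the population is generated artificially (normal values with mean 50 and standard deviation 10), so its mean mu is known here, which is not the case in practice. The function name draw_sample_mean is illustrative only.

```python
import random
import statistics

# Hypothetical population; its mean mu is one fixed number.
random.seed(1)
population = [random.gauss(50, 10) for _ in range(10_000)]
mu = statistics.fmean(population)

def draw_sample_mean(n: int) -> float:
    """Draw one random sample of size n and return its mean.

    Before the draw, the outcome is the random variable X-bar; the number
    actually returned is one realization x-bar of that random variable.
    """
    sample = random.sample(population, n)
    return statistics.fmean(sample)

# Repeated sampling gives different realizations x-bar, while mu stays fixed.
xbars = [draw_sample_mean(25) for _ in range(5)]
print("mu =", round(mu, 2))
print("realized sample means:", [round(x, 2) for x in xbars])
```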
Question 2.
This question has three parts, worth three, three, and four points, respectively.
For this question, assume that we have a data set with $n$ data points $x_1, x_2, \ldots, x_n$. They have sample mean $\bar{x}$ and sample variance $s_x^2$. We have also calculated $y_1, y_2, \ldots, y_n$ according to the formula $y_i = 7 - 3x_i$. The sample mean and variance of this new variable are called $\bar{y}$ and $s_y^2$, respectively. Use the definitions of the sample mean and sample variance (in summation notation) to prove the following statements.
a. $\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x}) = 0$.
b. $\bar{y} = 7 - 3\bar{x}$.
c. $s_y^2 = 9 s_x^2$.
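The three statements can be checked numerically on any data set before proving them. This is only a sanity check on one hypothetical data set, not the requested proof. It assumes the sample variance divides by n - 1, which is what Python's statistics.variance does; the function name check_linear_transformation is illustrative.

```python
import statistics
from typing import Sequence, Tuple

def check_linear_transformation(xs: Sequence[float]) -> Tuple[float, float, float]:
    """Return the discrepancies for statements a, b and c of Question 2.

    Each value should be zero up to floating-point rounding.
    """
    n = len(xs)
    xbar = statistics.fmean(xs)
    ys = [7 - 3 * x for x in xs]          # y_i = 7 - 3 x_i
    ybar = statistics.fmean(ys)
    s2x = statistics.variance(xs)         # sample variance, divides by n - 1
    s2y = statistics.variance(ys)

    a = sum(x - xbar for x in xs) / n     # (1/n) * sum of (x_i - xbar)
    b = ybar - (7 - 3 * xbar)             # ybar minus (7 - 3 xbar)
    c = s2y - 9 * s2x                     # s_y^2 minus 9 s_x^2
    return a, b, c

# Hypothetical data, chosen only for illustration.
print(check_linear_transformation([2.0, 3.5, 5.0, 7.5, 11.0]))
```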
Question 3.
Suppose that we have surveyed 41 students who commenced the B.Ec. program at the University of Sydney in 2015. We found that their ATARs have a sample mean of 94.37 and a sample standard deviation of 1.34.
a. Construct a 95% confidence interval for the population mean.
b. Explain carefully how your answer to question 3.a. should be interpreted.
c. How would your answer to question 3.a. change if the sample mean were higher?
d. How would your answer to question 3.a. change if the sample standard deviation were higher?
e. How would your answer to question 3.a. change if the confidence level were higher?
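The interval in 3.a and the comparisons in 3.c to 3.e can be sketched with the usual t-based formula, mean plus or minus t_crit times s over the square root of n. This assumes the population standard deviation is unknown, so the t distribution with n - 1 = 40 degrees of freedom is used. The critical values 2.021 (95%) and 2.704 (99%) are standard t-table entries for 40 degrees of freedom; if SciPy is available, scipy.stats.t.ppf(0.975, 40) gives the first one. The changed inputs for parts c to e are hypothetical.

```python
import math
from typing import Tuple

def mean_confidence_interval(xbar: float, s: float, n: int, t_crit: float) -> Tuple[float, float]:
    """Confidence interval xbar +/- t_crit * s / sqrt(n) for a population mean.

    t_crit is the two-sided critical value of the t distribution with n - 1
    degrees of freedom for the chosen confidence level; it is passed in
    rather than computed here.
    """
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin

def width(ci: Tuple[float, float]) -> float:
    return ci[1] - ci[0]

# Question 3.a: n = 41, so 40 degrees of freedom; 2.021 is the tabled t value
# for a 95% interval with 40 degrees of freedom.
base = mean_confidence_interval(94.37, 1.34, 41, 2.021)

# Parts c-e, using hypothetical changed inputs for illustration only.
higher_mean = mean_confidence_interval(95.00, 1.34, 41, 2.021)   # higher sample mean
higher_sd = mean_confidence_interval(94.37, 2.00, 41, 2.021)     # higher sample sd
higher_level = mean_confidence_interval(94.37, 1.34, 41, 2.704)  # 99% level, 40 df

for label, ci in [("base", base), ("higher mean", higher_mean),
                  ("higher sd", higher_sd), ("99% level", higher_level)]:
    print(f"{label:12s} ({ci[0]:.3f}, {ci[1]:.3f})  width {width(ci):.3f}")
```

Shifting the sample mean moves the interval without changing its width, while a larger standard deviation or a higher confidence level widens it.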
Question 4.
The data set salaries.dta contains data on weekly earnings for a random sample of 33 people, working in either Sydney (coded as city==1) or Melbourne (city==2). We are interested in testing whether the mean salary is the same in both cities.
a. Describe carefully what a Type I error would be for this particular test.
b. Describe carefully what a Type II error would be for this particular test.
c. Perform the test. (In Stata, this can be done using the ttest command with the by() option; in the drop-down menus, it is called a "two-group mean-comparison test".) What do you conclude at the 5% significance level?
d. Repeat the test from question 4.c., this time using the logarithms of the salaries rather than the salaries themselves. What do you conclude in this case?
e. Which of the tests in questions 4.c. and 4.d. is preferable from a statistical point of view? Why?
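The statistic behind the two-group mean-comparison test can be sketched directly. This sketch assumes equal population variances (a pooled variance), which is Stata's default for ttest with by() unless the unequal option is added; the function names pooled_two_sample_t and log_values are hypothetical, and the critical value must still be looked up for the returned degrees of freedom.

```python
import math
import statistics
from typing import List, Sequence, Tuple

def pooled_two_sample_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Two-sample t test of H0: equal population means, assuming equal variances.

    Returns the t statistic and its degrees of freedom (na + nb - 2). For a
    two-sided test at the 5% level, reject H0 if |t| exceeds the 97.5th
    percentile of the t distribution with those degrees of freedom.
    """
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    # Pooled estimate of the common variance.
    pooled_var = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    t = (statistics.fmean(a) - statistics.fmean(b)) / se
    return t, na + nb - 2

def log_values(xs: Sequence[float]) -> List[float]:
    """Natural logarithms, as needed for question 4.d (salaries must be positive)."""
    return [math.log(x) for x in xs]
```

With the salaries split into two lists by city (the list names sydney and melbourne are hypothetical; pandas.read_stata is one way to load a .dta file), pooled_two_sample_t(sydney, melbourne) corresponds to 4.c and pooled_two_sample_t(log_values(sydney), log_values(melbourne)) to 4.d.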
Question 5.
The data set waiting.dta contains data on the amount of time that a random sample of callers had to wait before their call was answered at a certain call center. All of the data were obtained last week. For each call, the day of the week is also included.
a. The manager of this call center claims that on average, callers wait for five minutes before being answered. Show that this hypothesis cannot be rejected at the 5% significance level.
b. Despite the non-rejection in question 5.a., we still see that the sample mean is more than five minutes. Perform a one-sided test to see whether the population mean is more than five minutes.
c. Statistically speaking, what is wrong with the procedure we followed in question 5.b.?
d. A technical failure occurred last Wednesday, putting part of the call center offline for most of the day. Therefore, we decide to exclude Wednesday's data from consideration. Do so, and test the manager's claim again.
e. If you were this manager's supervisor, what would you tell him, based on the result you found in question 5.d.?
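The one-sample tests in Question 5 share one statistic and differ only in the decision rule and the data used. The sketch below assumes the waiting times are in minutes and that each record is a (waiting time, day label) pair; that layout, the day label, and the function names one_sample_t, reject and drop_day are hypothetical, since the coding of the day variable in waiting.dta is not given. Critical values must match the alternative and the n - 1 degrees of freedom.

```python
import math
import statistics
from typing import List, Sequence, Tuple

def one_sample_t(xs: Sequence[float], mu0: float) -> float:
    """t statistic for H0: mu = mu0, namely (xbar - mu0) / (s / sqrt(n))."""
    return (statistics.fmean(xs) - mu0) / (statistics.stdev(xs) / math.sqrt(len(xs)))

def reject(t: float, t_crit: float, alternative: str = "two-sided") -> bool:
    """Decision rule for a test at a given level.

    two-sided (H1: mu != mu0): t_crit is the 97.5th percentile at 5%.
    greater   (H1: mu > mu0):  t_crit is the 95th percentile at 5%.
    """
    if alternative == "two-sided":
        return abs(t) > t_crit
    if alternative == "greater":
        return t > t_crit
    raise ValueError("alternative must be 'two-sided' or 'greater'")

def drop_day(records: Sequence[Tuple[float, str]], day: str) -> List[float]:
    """Keep waiting times from every day except `day` (hypothetical record layout)."""
    return [wait for wait, d in records if d != day]
```

For 5.a, one_sample_t(waits, 5.0) is compared with the two-sided rule; 5.b uses the "greater" rule on the same statistic; for 5.d, the statistic is recomputed on drop_day(records, ...) with whatever label the data set uses for Wednesday.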