Honors Exam 2012: Statistics
1. A basketball coach wants to devise an innovative way to test whether her players have improved by practicing over the summer. The previous season, her star, Julie, made 70% of her free throws. Julie claims to have improved over the summer; she now thinks she is an 80% free throw shooter. The coach doubts Julie has improved; she asks Julie to start shooting free throws. Let X be a random variable corresponding to the number of consecutive free throws Julie makes before her first miss. If we assume Julie's free throws are independent and that she makes each shot with some fixed, unknown probability p, the geometric distribution would seem appropriate:
$P(X = x) = p^x (1 - p)$, for $x = 0, 1, 2, \ldots$
The following table presents these probabilities for two values of p, 0.8 and 0.7. The last row gives the probabilities of making 20 or more shots before the first miss.
| x | P(X = x) when p = 0.8 | P(X = x) when p = 0.7 |
|---|---|---|
| 0 | 0.2000 | 0.3000 |
| 1 | 0.1600 | 0.2100 |
| 2 | 0.1280 | 0.1470 |
| 3 | 0.1024 | 0.1029 |
| 4 | 0.0819 | 0.0720 |
| 5 | 0.0655 | 0.0504 |
| 6 | 0.0524 | 0.0353 |
| 7 | 0.0419 | 0.0247 |
| 8 | 0.0336 | 0.0173 |
| 9 | 0.0268 | 0.0121 |
| 10 | 0.0215 | 0.0085 |
| 11 | 0.0172 | 0.0059 |
| 12 | 0.0137 | 0.0042 |
| 13 | 0.0110 | 0.0029 |
| 14 | 0.0088 | 0.0020 |
| 15 | 0.0070 | 0.0014 |
| 16 | 0.0056 | 0.0010 |
| 17 | 0.0045 | 0.0007 |
| 18 | 0.0036 | 0.0005 |
| 19 | 0.0029 | 0.0003 |
| ≥ 20 | 0.0115 | 0.0009 |
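The tabulated probabilities can be reproduced directly from the geometric formula given before the table. The following is a minimal Python sketch; the function names `geometric_pmf` and `geometric_tail` are illustrative and not part of the exam.

```python
def geometric_pmf(x, p):
    """P(X = x): x made shots in a row, followed by one miss."""
    return p ** x * (1 - p)


def geometric_tail(k, p):
    """P(X >= k): the first k shots are all made."""
    return p ** k


# Print one row per x = 0, ..., 19, then the ">= 20" row.
for x in range(20):
    print(f"{x:>4}  {geometric_pmf(x, 0.8):.4f}  {geometric_pmf(x, 0.7):.4f}")
print(f"{'>=20':>4}  {geometric_tail(20, 0.8):.4f}  {geometric_tail(20, 0.7):.4f}")
```

Computed this way, the final entry for p = 0.7 is 0.7^20 ≈ 0.0008, which differs in the last digit from the tabulated 0.0009.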
a. Consider testing $H_0: p = 0.7$ versus $H_a: p > 0.7$. Derive the rejection region for the test corresponding to significance level $\alpha = 0.05$.
b. What is the probability of a Type I error? Explain this concept in the context of this problem to Julie's coach, who has never studied statistics.
c. What is the power of this test against Julie's proposal that she improved and is actually an 80% free throw shooter? Again, explain this concept in the context of this problem to Julie's coach, who has never studied statistics.
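For parts a through c, one natural approach (an assumption here, not something the exam states) is to reject the null hypothesis when Julie's streak is long, that is, when X ≥ c for some cutoff c. Under the geometric model, P(X ≥ c) = p^c, so candidate cutoffs can be checked numerically. The helper name `smallest_cutoff` below is hypothetical.

```python
def smallest_cutoff(p0, alpha):
    """Smallest integer c with P(X >= c | p = p0) = p0**c <= alpha."""
    c = 0
    while p0 ** c > alpha:
        c += 1
    return c


c = smallest_cutoff(0.7, 0.05)

# Probability of rejecting when p = 0.7 (Type I error rate of this cutoff).
size = 0.7 ** c

# Probability of rejecting when p = 0.8 (power against Julie's claim).
power = 0.8 ** c

print(f"reject when X >= {c}; size = {size:.4f}; power = {power:.4f}")
```

Because X is discrete, the achieved size generally falls below the nominal 0.05 rather than matching it exactly.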
2. Suppose $\{X_i\}$ is a set of $n \ge 1$ independent, identically distributed (iid) random variables from the uniform distribution on the interval $(0, \theta)$, for $0 < \theta < \infty$.
a. Find the maximum likelihood estimator (MLE) for θ.
b. Prove that the MLE is a biased estimator for θ.
c. Prove that the MLE is a consistent estimator for θ.
d. Propose an unbiased estimator for θ. Is this estimator preferable to the MLE? Include a discussion of the basis for your preference, as if you were presenting this solution to a fellow student of mathematical statistics.
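A quick simulation can serve as a sanity check on the proofs in parts b and d. The sketch below assumes the standard result that the MLE is the sample maximum, and compares it with the rescaled maximum (n + 1)/n · max as one candidate unbiased estimator. The function name `simulate` and the values n = 5, θ = 2 are arbitrary choices for illustration.

```python
import random


def simulate(n, theta, reps, seed=0):
    """Average the sample maximum and its rescaled version over many samples."""
    rng = random.Random(seed)
    mle_sum = 0.0
    adjusted_sum = 0.0
    for _ in range(reps):
        sample_max = max(rng.uniform(0, theta) for _ in range(n))
        mle_sum += sample_max                      # assumed MLE: the maximum
        adjusted_sum += (n + 1) / n * sample_max   # candidate unbiased estimator
    return mle_sum / reps, adjusted_sum / reps


mle_mean, adjusted_mean = simulate(n=5, theta=2.0, reps=20000)
print(f"average MLE: {mle_mean:.3f}; average rescaled maximum: {adjusted_mean:.3f}")
```

A simulation only suggests the direction and size of the bias; it does not replace the proofs the problem asks for.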
3. Suppose $X_1, X_2, \ldots$ are iid Bernoulli random variables such that $P(X_i = 1) = p$, with $p$ some fixed value in $(0, 1)$. Define $L_1$ and $L_2$ to be the lengths of the first and second "runs," respectively, in the sequence generated by the $X_i$'s. A "run" is a collection of consecutive common outcomes. So, for example, the sequence 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, . . . would produce $L_1 = 3$, $L_2 = 2$, $L_3 = 1$, and $L_4 = 5$.
a. What is the distribution of $L_1$?
b. What is the distribution of $L_2$?
c. Are $L_1$ and $L_2$ independent? Discuss.
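The definition of a run can be made concrete with a short Python sketch. The function `run_lengths` recovers the run lengths of the example sequence, and `first_two_runs` simulates the first two run lengths for a given p so that conjectures about their distributions can be checked empirically. Both function names, and the choice p = 0.3, are illustrative assumptions.

```python
import random
from itertools import groupby


def run_lengths(seq):
    """Lengths of maximal blocks of consecutive equal outcomes."""
    return [len(list(group)) for _, group in groupby(seq)]


def first_two_runs(p, rng):
    """Draw Bernoulli(p) outcomes until the second run ends; return (L1, L2)."""
    first = rng.random() < p
    l1 = 1
    while (rng.random() < p) == first:
        l1 += 1
    # The draw that ended the first run is the first element of the second run.
    l2 = 1
    while (rng.random() < p) != first:
        l2 += 1
    return l1, l2


example = [0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0]
# The final run is cut off by the end of the list, so keep only the first four.
print(run_lengths(example)[:4])

rng = random.Random(0)
draws = [first_two_runs(0.3, rng) for _ in range(20000)]
mean_l1 = sum(l1 for l1, _ in draws) / len(draws)
mean_l2 = sum(l2 for _, l2 in draws) / len(draws)
print(f"average L1: {mean_l1:.3f}; average L2: {mean_l2:.3f}")
```

Rerunning the simulation with several values of p is one way to see how the two averages respond differently to p.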
4. A coin will be tossed once, and we are interested in estimating the probability of heads, p. A Bayesian will propose using a uniform prior distribution on p,
g(p) = 1, 0 ≤ p ≤ 1,
and will estimate p using the mean of the posterior distribution. A frequentist will use the maximum likelihood estimator (MLE) for p. Show how to derive both the MLE and the Bayes estimator. Using squared error loss, describe (exactly or approximately, with justification) the range of values of p for which the MLE is preferable to the Bayes estimator.
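Since a single toss has only two outcomes, the expected squared error of any estimator can be computed exactly by summing over them. The sketch below assumes the standard forms of the two estimators for one toss X in {0, 1}: the MLE equal to X, and the posterior mean under the uniform prior equal to (X + 1)/3. These should be derived as the problem asks; the function names are illustrative.

```python
def risk(estimator, p):
    """Expected squared error of an estimator over the two outcomes of one toss."""
    return p * (estimator(1) - p) ** 2 + (1 - p) * (estimator(0) - p) ** 2


def mle(x):
    """Assumed MLE for a single toss: the observed outcome itself."""
    return float(x)


def bayes(x):
    """Assumed posterior mean under the uniform prior."""
    return (x + 1) / 3


# Scan a grid of p values and report where the MLE has the smaller risk.
grid = [i / 100 for i in range(101)]
mle_wins = [p for p in grid if risk(mle, p) < risk(bayes, p)]
print(mle_wins)
```

The grid only approximates the boundaries; setting the two risk expressions equal and solving gives them exactly.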
5. This problem examines data showing the effect of two soporific (sleep-inducing) drugs. The variable extra shows the increase in hours of sleep for each of n = 20 patients who had been randomly assigned to two groups for the study. A few observations are shown here:
|   | extra | group |
|---|---|---|
| 1 | 3.4 | 0 |
| 2 | 0.8 | 0 |
| 3 | 1.9 | 1 |
| 4 | 4.4 | 1 |
| 5 | -1.2 | 0 |
There are $n_0 = n_1 = 10$ subjects in each group. In terms of underlying probability models, you may assume $\text{EXTRA}_i \sim N(\mu_0, \sigma^2)$ for subject $i$ in group 0, and $\text{EXTRA}_j \sim N(\mu_1, \sigma^2)$ for subject $j$ in group 1. The sample variance of all 20 measurements is
$$s^2_{\text{total}} = \frac{1}{n - 1} \sum_{i=1}^{n} \left(\text{extra}_i - \overline{\text{extra}}\right)^2 = 4.0720,$$
where the overall mean is
$$\overline{\text{extra}} = \frac{1}{n} \sum_{i=1}^{n} \text{extra}_i = 1.5400.$$
The sample variance of measurements in group 0 is $s_0^2 = 3.2006$, and the sample variance of measurements in group 1 is $s_1^2 = 4.009$. A pooled 2-sample t-test is performed, making use of the pooled estimate of the variance,
$$s^2_{\text{pooled}} = \frac{(n_0 - 1)\, s_0^2 + (n_1 - 1)\, s_1^2}{n_0 + n_1 - 2} = 3.6048,$$
and giving the following result:
a. Show how to obtain the confidence interval (-3.363874, 0.203874) from the information provided above. Clearly define any quantities used.
b. Critique the following statement: The 95% confidence interval (-3.363874, 0.203874) contains the true difference in drug effectiveness with probability 0.95.
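One possible route to the interval in part a uses only the summary statistics quoted in the problem: the total sum of squares splits into a within-group part and a between-group part, and the between-group part determines the absolute difference in group means. The Python sketch below follows that route. Two inputs are assumptions not given in the text: the critical value 2.100922 (the 0.975 quantile of the t distribution with 18 degrees of freedom, from standard tables) and the negative sign of the difference, which is taken from the reported interval.

```python
import math

n0 = n1 = 10
n = n0 + n1
s2_total = 4.0720    # sample variance of all 20 measurements
s2_pooled = 3.6048   # pooled within-group variance
T_CRIT = 2.100922    # assumed: t quantile 0.975 with n - 2 = 18 df

# Sum-of-squares decomposition: total = within + between.
ss_total = (n - 1) * s2_total
ss_within = (n - 2) * s2_pooled
ss_between = ss_total - ss_within

# With two groups, ss_between = (n0 * n1 / n) * (mean0 - mean1)**2.
abs_diff = math.sqrt(ss_between * n / (n0 * n1))
diff = -abs_diff     # assumed sign, matching the reported interval

# Standard error of the difference in means under the pooled model.
se = math.sqrt(s2_pooled * (1 / n0 + 1 / n1))

lower = diff - T_CRIT * se
upper = diff + T_CRIT * se
print(f"difference = {diff:.4f}; interval = ({lower:.4f}, {upper:.4f})")
```

Because the inputs are rounded to four decimals, the endpoints agree with the reported interval only to about four decimal places.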