Honors Exam 2013: Statistics
1. Two i.i.d. Normal observations are made, y1, y2 ∼ N(µ, 1), with µ unknown. "Student", in his paper introducing the t-distribution 105 years ago, states:
"If two observations have been made and we have no other information, it is an even chance that the mean of the (Normal) population will lie between them."
(a) Verify Student's claim, showing how to interpret it as providing a 50% (frequentist) confidence interval for µ.
(b) Compute the expected length of the interval from (a) exactly (simplify). How does the average length compare (=, <, or >) to that of the usual 50% confidence interval, (ȳ − z0.75/√2, ȳ + z0.75/√2), where z0.75 ≈ 0.67 is the 0.75 Normal quantile?
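A derived answer to Problem 1 can be sanity-checked by simulation. The sketch below is only a numerical check, not a substitute for the derivation: it estimates how often µ falls between the two observations and the average length of that interval, and computes the length of the usual 50% interval for comparison. The function names, the seed, and the number of trials are illustrative assumptions.

```python
import random
from statistics import NormalDist


def simulate_pair_interval(mu=0.0, n_trials=200_000, seed=0):
    """Monte Carlo estimate of the coverage and mean length of
    the interval [min(y1, y2), max(y1, y2)] for y1, y2 ~ N(mu, 1)."""
    rng = random.Random(seed)
    covered = 0
    total_length = 0.0
    for _ in range(n_trials):
        y1 = rng.gauss(mu, 1.0)
        y2 = rng.gauss(mu, 1.0)
        lo, hi = min(y1, y2), max(y1, y2)
        covered += lo < mu < hi      # True counts as 1
        total_length += hi - lo
    return covered / n_trials, total_length / n_trials


def usual_interval_length():
    """Length of (ybar - z/sqrt(2), ybar + z/sqrt(2)), z = 0.75 Normal quantile."""
    z = NormalDist().inv_cdf(0.75)
    return 2 * z / 2 ** 0.5


if __name__ == "__main__":
    coverage, mean_length = simulate_pair_interval()
    print(f"estimated coverage:      {coverage:.3f}")
    print(f"estimated mean length:   {mean_length:.3f}")
    print(f"usual 50% interval:      {usual_interval_length():.3f}")
```

Changing `mu` should leave both estimates essentially unchanged, which is a quick way to see that the coverage statement does not depend on the unknown mean.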
2. You have k independent unbiased estimators of an unknown parameter θ, where the jth, denoted θ̂j, has mean θ and known variance Vj > 0. Consider linear combinations of the θ̂j, with the constraint that the weights assigned to these estimators be such that the resulting combination is unbiased.
(a) Find the best weights, in the sense of minimizing the mean squared error of the combination.
(b) Explain intuitively why your answer to (a) makes sense, in terms of Fisher information and/or an example.
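For Problem 2, a closed-form answer can be checked numerically in the simplest case k = 2. The sketch below assumes the unbiasedness constraint means the weights sum to 1, so the combination is w·θ̂1 + (1 − w)·θ̂2, and that mean squared error equals variance for an unbiased combination. It grid-searches w over [0, 1] only, which is an assumption of this illustration; the function names are hypothetical.

```python
def combination_variance(weights, variances):
    """Variance of sum_j w_j * thetahat_j for independent estimators."""
    return sum(w * w * v for w, v in zip(weights, variances))


def best_weight_two_estimators(v1, v2, grid_size=1000):
    """Grid-search the weight w in [0, 1] on thetahat_1 that minimizes
    Var(w * thetahat_1 + (1 - w) * thetahat_2)."""
    candidates = [i / grid_size for i in range(grid_size + 1)]
    return min(
        candidates,
        key=lambda w: combination_variance((w, 1 - w), (v1, v2)),
    )


if __name__ == "__main__":
    # Illustrative variances only; try other values to test a formula.
    for v1, v2 in [(1.0, 1.0), (1.0, 4.0), (9.0, 1.0)]:
        w = best_weight_two_estimators(v1, v2)
        print(f"V1={v1}, V2={v2}: best weight on estimator 1 is about {w:.3f}")
```

Comparing the printed weights against a candidate formula for several variance pairs is a quick check on part (a), and watching how the weight moves as one variance grows may help with the intuition asked for in part (b).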
3. Let y = (y1, ..., yk) be data and µ = (µ1, ..., µk) be parameters, connected through the Normal hierarchical model where, for i = 1, 2, ..., k,
yi | µi ∼ ind. N(µi, Vi)
µi ∼ i.i.d. N(µ0, A).
The variances Vi and the hyperparameters µ0 and A are known constants.
(a) The parameters µi are independent a priori (i.e., before observing the data). Are they also independent a posteriori (i.e., after observing the data)? You can give either a mathematical proof or a convincing intuitive argument.
(b) Find the posterior distribution of µi given the data y.
4. A widget-making company wants to study the reliability of their supposedly water-resistant widgets. The survival time of a widget that gets wet is defined as the length of time from when the widget gets wet until it stops working. Suppose that such survival times are i.i.d. Exponential r.v.s, with rate parameter λ and mean µ = 1/λ, with µ measured in days.
The CEO hires you as a consultant, and hands you a data set (t1, ..., t10) of survival times (in days) of 10 widgets that got wet. The following conversation ensues.
CEO: "We need some number-crunching help. Can you analyze our data?"
You: "Sure, but before we get to the data analysis, we should clarify some key issues. First of all, what is your scientific goal?"
CEO: "The goal is to figure out the average survival time of a widget that gets wet. Now can you analyze our data?"
You: "It is essential for me to know more about how the data were sampled. Can you tell me precisely what the data-collection process was?"
CEO: "A technician poured water on some widgets and then measured their survival times. Now can you analyze our data?"
You: "So there were 10 working widgets to start with, which all got wet?"
CEO: "I don't see why that matters, but there may have been more than 10 initially. The technician poured water on some widgets on a Friday at noon, and then went away for the weekend, returning on the following Monday at noon. While he was gone, some of the widgets may have stopped working and accidentally been disposed of by someone else. The technician forgot to record how many widgets he had initially, but I gave you the survival times for all the widgets that were present when he returned. Now can you analyze our data?"
You: "The statistician R.A. Fisher once said, 'To consult a statistician after an experiment is finished is often merely to ask him to conduct a post-mortem examination. He can perhaps say what the experiment died of.' But I will try."
(a) Find the likelihood function for λ. Note that any widget with a survival time t < 3 would have been discarded without you even knowing of its existence; the data you have are conditioned on having values of at least 3 (this is called truncated data).
(b) Find the MLEs of µ and of λ, and give a simple explanation in words for how and why the MLE of µ differs from the sample mean of (t1, ..., t10).
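The truncation in parts (a) and (b) can be explored numerically. The sketch below simulates Exponential survival times, keeps only those of at least 3 days, and maximizes the conditional log-likelihood on a grid. The exam's actual data are not given here, so the data are simulated; the true rate, sample size, seed, grid bounds, and function names are all assumptions made for illustration.

```python
import math
import random

TRUNCATION_DAYS = 3.0  # Friday noon to Monday noon


def truncated_loglik(lam, times, cutoff=TRUNCATION_DAYS):
    """Log-likelihood of rate lam when a time is seen only if t >= cutoff.
    Each term is log pdf(t) minus log P(T >= cutoff)."""
    log_survival_at_cutoff = -lam * cutoff
    return sum(
        (math.log(lam) - lam * t) - log_survival_at_cutoff for t in times
    )


def maximize_on_grid(loglik, lo=0.001, hi=5.0, steps=5000):
    """Return the grid point in [lo, hi] with the largest log-likelihood."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(grid, key=loglik)


def simulate_truncated(true_rate, n_kept, seed=0, cutoff=TRUNCATION_DAYS):
    """Draw Exponential(true_rate) times, keeping only those >= cutoff."""
    rng = random.Random(seed)
    kept = []
    while len(kept) < n_kept:
        t = rng.expovariate(true_rate)
        if t >= cutoff:
            kept.append(t)
    return kept


if __name__ == "__main__":
    times = simulate_truncated(true_rate=0.5, n_kept=10)  # simulated, not the exam data
    lam_hat = maximize_on_grid(lambda lam: truncated_loglik(lam, times))
    print(f"numerical MLE of lambda: {lam_hat:.3f}")
    print(f"implied MLE of mu:       {1 / lam_hat:.3f}")
    print(f"naive sample mean:       {sum(times) / len(times):.3f}")
```

Comparing the last two printed values shows how far the naive sample mean of the surviving widgets can sit from the likelihood-based estimate of µ.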
A follow-up experiment is performed, this time with you involved from the start. You get 30 widgets wet, and carefully monitor them. But 7 days after you start the experiment, the CEO gets impatient and demands immediate results. At this point in time, 21 widgets have stopped working, and you have recorded their survival times, but for the other 9 widgets, you know only that their survival times will be at least 7 days (the survival times for these 9 widgets are said to have been censored).
(c) Find the MLEs of µ and of λ (just based on the data from the follow-up experiment), and give a simple explanation in words for how and why the MLE of µ differs from the sample mean of the 21 observed survival times.
Hint: a widget's contribution to the likelihood function for λ is the PDF evaluated at t if the widget was observed to have stopped working at time t, and is the probability of still being working after 7 days if its survival time was censored.
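The hint translates directly into a log-likelihood that can be maximized numerically to check a closed-form answer to part (c). The sketch below follows the hint term by term: observed failures contribute the log PDF, and censored widgets contribute the log probability of surviving past 7 days. The data are simulated because the exam does not list the 21 observed times; the true rate, seed, grid bounds, and function names are assumptions made for illustration.

```python
import math
import random

CENSOR_DAYS = 7.0


def censored_loglik(lam, observed_times, n_censored, censor_time=CENSOR_DAYS):
    """Log-likelihood of rate lam with right-censoring at censor_time."""
    # Observed failure at time t: log of the PDF, log(lam) - lam * t.
    log_pdf_terms = sum(math.log(lam) - lam * t for t in observed_times)
    # Censored widget: log P(T > censor_time) = -lam * censor_time.
    log_survival_terms = n_censored * (-lam * censor_time)
    return log_pdf_terms + log_survival_terms


def maximize_on_grid(loglik, lo=0.001, hi=5.0, steps=5000):
    """Return the grid point in [lo, hi] with the largest log-likelihood."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(grid, key=loglik)


def simulate_censored(true_rate, n_widgets, censor_time=CENSOR_DAYS, seed=0):
    """Simulate lifetimes; return (observed failure times, number censored)."""
    rng = random.Random(seed)
    lifetimes = [rng.expovariate(true_rate) for _ in range(n_widgets)]
    observed = [t for t in lifetimes if t <= censor_time]
    return observed, n_widgets - len(observed)


if __name__ == "__main__":
    observed, n_censored = simulate_censored(true_rate=0.2, n_widgets=30)
    lam_hat = maximize_on_grid(
        lambda lam: censored_loglik(lam, observed, n_censored)
    )
    print(f"observed failures: {len(observed)}, censored: {n_censored}")
    print(f"numerical MLE of lambda: {lam_hat:.3f}")
    print(f"implied MLE of mu:       {1 / lam_hat:.3f}")
    if observed:
        print(f"mean of observed times:  {sum(observed) / len(observed):.3f}")
```

Setting `n_censored` to 0 in the likelihood call shows what happens to the estimate when the censored widgets are ignored.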