1. Rao-Blackwell Theorem
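Assume that $\hat\theta(X)$ is some estimator of $\theta$, and let $T(X)$ be a sufficient statistic.
(a) Define $\tilde\theta(X) = E[\hat\theta(X) \mid T(X)]$.
Prove that
$\mathrm{Var}[\hat\theta(X)] \ge \mathrm{Var}[\tilde\theta(X)]$,
with equality if and only if $\hat\theta(X)$ is already a function of $T(X)$ only (in which case $\tilde\theta(X) = \hat\theta(X)$).
(b) Prove that, consequently, $\mathrm{MSE}[\hat\theta(X)] \ge \mathrm{MSE}[\tilde\theta(X)]$.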
(c) What is wrong with the following argument:
"Let S(X) be another statistic that is not sufficient. Define
$\theta^{*}(X) = E[\hat\theta(X) \mid S(X)]$.
By the same proof as in part (a), $\theta^{*}$ is an estimator of $\theta$ whose variance is no larger than that of $\hat\theta$."
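The inequality in part (a) can be illustrated numerically. The sketch below uses a hypothetical example that is not part of the problem: X_1, ..., X_n i.i.d. Bernoulli(θ), the crude unbiased estimator θ̂ = X_1, and the sufficient statistic T = X_1 + ... + X_n, for which E[X_1 | T] = T/n by symmetry. The function name and default values are illustrative assumptions.

```python
import random
import statistics


def simulate_rao_blackwell(theta=0.3, n=10, reps=20000, seed=0):
    """Return (Var of X_1, Var of E[X_1 | sum of X_i]) estimated by simulation."""
    rng = random.Random(seed)
    crude, improved = [], []
    for _ in range(reps):
        x = [1 if rng.random() < theta else 0 for _ in range(n)]
        crude.append(x[0])           # theta_hat = X_1: unbiased but ignores most of the data
        improved.append(sum(x) / n)  # theta_tilde = E[X_1 | T] = T / n, by symmetry
    return statistics.pvariance(crude), statistics.pvariance(improved)


var_hat, var_tilde = simulate_rao_blackwell()
print(f"Var(theta_hat)   ~ {var_hat:.4f}")
print(f"Var(theta_tilde) ~ {var_tilde:.4f}")
```

In this example the two variances should be close to θ(1 − θ) and θ(1 − θ)/n respectively, so the conditioned estimator is visibly less variable. The simulation only illustrates the claim and does not replace the proof.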
2. Bayesian Inference with the Pareto
Assume the likelihood
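$X_1, \dots, X_n \mid \Theta = \theta \overset{\text{i.i.d.}}{\sim} \theta x^{-(\theta+1)}, \quad \theta > 0,\ x > 1$
We showed on the midterm that the MLE for the Pareto is $\hat\theta = \left(\frac{1}{n}\sum_{i=1}^{n} \log X_i\right)^{-1}$.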
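As a quick sanity check of the MLE formula, the sketch below draws Pareto samples by inverting the CDF F(x) = 1 − x^(−θ) for x > 1 and then evaluates the estimator. The helper names `sample_pareto` and `pareto_mle`, the seed, and the parameter values are illustrative assumptions.

```python
import math
import random


def sample_pareto(theta, n, rng):
    """Draw n values with density theta * x^(-(theta + 1)) on x > 1."""
    # If U ~ Uniform(0, 1], then U^(-1/theta) has CDF 1 - x^(-theta).
    return [(1.0 - rng.random()) ** (-1.0 / theta) for _ in range(n)]


def pareto_mle(xs):
    """MLE of theta: the reciprocal of the average log-observation."""
    return 1.0 / (sum(math.log(x) for x in xs) / len(xs))


rng = random.Random(1)
true_theta = 2.5
xs = sample_pareto(true_theta, 50000, rng)
theta_hat = pareto_mle(xs)
print(f"true theta = {true_theta}, MLE = {theta_hat:.3f}")
```

With a sample this large the estimate should land close to the true value, which is consistent with the formula above.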
(a) Consider the Gamma prior on Θ:
$\Theta \sim \mathrm{Gamma}(k, \beta) = \frac{\beta^{k}}{\Gamma(k)}\,\theta^{k-1} e^{-\beta\theta}$
In this parameterization, the expectation of the Gamma is $k\beta^{-1}$.
Show that this is a conjugate prior for the Pareto and give the posterior of Θ given X.
(b) Compare the posterior mean $\hat\theta_{\text{post}}$ to the MLE $\hat\theta_{\text{MLE}}$. Interpret the prior in terms of fictitious data points.
3. Inverse Gaussian Distribution
Assume that
$X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} \mathrm{IG}(\theta) = \frac{1}{\sqrt{2\pi x^{3}}}\, e^{-(\theta x - 1)^{2}/(2x)}, \quad \theta > 0,\ x > 0$
[Note: Despite the name, $X$ being IG doesn't imply $X^{-1}$ is Gaussian.]
(a) Write the (univariate) inverse Gaussian in exponential family form. Write down a real-valued function of X1, . . . ,Xn that summarizes all the information about θ contained in the data set.
(b) Propose a uniformly most powerful test for:
H0: θ = θ0
H1: θ > θ0
You don't need to give an explicit rejection region, but give the explicit form of your test statistic, as well as a concrete algorithm for how you can compute the cutoff(s). You may assume you have access to a random number generator that gives you as many independent draws as you want from an Inverse Gaussian distribution.
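The random number generator that part (b) takes for granted could look like the sketch below. It uses the transformation method of Michael, Schucany and Haas, which is an assumption of this sketch and not something the problem specifies. In the standard (mean μ, shape λ) parameterization, the density above corresponds to μ = 1/θ and λ = 1. The function name is hypothetical.

```python
import math
import random


def sample_inverse_gaussian(theta, rng):
    """One draw from the density (2*pi*x^3)^(-1/2) * exp(-(theta*x - 1)^2 / (2*x))."""
    mu = 1.0 / theta  # mean of the distribution; the shape parameter is fixed at 1
    y = rng.gauss(0.0, 1.0) ** 2  # chi-squared variate with one degree of freedom
    # Smaller root of the quadratic that links y to the inverse Gaussian variate.
    x = mu + 0.5 * mu * mu * y - 0.5 * mu * math.sqrt(4.0 * mu * y + mu * mu * y * y)
    # Keep the smaller root with probability mu / (mu + x), else use the other root.
    if rng.random() <= mu / (mu + x):
        return x
    return mu * mu / x


rng = random.Random(3)
theta = 2.0
draws = [sample_inverse_gaussian(theta, rng) for _ in range(50000)]
sample_mean = sum(draws) / len(draws)
print(f"sample mean = {sample_mean:.4f}, 1/theta = {1.0 / theta:.4f}")
```

The sample mean should be close to 1/θ, the mean of this distribution. Repeated calls give the independent draws that a simulation-based cutoff would need.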
(c) Give the Score and Wald tests for:
H0: θ = θ0
H1: θ ≠ θ0
We showed on the midterm that $I(\theta) = \theta^{-1}$ and $\hat\theta = \bar{X}^{-1}$.
(d) Invert the Wald test to obtain a Wald CI for θ0. Is your CI exact or asymptotic?
4. Light-bulb Survival Times
It has been claimed by some that the time it takes a new light-bulb to burn out is an Exponential random variable.
Suppose that we want to experiment with changing the filament material (maybe the new material is cheaper, or believed to be higher quality, or both). We make n light-bulbs using the old material and n using the new material, and then we measure how long it takes each bulb to burn out.
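Let $X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} \mathrm{Exp}(\lambda_1) = \frac{1}{\lambda_1} e^{-x/\lambda_1}$ be the life spans of the old-style bulbs ($E X = \lambda_1$), and $Y_1, \dots, Y_n \overset{\text{i.i.d.}}{\sim} \mathrm{Exp}(\lambda_2)$ be those of the new-style bulbs, with $\lambda_1, \lambda_2 > 0$. Both $\lambda_1$ and $\lambda_2$ are unknown.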
You may use the fact that if $Z_1, \dots, Z_m \overset{\text{i.i.d.}}{\sim} \mathrm{Exp}(\lambda)$, then the MLE is $\hat\lambda = \bar{Z}$, the sample mean.
(a) Suppose we want to discover whether the new filament changes the average lifespan; that is, we want to test:
H0: λ1 = λ2 (both unknown, but constrained to be the same)
H1: λ1 ≠ λ2
Note that here both the null and the alternative are composite.
Give the generalized likelihood ratio statistic for this testing problem and say what its asymptotic null distribution is. What is the rejection threshold in terms of a chi-squared quantile?
(b) If X ∼ Exp(λ) and c > 0, then cX ∼ Exp(cλ). Use this fact to generalize your test from part (a) so that we can test:
H0: λ2/λ1 = ρ0
H1: λ2/λ1 ≠ ρ0
(Here, ρ0 is some fixed candidate value for the ratio. If ρ0 = 1 then we're back to the hypotheses of part (a).)
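The scaling fact quoted in part (b) can be checked by simulation. The sketch below compares draws of cX, with X ∼ Exp(λ), against direct draws from Exp(cλ), using the mean parameterization of this problem. The helper name and the particular values of λ and c are illustrative assumptions.

```python
import math
import random


def sample_exp_mean(lam, n, rng):
    """Draw n exponentials with mean lam."""
    # random.expovariate expects a rate, so the mean lam is passed as 1 / lam.
    return [rng.expovariate(1.0 / lam) for _ in range(n)]


rng = random.Random(2)
lam, c, n = 2.0, 3.0, 50000
scaled = sorted(c * x for x in sample_exp_mean(lam, n, rng))  # draws of cX
direct = sorted(sample_exp_mean(c * lam, n, rng))             # draws from Exp(c * lam)

mean_scaled = sum(scaled) / n
mean_direct = sum(direct) / n
median_scaled = scaled[n // 2]
median_direct = direct[n // 2]
print(f"means:   {mean_scaled:.3f} vs {mean_direct:.3f} (theory {c * lam:.3f})")
print(f"medians: {median_scaled:.3f} vs {median_direct:.3f} "
      f"(theory {c * lam * math.log(2):.3f})")
```

Both samples should agree with each other, and with the theoretical mean cλ and median cλ log 2, up to simulation noise.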
(c) Use the test from part (b) to obtain an asymptotic confidence interval for the ratio ρ = λ2/λ1. It can be of the form {ρ: condition on ρ, X, Y} but say precisely what the condition is.
(d) Suppose that upon looking at the data, we don't think that the Exponential distribution is a good fit: the data are right-skewed like an Exponential, but the shape doesn't look right. Of the six two-sample tests we saw in class, which is the most appropriate to apply in this setting and why?