Questions -
Q1. Let {X_i : i = 1, 2, ..., n} be independent Binomial(m_i, ρ) random variables with ρ ∈ (0, 1) and m_i a known, positive integer for each i. For example, X_i can be the number of employees participating in a pension plan at a firm with m_i employees. Then ρ is the participation rate in the entire population of employees.
(i) Let Y = ∑_{i=1}^{n} X_i. What are the mean and variance of Y?
(ii) The MGF for the Binomial(m, ρ) distribution can be shown to be [1 - ρ + ρ exp(t)]^m. Use this to show that Y has a binomial distribution and specify its two arguments.
(iii) Suppose you want to test H0 : ρ ≤ ρ0 against H1 : ρ > ρ0 for some value ρ0 ∈ (0, 1). Explain why it will not always be possible to choose a critical value to obtain an exact 5% test. (Hint: A simple example with a small n will do.)
(iv) Explain how to compute the p-value for the test if y is the observed value of the test statistic. (Of course, you also know m = ∑_{i=1}^{n} m_i.)
(v) Use the Stata function binomialtail(., ., .) to compute the p-value when ρ0 = 0.5, m = 600, and y = 324. Would you reject the null at the 5% level? At the 1% level?
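Note: the upper-tail probability P(Y ≥ y) that a one-sided binomial test relies on can be sketched in Python as follows. This is only an illustrative sketch, not part of the assignment: the function name binomial_upper_tail and the small case m = 5 are assumptions chosen for illustration, and the function is meant to mirror what Stata's binomialtail(n, k, p) returns (the probability of k or more successes).

```python
from math import comb


def binomial_upper_tail(m, y, rho):
    """Return P(Y >= y) for Y ~ Binomial(m, rho), summed term by term."""
    return sum(comb(m, k) * rho**k * (1 - rho) ** (m - k) for k in range(y, m + 1))


# Illustrative small case (an assumption, not from the problem): m = 5, rho0 = 0.5.
# For the rule "reject when Y >= c", the size at rho = rho0 is P(Y >= c).
sizes = {c: binomial_upper_tail(5, c, 0.5) for c in range(0, 6)}

for c, size in sizes.items():
    print(f"c = {c}: size = {size:.5f}")
```

Because Y is discrete, only finitely many sizes are attainable. In this small case the size jumps from 0.03125 (c = 5) to 0.1875 (c = 4), so no critical value gives exactly 0.05. The same function evaluated at the observed y gives the p-value in part (iv).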
Q2. Let {X_i : i = 1, ..., n} be IID draws from the Exponential(μ) distribution for μ > 0.
(i) Let Y = X_1 + X_2 + ··· + X_n. What is the distribution of Y?
(ii) Conclude that W = Y/μ ∼ Gamma(n, 1). What are E(W) and Var(W)?
(iii) Suppose you want to test H0 : μ ≤ μ0 against H1 : μ > μ0, for some μ0 > 0. How would you use the statistic T = Y/μ0 to obtain a test with size 5% when μ = μ0? Find the critical value when n = 24 and report it to three decimal places. [Hint: The function invgammap() in Stata will be useful.]
(iv) Argue that μ = μ0 is the least favorable case under H0.
(v) Instead of finding a critical value, how would you find the p-value if the outcome of T is t? Report the p-value to three decimal places if n = 24 and t = 31.4.
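Note: for integer shape n, the Gamma(n, 1) upper tail has the closed form P(W > t) = exp(-t) ∑_{k=0}^{n-1} t^k / k!, which makes both the critical value and the p-value easy to check numerically. The Python sketch below is illustrative only: the function names and the bisection search are assumptions introduced here, standing in for what Stata's invgammap() provides, and the demonstration uses n = 1 (where the tail is simply exp(-t)) rather than the n = 24 case asked about.

```python
from math import exp, factorial, log


def gamma_upper_tail(n, t):
    """Return P(W > t) for W ~ Gamma(n, 1) with integer shape n >= 1."""
    return exp(-t) * sum(t**k / factorial(k) for k in range(n))


def gamma_critical_value(n, alpha, tol=1e-10):
    """Find c with P(W > c) = alpha by bisection (the tail is decreasing in c)."""
    lo, hi = 0.0, 1.0
    while gamma_upper_tail(n, hi) > alpha:  # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gamma_upper_tail(n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Sanity check with n = 1: the 5% critical value should equal -log(0.05).
c_one = gamma_critical_value(1, 0.05)
print(c_one, -log(0.05))
```

Under these assumptions, the test rejects when T exceeds gamma_critical_value(n, 0.05), and the p-value for an observed t is gamma_upper_tail(n, t).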
Q3. Let {X_{ni} : i = 1, 2, ..., n} be a sequence of independent, identically distributed Exponential(μ_n) random variables. Therefore, the density of each X_{ni} is
f_n(x) = (1/μ_n) exp(-x/μ_n), x > 0.
Assume specifically that μ_n = 2 + δ/√n where δ ≥ 0. Let X̄_n be the sample average of {X_{ni} : i = 1, ..., n}.
(i) Find E(X̄_n) and Var(X̄_n) in terms of δ and n.
(ii) Show that X̄_n →p 2 (convergence in probability).
(iii) Define
T_{n1} = √n (X̄_n - 2)/2
T_{n2} = √n (X̄_n - 2)/X̄_n
What is the asymptotic distribution of T_{n1} when δ = 0? What about T_{n2}? Justify your answers.
(iv) Explain how you would test H0 : μ ≤ 2 against H1 : μ > 2 using both of the statistics.
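Note: a small Monte Carlo experiment can be used to check an asymptotic answer to part (iii) informally. The Python sketch below is an illustration under stated assumptions, not a required part of the problem: the function name simulate_statistics, the sample size n = 400, the 2000 replications, and the seed are all arbitrary choices made here.

```python
import random
from math import sqrt
from statistics import mean, variance


def simulate_statistics(n, delta, reps, seed=0):
    """Draw `reps` samples of size n from Exponential(mean mu_n) and
    return the simulated values of T_n1 and T_n2."""
    rng = random.Random(seed)
    mu_n = 2.0 + delta / sqrt(n)
    t1, t2 = [], []
    for _ in range(reps):
        # random.expovariate takes the rate, i.e. 1 / mean.
        xbar = sum(rng.expovariate(1.0 / mu_n) for _ in range(n)) / n
        t1.append(sqrt(n) * (xbar - 2.0) / 2.0)
        t2.append(sqrt(n) * (xbar - 2.0) / xbar)
    return t1, t2


# delta = 0 corresponds to the boundary of the null hypothesis.
t1, t2 = simulate_statistics(n=400, delta=0.0, reps=2000)
print("T_n1: mean", mean(t1), "variance", variance(t1))
print("T_n2: mean", mean(t2), "variance", variance(t2))
```

Rerunning with delta > 0 shows how both statistics shift under the local alternatives μ_n = 2 + δ/√n, which is relevant to comparing the two tests in part (iv).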
Q4. Use the data in EARNS2.DTA, which is available on the D2L site, to answer the following questions. These data are on workers at a data entry company, where they are mainly paid based on output.
(i) Use the command sum earns05 earns06. How many workers are there in the sample? Did average earnings go up in 2006 compared with 2005? What is the percentage change in average earnings?
(ii) At the beginning of 2006 the company invested in faster computers with the hope that the workers could become more productive (finish more jobs in the same amount of time). The variable cearns is the change in earnings for each worker: cearns_i = earns06_i - earns05_i. Use the command count if cearns < 0 to find the number of workers whose earnings actually fell.
(iii) Let μ = E(cearns) be the expected change in earnings (defined for a large population of data entry workers). Report the p-value for testing H0 : μ ≤ 0 against H1 : μ > 0. Do you reject H0 at the 5% significance level? What about at the 1% level? What are the degrees of freedom in the t distribution?
(iv) Provide an economic reason and a statistical reason for why this analysis might not be convincing.
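Note: the one-sided test in part (iii) is based on the usual t statistic for a mean, t = √n · (sample mean of cearns) / (sample standard deviation of cearns), with n - 1 degrees of freedom. The Python sketch below only illustrates that arithmetic. The numbers in hypothetical_changes are made up for illustration and are not taken from EARNS2.DTA, and the function name one_sided_t is an assumption.

```python
from math import sqrt
from statistics import mean, stdev


def one_sided_t(changes):
    """Return (t statistic, degrees of freedom) for H0: mu <= 0 vs H1: mu > 0."""
    n = len(changes)
    # stdev uses the n - 1 divisor, matching the usual sample standard deviation.
    t_stat = sqrt(n) * mean(changes) / stdev(changes)
    return t_stat, n - 1


# Hypothetical earnings changes (NOT the EARNS2.DTA data).
hypothetical_changes = [1.2, -0.5, 0.8, 2.1, 0.3, -0.1, 1.5, 0.9]
t_stat, df = one_sided_t(hypothetical_changes)
print("t =", t_stat, "df =", df)
```

The one-sided p-value is the upper-tail probability of a t distribution with df degrees of freedom evaluated at t_stat. In Stata this can be obtained with ttail(df, t), or read from the output of ttest cearns == 0.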
Q5. Provide an answer to each of the following questions.
(i) State whether you agree or disagree with the following statement, and provide justification: "If we base a statistical test on an unbiased estimator, the test is also unbiased."
(ii) Suppose that you believe you are sampling from an Exponential(θ) distribution with E(X) = θ. Consider four test statistics for testing H0 : θ ≤ θ0 against H1 : θ > θ0 for some θ0 > 0:
T1 = √n (X̄ - θ0)/X̄
T2 = √n (X̄ - θ0)/S
T3 = √n (X̄ - θ0)/θ0
T4 = √n (X̄ - θ0)/√(n^{-1} ∑_{i=1}^{n} (X_i - θ0)^2)
where X̄ is the sample average and S is the sample standard deviation. Which of these statistics is robust to failure of the Exponential distribution? Justify your answer.