This assignment involves the so-called change-point problem and a Bayesian analysis of it.
1. The simple change-point problem can be described as follows. Both p1(y) and p2(y) are assumed to be completely known.
- For τ taking the values 1, . . . , n - 1:
$y_1, \ldots, y_\tau \mid \tau$ are independent and identically distributed (iid) with distribution p1(y), and
$y_{\tau+1}, \ldots, y_n \mid \tau$ are iid with distribution p2(y).
- If τ = 0, it is assumed that
$y_1, \ldots, y_n \mid \tau = 0$ are iid with distribution p2(y).
- If τ = n, it is assumed that
$y_1, \ldots, y_n \mid \tau = n$ are iid with distribution p1(y).
The case τ = n corresponds to "no-change" and τ < n to "change".
(a) Find an expression for the posterior distribution of the change-point τ for this simple model, assuming the values τ = 0, 1, 2, . . . , n are allowed.
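As a starting point for part (a), Bayes' theorem fixes the general shape of the answer. The sketch below assumes a generic prior p(τ) on {0, 1, . . . , n}, which the question leaves unspecified (a uniform prior p(τ) = 1/(n + 1) is one possible choice), and treats empty products as equal to 1.

```latex
p(\tau \mid y)
  = \frac{p(\tau)\, \prod_{i=1}^{\tau} p_1(y_i)\, \prod_{i=\tau+1}^{n} p_2(y_i)}
         {\sum_{k=0}^{n} p(k)\, \prod_{i=1}^{k} p_1(y_i)\, \prod_{i=k+1}^{n} p_2(y_i)},
  \qquad \tau = 0, 1, \ldots, n .
```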
(b) Find the posterior distribution of the change-point for the British annual coal-mining disasters data set for 1851 to 1962, available in the file coalminedata.R.
Assume that the distribution before the change is Poisson with mean 3.1 and after it is Poisson with mean 1.95.
Find the posterior distribution of the change-point and its mode. What is an approximate 95% credible interval for τ? What is the posterior probability of "no-change"?
See the following papers:
Carlin et al., 1992, Hierarchical Bayesian Analysis of Change point Problems. Applied Statistics.
Jarrett, 1979, A Note on the Intervals Between Coal-Mining Disasters. Biometrika.
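For part (b), the posterior over τ can be evaluated on the log scale with a single pass over the data. The following is a minimal sketch, not a definitive solution: it assumes a uniform prior on τ = 0, . . . , n (the question does not state one), the function names are illustrative, and the `counts` list is made-up placeholder data rather than the contents of coalminedata.R.

```python
import math


def poisson_logpmf(y, mean):
    """Log of the Poisson probability mass function at count y."""
    return y * math.log(mean) - mean - math.lgamma(y + 1)


def changepoint_posterior(y, mean_before=3.1, mean_after=1.95):
    """Posterior p(tau | y) for tau = 0..n, assuming a uniform prior on tau."""
    n = len(y)
    log1 = [poisson_logpmf(v, mean_before) for v in y]
    log2 = [poisson_logpmf(v, mean_after) for v in y]
    # tau = 0: every observation comes from p2.
    loglik = [sum(log2)]
    # Moving tau to tau + 1 switches one observation from p2 to p1.
    for i in range(n):
        loglik.append(loglik[-1] + log1[i] - log2[i])
    # Subtract the maximum before exponentiating to avoid underflow.
    m = max(loglik)
    w = [math.exp(v - m) for v in loglik]
    total = sum(w)
    return [v / total for v in w]


def summarize(post, level=0.95):
    """Return the mode, an equal-tailed interval, and P(tau = n | y)."""
    n = len(post) - 1
    mode = max(range(n + 1), key=lambda k: post[k])
    tail = (1 - level) / 2
    cum, lower, upper = 0.0, None, None
    for k, p in enumerate(post):
        cum += p
        if lower is None and cum >= tail:
            lower = k
        if upper is None and cum >= 1 - tail:
            upper = k
    return mode, (lower, upper), post[n]


# Placeholder counts (NOT the coal-mining data): higher early, lower late.
counts = [3, 4, 2, 5, 3, 4, 3, 2, 4, 3, 1, 2, 0, 1, 2, 1, 0, 2, 1, 1]
post = changepoint_posterior(counts)
mode, interval, p_no_change = summarize(post)
print("mode:", mode, "interval:", interval, "P(no change):", p_no_change)
```

Because τ is discrete, the interval returned here is only approximately 95%; a highest-posterior-density set would be an alternative design choice.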
2. (a) This question asks you to develop the full conditional distributions for a Bayesian change-point problem involving a change in the mean, but not the variance, of normally distributed data. Following on from Question 1, now take p1(y) to be the normal density with mean µ1 and precision (reciprocal variance) γ, and p2(y) to be the normal density with mean µ2 and the same precision γ. The parameters µ1, µ2 and γ are all assumed unknown.
For τ taking the values 1, . . . , n - 1 (i.e. at least one observation from each of p1 and p2), find the likelihood p(y|µ1, µ2, γ, τ), simplified to give a computationally efficient formula as a function of the parameters µ1, µ2, γ, τ. Assuming uniform uninformative priors for the parameters µ1, µ2, log(γ) and τ, that is
p(µ1, µ2, γ, τ ) ∝ 1/γ, -∞ < µ1 < ∞, -∞ < µ2 < ∞, 0 < γ, τ ∈ {1, . . . , n - 1}
find the four full conditional posterior distributions:
p(µ1|rest), p(µ2|rest), p(γ|rest), p(τ|rest),
where "rest" means all the other parameters and the data y.
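As a check on the derivation, the following sketch shows the form the likelihood and the four full conditionals are expected to take under the stated prior. The symbols $S(\tau)$, $\bar{y}_{1:\tau}$ and $\bar{y}_{\tau+1:n}$ are notation introduced here (pooled sum of squares and the two segment means), the normal distributions are written with the variance as second argument, and each line should be verified by hand rather than taken as given.

```latex
% Likelihood, with S(\tau) the pooled sum of squares about the two means
p(y \mid \mu_1, \mu_2, \gamma, \tau)
  = \left(\frac{\gamma}{2\pi}\right)^{n/2}
    \exp\!\left\{-\frac{\gamma}{2}\, S(\tau)\right\},
\qquad
S(\tau) = \sum_{i=1}^{\tau} (y_i - \mu_1)^2 + \sum_{i=\tau+1}^{n} (y_i - \mu_2)^2 .

% Full conditionals under p(\mu_1, \mu_2, \gamma, \tau) \propto 1/\gamma
\mu_1 \mid \text{rest} \sim N\!\left(\bar{y}_{1:\tau},\ (\tau\gamma)^{-1}\right),
\qquad
\mu_2 \mid \text{rest} \sim N\!\left(\bar{y}_{\tau+1:n},\ ((n-\tau)\gamma)^{-1}\right),

\gamma \mid \text{rest} \sim \text{Gamma}\!\left(\tfrac{n}{2},\ \text{rate} = \tfrac{1}{2} S(\tau)\right),
\qquad
P(\tau = k \mid \text{rest}) \propto \exp\!\left\{-\tfrac{\gamma}{2}\, S(k)\right\},
\quad k = 1, \ldots, n-1 .
```

Expanding each squared term shows that $S(k)$ depends on the data only through running sums of $y_i$ and $y_i^2$, which is what makes the formula cheap to evaluate for every $k$.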
Describe a Gibbs sampling algorithm for generating the posterior distributions of the four unknown parameters.
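One possible shape for such a sampler is sketched below using only the Python standard library. It assumes the usual conjugate forms under the prior ∝ 1/γ (normal draws for µ1 and µ2 centred on the segment means, a gamma draw for γ with shape n/2 and rate half the pooled sum of squares, and a discrete draw for τ); the function names, starting values, iteration counts and the simulated data are all illustrative assumptions, not part of the assignment.

```python
import math
import random


def gibbs_changepoint(y, n_iter=2000, burn_in=500, seed=0):
    """Gibbs sampler for a normal mean-shift model with common precision."""
    rng = random.Random(seed)
    n = len(y)
    # Prefix sums of y and y^2 so any segment sum of squares costs O(1).
    c1, c2 = [0.0], [0.0]
    for v in y:
        c1.append(c1[-1] + v)
        c2.append(c2[-1] + v * v)

    def seg_ss(a, b, mu):
        """Sum of (y_i - mu)^2 over observations a+1..b (1-based)."""
        return (c2[b] - c2[a]) - 2 * mu * (c1[b] - c1[a]) + (b - a) * mu * mu

    tau, gamma = n // 2, 1.0  # arbitrary starting values
    draws = []
    for it in range(n_iter):
        # mu1, mu2 | rest: normal around each segment mean.
        mu1 = rng.gauss(c1[tau] / tau, 1 / math.sqrt(tau * gamma))
        mu2 = rng.gauss((c1[n] - c1[tau]) / (n - tau),
                        1 / math.sqrt((n - tau) * gamma))
        # gamma | rest: Gamma(shape n/2, rate ss/2); gammavariate takes a scale.
        ss = seg_ss(0, tau, mu1) + seg_ss(tau, n, mu2)
        gamma = rng.gammavariate(n / 2, 2 / ss)
        # tau | rest: discrete distribution on 1..n-1, computed on the log scale.
        logw = [-0.5 * gamma * (seg_ss(0, k, mu1) + seg_ss(k, n, mu2))
                for k in range(1, n)]
        m = max(logw)
        weights = [math.exp(v - m) for v in logw]
        tau = rng.choices(range(1, n), weights=weights)[0]
        if it >= burn_in:
            draws.append((mu1, mu2, gamma, tau))
    return draws


def interval(values, level=0.95):
    """Equal-tailed interval from sorted posterior draws."""
    xs = sorted(values)
    lo = int((1 - level) / 2 * len(xs))
    hi = int((1 + level) / 2 * len(xs)) - 1
    return xs[lo], xs[hi]


# Simulated data (an assumption for illustration): mean shifts from 0 to 3 at 30.
data_rng = random.Random(1)
y = [data_rng.gauss(0, 1) for _ in range(30)] + [data_rng.gauss(3, 1) for _ in range(30)]
draws = gibbs_changepoint(y)
print("tau interval:", interval([d[3] for d in draws]))
print("mu1 interval:", interval([d[0] for d in draws]))
print("mu2 interval:", interval([d[1] for d in draws]))
```

The same `interval` helper can be applied to the draws of any parameter; in practice one would also inspect trace plots and run several chains before trusting the output.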
(b) The data to be analysed are a sequence of so-called temperature anomalies for North Russia at 20-year intervals (1001, 1021, ...) up to recent times. Data are also available for various other sites around the world from about 800 AD until recently. The source is an IPCC report: Jansen E, J Overpeck, KR Briffa, J-C Duplessy, F Joos, V Masson-Delmotte, D Olago, B Otto-Bliesner, WR Peltier, S Rahmstorf, R Ramesh, D Raynaud, D Rind, O Solomina, R Villalba and D Zhang (2007) Palaeoclimate. In Climate change 2007: the physical science basis. Contribution of Working Group I to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change, Solomon S, D Qin, M Manning, Z Chen, M Marquis, KB Averyt, M Tignor and HL Miller (eds.). Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA.
The data are found in the file nrussia.R.
Develop a Gibbs Sampling algorithm to find the posterior distribution of the change point using the model developed in Question 2(a). Report a 95% credible interval for the change-point and for the two parameters µ1, µ2.
Comment on whether the change-point model seems a reasonable model for these data.
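3. (a) Suppose $y_1, \ldots, y_n$ given $\theta$ are independent Poisson($\theta$) data, so that the likelihood is
$$p(y \mid \theta) = \frac{e^{-n\theta}\, \theta^{s}}{\prod_{j=1}^{n} y_j!}, \qquad s = \sum_{j=1}^{n} y_j.$$
The marginal likelihood (or evidence) is given by
$$p(y) = \int_{\theta} p(y \mid \theta)\, p(\theta)\, d\theta. \tag{1}$$
Assuming that the prior for $\theta$ is given by a Gamma($\alpha$, $\beta$) distribution, show that the marginal likelihood, equation (1), is given by
$$p(y) = \frac{1}{\prod_{j=1}^{n} y_j!} \cdot \frac{\beta^{\alpha}}{\Gamma(\alpha)} \cdot \frac{\Gamma(\alpha + s)}{(n + \beta)^{\alpha + s}}, \qquad s = \sum_{j=1}^{n} y_j.$$
Show that the same result for $p(y)$ is found by using the identity
$$p(y) = \frac{p(y \mid \theta)\, p(\theta)}{p(\theta \mid y)}.$$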
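A numerical check can help confirm the algebra. The sketch below evaluates the closed-form log marginal likelihood and compares it with the identity p(y) = p(y|θ)p(θ)/p(θ|y) at several values of θ, assuming the Gamma(α, β) prior is parameterised with β as a rate and that the posterior is therefore Gamma(α + s, β + n). Function names and the example data are illustrative assumptions.

```python
import math


def log_gamma_pdf(theta, alpha, beta):
    """Log density of Gamma(alpha, beta), with beta a rate parameter."""
    return (alpha * math.log(beta) - math.lgamma(alpha)
            + (alpha - 1) * math.log(theta) - beta * theta)


def log_poisson_lik(y, theta):
    """Log likelihood of iid Poisson(theta) counts."""
    n, s = len(y), sum(y)
    return -n * theta + s * math.log(theta) - sum(math.lgamma(v + 1) for v in y)


def log_marginal(y, alpha, beta):
    """Closed-form log p(y) for the Poisson-Gamma model."""
    n, s = len(y), sum(y)
    return (alpha * math.log(beta) - math.lgamma(alpha)
            + math.lgamma(alpha + s) - (alpha + s) * math.log(n + beta)
            - sum(math.lgamma(v + 1) for v in y))


def log_marginal_via_identity(y, alpha, beta, theta):
    """log p(y) = log p(y|theta) + log p(theta) - log p(theta|y), any theta > 0."""
    n, s = len(y), sum(y)
    return (log_poisson_lik(y, theta) + log_gamma_pdf(theta, alpha, beta)
            - log_gamma_pdf(theta, alpha + s, beta + n))


# Made-up counts and hyperparameters, purely for illustration.
y = [2, 0, 3, 1, 4, 2]
alpha, beta = 2.0, 1.5
print(log_marginal(y, alpha, beta))
for theta in (0.5, 1.0, 2.0, 5.0):
    print(theta, log_marginal_via_identity(y, alpha, beta, theta))
```

The identity should return the same value for every θ, since the θ-dependent terms cancel between the numerator and the posterior density.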
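(b) For two models $M_j$, $j = 1, 2$, we can compute the posterior odds of model $M_1$ to $M_2$ as
$$\frac{p(M_1 \mid y)}{p(M_2 \mid y)} = \frac{p(y \mid M_1)}{p(y \mid M_2)} \cdot \frac{p(M_1)}{p(M_2)}.$$
For Poisson data with mean $\theta$, we want to compare $M_1$: $\theta = \theta_0$, with the value of $\theta_0$ known, with $M_2$: $0 < \theta < \infty$, with $\theta$ having prior Gamma($\alpha$, $\beta$).
Here
$$p(y \mid M_j) = \int p(y \mid \theta_j, M_j)\, p(\theta_j \mid M_j)\, d\theta_j, \qquad j = 1, 2,$$
that is, equation (1) computed for $M_j$, $j = 1, 2$.
Assuming $p(M_1) = p(M_2)$, find the posterior odds $p(M_1 \mid y)/p(M_2 \mid y)$.
Assuming $\theta_0 = 1$, compute this for $s = n$ and $s = 2n$ for $n = 10, 20, \ldots, 1000$ (that is, from 10 to 1000 in steps of 10). Comment.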
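The requested table is easiest to compute on the log scale, because the raw odds overflow or underflow for large n. The sketch below assumes equal prior model probabilities, so the posterior odds equal the Bayes factor, and uses α = β = 1 purely as placeholder hyperparameters since the question does not fix them. The factor involving the product of the y_j! terms is common to both models and cancels.

```python
import math


def log_posterior_odds(n, s, theta0=1.0, alpha=1.0, beta=1.0):
    """log of p(M1|y)/p(M2|y) with p(M1) = p(M2); the y_j! terms cancel."""
    # M1: theta fixed at theta0, so p(y|M1) is the likelihood at theta0.
    log_m1 = -n * theta0 + s * math.log(theta0)
    # M2: Poisson-Gamma marginal likelihood (without the y_j! factor).
    log_m2 = (alpha * math.log(beta) - math.lgamma(alpha)
              + math.lgamma(alpha + s) - (alpha + s) * math.log(n + beta))
    return log_m1 - log_m2


# n = 10, 20, ..., 1000, with s = n and s = 2n.
table = [(n, log_posterior_odds(n, n), log_posterior_odds(n, 2 * n))
         for n in range(10, 1001, 10)]
for n, odds_s_n, odds_s_2n in table:
    if n in (10, 100, 1000):
        print(n, odds_s_n, odds_s_2n)
```

Positive values favour M1 and negative values favour M2; how quickly each column moves as n grows is the behaviour the "Comment" part of the question is asking about.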
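(c) For data $y_1, \ldots, y_n$, assume that the model with likelihood
$$p(y_1, \ldots, y_n \mid \theta_1, \theta_2) = \prod_{j=1}^{t} \text{Poisson}(y_j; \theta_1) \times \prod_{j=t+1}^{n} \text{Poisson}(y_j; \theta_2)$$
and prior
$$p(\theta_1, \theta_2) = \text{Gamma}(\theta_1; \alpha_1, \beta_1) \times \text{Gamma}(\theta_2; \alpha_2, \beta_2)$$
holds.
Describe in words what situations this probability model might represent.
Show that the marginal likelihood for this model is given by
$$\frac{1}{\prod_{j=1}^{n} y_j!} \times \frac{\beta_1^{\alpha_1}}{\Gamma(\alpha_1)} \cdot \frac{\Gamma(\alpha_1 + s_t)}{(t + \beta_1)^{\alpha_1 + s_t}} \times \frac{\beta_2^{\alpha_2}}{\Gamma(\alpha_2)} \cdot \frac{\Gamma(\alpha_2 + s'_t)}{(n - t + \beta_2)^{\alpha_2 + s'_t}} \tag{2}$$
using the results of Question 3(a), where $s_t = \sum_{j=1}^{t} y_j$ and $s'_t = \sum_{j=t+1}^{n} y_j$.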
How can expression (2) be used to make inferences about the value of t if it is unknown (t = 1, . . . , n - 1)?
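One way to approach this is to treat expression (2) as p(y | t) and combine it with a prior on t, giving p(t | y) ∝ p(y | t) p(t). The sketch below assumes a uniform prior on t = 1, . . . , n - 1 and placeholder hyperparameters α1 = β1 = α2 = β2 = 1; the function names and the counts are made up for illustration. The y_j! factor does not depend on t and is dropped.

```python
import math


def log_gamma_block(alpha, beta, s, m):
    """log of beta^alpha / Gamma(alpha) * Gamma(alpha + s) / (m + beta)^(alpha + s)."""
    return (alpha * math.log(beta) - math.lgamma(alpha)
            + math.lgamma(alpha + s) - (alpha + s) * math.log(m + beta))


def posterior_t(y, a1=1.0, b1=1.0, a2=1.0, b2=1.0):
    """Posterior p(t | y) for t = 1..n-1 under a uniform prior on t."""
    n, total = len(y), sum(y)
    logm, s_t = [], 0
    for t in range(1, n):
        s_t += y[t - 1]  # running sum of the first t observations
        logm.append(log_gamma_block(a1, b1, s_t, t)
                    + log_gamma_block(a2, b2, total - s_t, n - t))
    # Normalise on the log scale to avoid overflow.
    mx = max(logm)
    w = [math.exp(v - mx) for v in logm]
    z = sum(w)
    return {t: w[t - 1] / z for t in range(1, n)}


# Made-up counts with a visible drop in rate after the tenth observation.
y = [5, 6, 4, 7, 5, 6, 5, 4, 6, 5, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0]
post = posterior_t(y)
best_t = max(post, key=post.get)
print("most probable t:", best_t, "with probability", post[best_t])
```

From the resulting distribution one can read off a posterior mode, credible set, or the probability of any range of t, in the same spirit as Question 1.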
Attachment:- russia and coalmine data.rar