The number of occurences of a rare, nongenetic birth effect in a five-year period for six neighboring counties is y = (1, 3, 2, 12, 1, 1). The counties have populations of x = (33, 14, 27, 90, 12, 17), given in thousands. The second county has higher rates of toxic chemicals (PCBs) present in the soil samples, and it is of interest to know if this town has a high disease rate as well. We will use the following hierarchial model to analyze these data:
• Yi|?i , xi ~ Po(?ixi);
• ?1, . . . , ?6|a, b ~ G(a, b);
• a ~ G(1, 1) ; b ~ G(10, 1).
(a) Describe in words what the various components of the hierarchical model represent in terms of the observed and expected disease rates.
(b) Identify the form of the conditional distribution of p(?1, . . . , ?6|a, b, x, D), and from this identify the full conditional distribution of the rate for each county p(?i |?-i, a, b, x, D).
(c) Write out the ratio of the posterior densities comparing a set of proposal values (a*, b*, ?) to values (a, b, ?). Note that the value of ?, the vector of county-specific rates, is unchanged. (d) Construct a Metropolis-Hastings algorithm which generates samples of (a, b, ?) from the posterior. Do this by iterating the following steps:
1. Given the current value (a, b, ?), generate a proposal (a * , b* , ?) by sampling a * and b * from a symmetric proposal distribution centered around a and b, but making sure all proposals are positive. Accept the proposal with the appropriate probability.
2. Sample new values of ?j 's from their full conditional distributions. Perform diagnostic tests on your chain and modify if necessary.
(e) Make posterior inference on the infection rates using the samples from the Markov chain. In particular,
(i) Compute marginal posterior distributions of ?1, . . . , ?6 and compare them to y1/x1, . . . , y6/x6.
(ii) Examine the posterior distribution of a/b, and compare it to the corresponding prior distribution as well as to the average of yi/xi across the six counties.
(iii) Plot samples of ?2 versus ?j for j ? 2, and draw a 45 degree line on the plot as well. Also, estimate P(?2 > ?j |x, D) for each j ?2 and P(?2 = max{?1, . . . , ?6}|x, D). Interpret the results of these calculations, and compare them to the conclusions one might obtain if they just examined yj/xj for each county j.