Q1. A Hidden Markov Model in discrete time is a stochastic process involving two layers: (i) a hidden state which forms a Markov chain, and (ii) an observed state, which, given the hidden state, is independently distributed across time. The following is an example:
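The hidden states $Z_1, Z_2, \ldots$ follow a two-state Markov chain (with states 0 and 1) with transition probability matrix
$$\Pi = \begin{pmatrix} \pi_{00} & \pi_{01} \\ \pi_{10} & \pi_{11} \end{pmatrix},$$
where $\pi_{00} + \pi_{01} = 1$ and $\pi_{10} + \pi_{11} = 1$. This means that, for each $i$,
$$P(Z_{i+1} = l \mid Z_i = k) = \pi_{kl}, \quad \text{for } k, l \in \{0, 1\}.$$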
Given $Z_i = k$, we observe $X_i \sim N(\mu_k, \sigma_k^2)$, for $k = 0, 1$, independently for each $i$. Here $\mu_0, \mu_1$ are real numbers and $\sigma_0, \sigma_1 > 0$.
(a) Write an R function that takes the following inputs: π00, π10, µ0, µ1, σ0, σ1, n and Z0 (the initial state of Z), and generates the processes (X1, . . . , Xn) and (Z1, . . . , Zn). Set default values for all the parameters in your function definition. The function should return all the parameters, together with two vectors containing the values of the Xi's and Zi's.
(b) Generate a data set with n = 200, π00 = 0.8, π10 = 0.2, µ0 = 0, µ1 = 2, σ0 = σ1 = 1, and Z0 = 0.
(c) Generate a data set with n = 200, π00 = 0.8, π10 = 0.2, µ0 = µ1 = 0, σ0 = 1, σ1 = 4, and Z0 = 0.
(d) Assuming that you only observe the Xi's, for each of (b) and (c), compute the following statistics: (i) $U_i = \sum_{j=1}^{i} X_j$, $i = 1, \ldots, n$; and (ii) $V_i = \sum_{j=1}^{i} X_j^2 - i(\bar{X}_i)^2$, $i = 1, \ldots, n$, where $\bar{X}_i = \left(\sum_{j=1}^{i} X_j\right)/i$.
(e) Based on your results in (d), can you say which of the statistics (among (i) and (ii)) provides information about the transitions in the hidden states {Z1, . . . , Zn}? Justify your answer. [Hint: It will help to plot the values of Ui's and Vi's against index i.]
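The simulation in (a) and the running statistics in (d) could be sketched along the following lines. This is one possible approach under stated assumptions, not a model answer: the function names `sim_hmm` and `running_stats`, the argument names, the default values, and the seed are all illustrative choices that do not come from the problem statement. σ0 and σ1 are treated as standard deviations, in line with Xi ∼ N(µk, σk²).

```r
# Hypothetical helper: simulate a two-state Gaussian hidden Markov model.
# Argument names and default values are illustrative assumptions.
sim_hmm <- function(pi00 = 0.8, pi10 = 0.2, mu0 = 0, mu1 = 2,
                    sigma0 = 1, sigma1 = 1, n = 200, Z0 = 0) {
  stopifnot(Z0 %in% c(0, 1), n >= 1)
  # P(Z_{i+1} = 0 | Z_i = k) for k = 0, 1
  p_to_zero <- c(pi00, pi10)
  Z <- integer(n)
  prev <- Z0
  for (i in seq_len(n)) {
    # Next state is 0 with the probability attached to the previous state
    Z[i] <- as.integer(runif(1) >= p_to_zero[prev + 1])
    prev <- Z[i]
  }
  # Given Z_i = k, X_i ~ N(mu_k, sigma_k^2), independently across i
  X <- rnorm(n, mean = c(mu0, mu1)[Z + 1], sd = c(sigma0, sigma1)[Z + 1])
  list(params = list(pi00 = pi00, pi10 = pi10, mu0 = mu0, mu1 = mu1,
                     sigma0 = sigma0, sigma1 = sigma1, n = n, Z0 = Z0),
       X = X, Z = Z)
}

# Hypothetical helper: running statistics U_i and V_i from part (d).
running_stats <- function(X) {
  i <- seq_along(X)
  U <- cumsum(X)                      # U_i = sum_{j <= i} X_j
  V <- cumsum(X^2) - i * (U / i)^2    # V_i = sum_{j <= i} X_j^2 - i * (mean of first i)^2
  list(U = U, V = V)
}

set.seed(1)  # arbitrary seed, only for reproducibility
dat_b <- sim_hmm()                      # setting of part (b)
dat_c <- sim_hmm(mu1 = 0, sigma1 = 4)   # setting of part (c)
stats_b <- running_stats(dat_b$X)
stats_c <- running_stats(dat_c$X)
```

Following the hint in (e), each statistic can then be plotted against the index, for example with `plot(stats_b$U, type = "l")` and `plot(stats_c$V, type = "l")`.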
Q2. Suppose we are given a set of bivariate observations (Xi, Zi) for i = 1, . . . , n, where Zi's are categorical, that is, Zi's can take a finite number of distinct values. We want to see whether the values of Xi's differ for the different categories of the Z variable. In the following, we assume that we have two different vectors X = (X1, . . . , Xn) and Z = (Z1, . . . , Zn) as our data.
(a) Write an R function that accepts X and Z as input and returns the following output:
(i) The set of distinct values of the variable Z;
(ii) The sets of indices for which Z takes these values;
(iii) Mean and standard deviation of X corresponding to the distinct values of Z.
You must avoid using for or while loops when writing this function.
(b) Now suppose we consider the data set generated in 1(b). Treat both X = (X1, . . . , Xn) and Z = (Z1, . . . , Zn) as observed. Use the function created in 2(a) to check whether the Xi's corresponding to different values of Z are from the same population. Provide a brief argument to justify your answer. Clearly state any assumption you are making and any result that you are using.
(c) Write an R function to repeat the process in 2(b) m times (i.e., generate the data following 1(b) m times, and in each case decide whether the populations for distinct Zi's are the same). Report the success rate of your decision rule for m = 100.
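A loop-free summary for 2(a) and the repetition in 2(c) might be organised as in the sketch below. All names here (`group_summary`, `gen_1b`, `decide_different`, `success_rate`) are hypothetical. The decision rule is an assumption, not part of the problem: a two-sided Welch t-test at the 5% level, treating the Xi's within each category as independent normal samples. A "success" is taken to mean deciding that the two populations differ, since in 1(b) they do (µ0 = 0, µ1 = 2). The compact generator `gen_1b` is included only so that the example is self-contained.

```r
# Hypothetical helper for 2(a): distinct values, index sets, group means and SDs.
# Uses split() and vapply(), so no for or while loops appear in this function.
group_summary <- function(X, Z) {
  stopifnot(length(X) == length(Z))
  idx <- split(seq_along(Z), Z)   # indices for each distinct value of Z
  list(values = sort(unique(Z)),
       indices = idx,
       mean = vapply(idx, function(k) mean(X[k]), numeric(1)),
       sd = vapply(idx, function(k) sd(X[k]), numeric(1)))
}

# Compact stand-in generator for the setting of 1(b):
# pi00 = 0.8, pi10 = 0.2, mu0 = 0, mu1 = 2, sigma0 = sigma1 = 1, Z0 = 0.
gen_1b <- function(n = 200) {
  Z <- integer(n)
  prev <- 0
  for (i in seq_len(n)) {
    prev <- as.integer(runif(1) >= c(0.8, 0.2)[prev + 1])
    Z[i] <- prev
  }
  list(X = rnorm(n, mean = c(0, 2)[Z + 1], sd = 1), Z = Z)
}

# Assumed decision rule: Welch two-sample t-test, two-sided, level alpha.
# Returns TRUE if equality of means is rejected, NA if the test cannot be formed.
decide_different <- function(X, Z, alpha = 0.05) {
  s <- group_summary(X, Z)
  n_k <- lengths(s$indices)
  if (length(n_k) != 2 || any(n_k < 2)) return(NA)
  v_k <- s$sd^2 / n_k                      # per-group variance of the sample mean
  t_stat <- diff(s$mean) / sqrt(sum(v_k))
  df <- sum(v_k)^2 / sum(v_k^2 / (n_k - 1))  # Welch-Satterthwaite approximation
  unname(2 * pt(-abs(t_stat), df) < alpha)
}

# Repeat the experiment m times and report the fraction of correct decisions.
success_rate <- function(m = 100) {
  decisions <- replicate(m, {
    d <- gen_1b()
    decide_different(d$X, d$Z)
  })
  mean(decisions, na.rm = TRUE)
}

set.seed(2)  # arbitrary seed, only for reproducibility
rate <- success_rate(100)
```

Other decision rules (for example, a test that also compares variances) would fit the same structure by swapping out `decide_different`.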