Assignment - Bayesian Inference using the Metropolis Algorithm
In this section, you will use the algorithm you coded in the previous part to make inferences from real data. The data (which we will call y) is two-dimensional: the first column is X1 and the second column is X2. In this example, we will assume that our data follow a multivariate normal distribution with unknown mean vector μ and unknown covariance matrix Σ.
[X1, X2]ᵀ ∼ N(μ, Σ)
You are only given 15 data points (rows). Your job will be to make inferences on the unknown parameters of the distribution, μ and Σ. To do so, we will do some Bayesian data analysis. In Bayesian inference, all unknown parameters are treated as random variables. The goal is to find the distribution of the unknown parameters given the data. Suppose that we have a vector of unknown parameters θ. From Bayes' theorem, we know the following is true:
p(θ|y) = (p(y|θ)p(θ))/p(y) ∝ p(y|θ) p(θ)
In the above expression, p(y|θ) is the likelihood function, which can be interpreted as "how likely is it that our data came from this distribution given the parameters θ". p(θ) is the prior distribution of the unknown parameters, which can be interpreted as "what do I know about my parameters before doing any inference". p(y) is a normalizing constant which makes p(θ|y) a valid probability distribution. Our goal is to understand how our data affected our knowledge about the unknown parameters (hence we want to learn p(θ|y)). The problem is that we often do not have p(θ|y) in a nice closed form from which we can draw samples. This is where the Metropolis algorithm comes in: the goal is to draw samples from p(θ|y) using the Metropolis algorithm. Before you do anything, here are the specs you need to follow: (1) Under the assumption that our data points are independent and identically distributed, the likelihood function of the data can be written as a product over all data points:
p(y|θ) = ∏ₙ₌₁ᴺ p(yₙ|θ)
(2) We will exploit the fact that p(y) is just a normalizing constant, and thus we will consider that our target distribution is simply π(θ) = p(y|θ)p(θ). With this information, the task should now be clearer. Warning: this part of the project is not easy, as you will have to do some thinking/researching on your own.
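One practical tip before you start: products of densities underflow easily, so it is standard to evaluate everything on the log scale, i.e. work with log π(θ) = log p(y|θ) + log p(θ) and compare proposals through differences of logs. As a minimal sketch (assuming y is the N-by-2 data matrix, mu is a 1-by-2 row vector, Sigma is 2-by-2, and the Statistics and Machine Learning Toolbox provides mvnpdf):

log_lik = sum(log(mvnpdf(y, mu, Sigma)));  % log p(y|mu,Sigma): sum of per-row log-densities
% log pi(mu,Sigma) is then log_lik plus the log-priors on mu and Sigma.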
If you need me to point you to other references, please do not hesitate to ask. Here are the specific tasks:
1. Write down the likelihood function for the data you are given. In other words, what is p(y|μ, Σ)?
2. You will assume that the prior distribution factorizes as p(μ, Σ) = p(μ)p(Σ). In other words, we assume that, before getting any data, μ and Σ are independent. For μ, the prior distribution we will choose is a multivariate normal distribution, and for Σ we will use the inverse-Wishart distribution (which is a distribution over matrices). Write down both distributions. What are their parameters?
3. After reading a bunch of studies and obtaining prior knowledge, you think the mean parameter μ should be around [-7, 4.5]ᵀ. You also think that the covariance matrix should be around [15, 5; 5, 15]. Can you write down some suitable parameters for the priors that will make the expected values of the priors agree with this prior knowledge?
4. Using this information, write a function called unnormalized_posterior.m in MATLAB which takes in values of μ and Σ and outputs the evaluated value of π(μ, Σ). (A sketch is given after the task list.)
5. Run the Metropolis algorithm on this distribution. Keep running the loop until you have accepted 5000 samples. Use the samples that have been accepted to do any inference; from these samples, only use the last 4000. Note: you may struggle to choose a suitable random walk that keeps the acceptance rate of the samples high, especially for the random walk on the matrix. Do some research! Ask Google what proper proposal/candidate distributions are for unknown mean and covariance matrices. (See the loop sketch after the task list.)
6. Histogram the samples of μ using the hist3 function in MATLAB.
7. Compute the posterior means of both μ and Σ. (Sketches for tasks 6 and 7 also appear below.)
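Hint for task 3: if μ ∼ N(m₀, S₀) then E[μ] = m₀, and if Σ ∼ inverse-Wishart(Ψ, ν) then E[Σ] = Ψ/(ν − p − 1) for ν > p + 1 (here p = 2), so you can match the prior knowledge by choosing m₀, Ψ, and ν accordingly. For task 4, here is a minimal sketch of the function; the hyperparameter names m0, S0, Psi, and nu are placeholders for whatever values you pick (passed as extra arguments, though you could also hard-code them), and it returns log π(μ, Σ) rather than π(μ, Σ) itself for numerical stability:

function lp = unnormalized_posterior(mu, Sigma, y, m0, S0, Psi, nu)
% Sketch: evaluates log pi(mu, Sigma) = log p(y|mu,Sigma) + log p(mu) + log p(Sigma)
% up to an additive constant. mu and m0 are 1-by-2; Sigma, S0, Psi are 2-by-2.
p = numel(mu);
log_lik  = sum(log(mvnpdf(y, mu, Sigma)));        % i.i.d. normal likelihood
log_pmu  = log(mvnpdf(mu, m0, S0));               % normal prior on mu
log_psig = -((nu + p + 1)/2)*log(det(Sigma)) ...  % inverse-Wishart prior on Sigma,
           - 0.5*trace(Psi/Sigma);                % normalizing constant dropped
lp = log_lik + log_pmu + log_psig;                % it cancels in the Metropolis ratio
end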
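For task 5, a minimal sketch of the accept-until-5000 loop, under assumed step sizes (0.5 for μ, 0.2 for Σ) that you will need to tune. It proposes μ by a Gaussian random walk and Σ by adding a small symmetric perturbation, discarding any proposal that is not positive definite; this proposal is symmetric, so the plain Metropolis ratio applies:

n_keep = 5000;
mu_s  = zeros(n_keep, 2);              % accepted mu samples
Sig_s = zeros(2, 2, n_keep);           % accepted Sigma samples
mu = [0 0]; Sigma = eye(2);            % arbitrary starting point
lp = unnormalized_posterior(mu, Sigma, y, m0, S0, Psi, nu);
acc = 0;
while acc < n_keep
    mu_prop = mu + 0.5*randn(1, 2);    % random walk on mu (tune step size)
    W = 0.2*randn(2); W = (W + W')/2;  % symmetric matrix perturbation
    Sig_prop = Sigma + W;
    [~, flag] = chol(Sig_prop);
    if flag == 0                       % only consider positive definite proposals
        lp_prop = unnormalized_posterior(mu_prop, Sig_prop, y, m0, S0, Psi, nu);
        if log(rand) < lp_prop - lp    % Metropolis acceptance in log space
            mu = mu_prop; Sigma = Sig_prop; lp = lp_prop;
            acc = acc + 1;
            mu_s(acc, :) = mu;
            Sig_s(:, :, acc) = Sigma;
        end
    end
end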
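For tasks 6 and 7, assuming the arrays produced by the loop above, discard the first 1000 accepted samples and summarize the remaining 4000:

keep = 1001:5000;                      % keep the last 4000 accepted samples
hist3(mu_s(keep, :), [30 30]);         % 2-D histogram of the mu samples
mu_post  = mean(mu_s(keep, :))         % posterior mean of mu
Sig_post = mean(Sig_s(:, :, keep), 3)  % posterior mean of Sigma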