1. What do you think of the idea of using a variational method to optimize an approximating distribution Q which we then use as a proposal density for importance sampling?
2. Define the relative entropy or Kullback-Leibler divergence between two probability distributions P and Q, and state Gibbs' inequality.
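For reference, the standard definition for discrete distributions over the same outcomes is D_KL(P||Q) = sum_x P(x) log(P(x)/Q(x)). Gibbs' inequality states that D_KL(P||Q) >= 0, with equality if and only if P = Q. The sketch below evaluates the definition numerically and checks the inequality on random distributions. It assumes distributions given as plain Python lists, logarithms to base 2 (so the result is in bits), and the conventions 0 log 0 = 0 and D = infinity when P(x) > 0 but Q(x) = 0. The function name `kl_divergence` is hypothetical.

```python
import math
import random


def kl_divergence(p, q):
    """Relative entropy D_KL(P||Q) = sum_x P(x) log2(P(x)/Q(x)), in bits.

    Assumption: p and q are lists of probabilities over the same outcomes.
    Terms with P(x) = 0 contribute nothing (0 log 0 = 0); if P(x) > 0
    where Q(x) = 0, the divergence is infinite.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0.0:
            continue
        if qx == 0.0:
            return math.inf
        total += px * math.log2(px / qx)
    return total


def random_distribution(rng, n):
    """Draw n positive weights and normalize them to sum to 1."""
    weights = [rng.random() + 1e-9 for _ in range(n)]
    s = sum(weights)
    return [w / s for w in weights]


if __name__ == "__main__":
    P = [0.5, 0.25, 0.25]
    Q = [1 / 3, 1 / 3, 1 / 3]
    # The divergence is not symmetric: the two directions differ.
    print(kl_divergence(P, Q), kl_divergence(Q, P))
    # Equality case of Gibbs' inequality: D_KL(P||P) = 0.
    print(kl_divergence(P, P))

    # Gibbs' inequality, checked on random pairs (tolerance for rounding).
    rng = random.Random(0)
    for _ in range(1000):
        p = random_distribution(rng, 4)
        q = random_distribution(rng, 4)
        assert kl_divergence(p, q) >= -1e-12
```

The example pair also illustrates that relative entropy is not a distance: D_KL(P||Q) and D_KL(Q||P) generally take different values.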