Problem
Consider learning the parameters of the network H → X, H → Y, where H is a hidden variable and only X and Y are observed. Show that the parameter setting in which P(H), P(X | H), and P(Y | H) are all uniform is a stationary point of the likelihood (its gradient is 0), assuming the empirical distributions of X and Y in the training data are themselves uniform. What does this imply about gradient ascent and EM started from this point?
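A minimal numerical check of the claim, offered as a sketch rather than a reference implementation. It assumes discrete H, X, Y stored as nested lists pH[h], pX[h][x], pY[h][y] (a hypothetical layout, not from the problem). It measures the gradient in a softmax (logit) parameterization of each CPD row, which keeps every row on the probability simplex, and runs one EM step from the uniform point. The toy dataset is also an assumption: X always equals Y and each value appears equally often, so the empirical marginals are uniform even though X and Y are strongly dependent.

```python
import math

# Hypothetical layout: pH[h] = P(H=h), pX[h][x] = P(X=x | H=h), pY[h][y] = P(Y=y | H=h).


def joint(pH, pX, pY, h, x, y):
    """P(H=h, X=x, Y=y) for the network H -> X, H -> Y."""
    return pH[h] * pX[h][x] * pY[h][y]


def posterior(pH, pX, pY, x, y):
    """P(H=h | x, y) for every value h (the E-step responsibilities)."""
    terms = [joint(pH, pX, pY, h, x, y) for h in range(len(pH))]
    total = sum(terms)
    return [t / total for t in terms]


def log_likelihood(pH, pX, pY, data):
    """Sum over the data of log P(x, y), with H summed out."""
    return sum(math.log(sum(joint(pH, pX, pY, h, x, y) for h in range(len(pH))))
               for x, y in data)


def softmax_gradient(pH, pX, pY, data):
    """Gradient of the log-likelihood w.r.t. softmax logits of every CPD row.

    For the logits of P(X | h): dL/dtheta[h][v] = sum_i q_i(h) * (1[x_i = v] - P(X=v | h)),
    with q_i(h) = P(h | x_i, y_i); analogous formulas hold for P(H) and P(Y | H).
    """
    k, nx, ny = len(pH), len(pX[0]), len(pY[0])
    gH = [0.0] * k
    gX = [[0.0] * nx for _ in range(k)]
    gY = [[0.0] * ny for _ in range(k)]
    for x, y in data:
        q = posterior(pH, pX, pY, x, y)
        for h in range(k):
            gH[h] += q[h] - pH[h]
            for v in range(nx):
                gX[h][v] += q[h] * (float(v == x) - pX[h][v])
            for v in range(ny):
                gY[h][v] += q[h] * (float(v == y) - pY[h][v])
    return gH + [g for row in gX for g in row] + [g for row in gY for g in row]


def em_step(pH, pX, pY, data):
    """One EM iteration: expected counts under the posterior, then normalize."""
    k, nx, ny = len(pH), len(pX[0]), len(pY[0])
    cH = [0.0] * k
    cX = [[0.0] * nx for _ in range(k)]
    cY = [[0.0] * ny for _ in range(k)]
    for x, y in data:
        q = posterior(pH, pX, pY, x, y)
        for h in range(k):
            cH[h] += q[h]
            cX[h][x] += q[h]
            cY[h][y] += q[h]
    new_pH = [c / len(data) for c in cH]
    new_pX = [[c / cH[h] for c in cX[h]] for h in range(k)]
    new_pY = [[c / cH[h] for c in cY[h]] for h in range(k)]
    return new_pH, new_pX, new_pY


k, n_vals = 2, 3
uniform = ([1 / k] * k,
           [[1 / n_vals] * n_vals for _ in range(k)],
           [[1 / n_vals] * n_vals for _ in range(k)])
# Toy data (assumption): X always equals Y, each value equally often,
# so the empirical marginals of X and Y are uniform although X and Y are dependent.
data = [(v, v) for v in range(n_vals)] * 2

max_grad = max(abs(g) for g in softmax_gradient(*uniform, data))
after_em = em_step(*uniform, data)

# Hand-built, symmetry-broken parameters: H=0 explains the (0, 0) pairs,
# H=1 splits its mass between (1, 1) and (2, 2).
broken = ([1 / 3, 2 / 3],
          [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]],
          [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]])

print("max |gradient| at uniform:", max_grad)
print("EM step from uniform:", after_em)
print("log-lik uniform vs broken:",
      log_likelihood(*uniform, data), log_likelihood(*broken, data))
```

On this toy data the gradient should vanish up to floating-point rounding and the EM step should return the uniform parameters, while the hand-built parameters reach a strictly higher log-likelihood, so the uniform point is stationary without being the maximum-likelihood solution. The mechanism is visible in `posterior`: at the uniform point it returns the same distribution for every (x, y), so every value of H collects identical expected counts and neither method can make the rows of P(X | H) differ. With data whose marginals are not uniform, the first EM step instead moves every row of P(X | h) to the empirical marginal of X, which is still symmetric in h.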