In this problem, we explore the use of deterministic annealing for pattern classification using a neural network (Miller et al., 1996). The output of neuron j in the output layer is denoted by F_j(x), where x is the input vector. The classification decision is based on the maximum discriminant F_j(x): the input x is assigned to the class j whose output F_j(x) is largest.
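As a concrete illustration of the maximum-discriminant rule, the following minimal Python sketch picks the class whose output F_j(x) is largest. The function name `classify`, the example output values, and the tie-breaking choice are assumptions made for illustration only; they do not come from Miller et al. (1996).

```python
from typing import Sequence


def classify(discriminants: Sequence[float]) -> int:
    """Return the index j of the largest discriminant F_j(x)."""
    if not discriminants:
        raise ValueError("need at least one output neuron")
    # Ties are broken in favour of the lowest index (an assumption of this sketch).
    return max(range(len(discriminants)), key=lambda j: discriminants[j])


# Hypothetical outputs F_0(x), F_1(x), F_2(x) of the output layer for one input x.
outputs = [0.2, 1.7, -0.4]
predicted_class = classify(outputs)
```

Here `predicted_class` is the index of the winning output neuron. The rule is a hard, winner-take-all decision, which is the kind of decision that part (a) replaces with a probabilistic formulation.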
(a) For a probabilistic objective function, consider the expression