Question 1-
Derive the exponential form of the Boltzmann factor in the following way. Consider an isolated set of M + N independent magnets, each of which can be in an si = +1 or si = -1 state. A uniform magnetic field is applied, so the si = +1 state has some positive energy, which we can arbitrarily set to 1; the si = -1 state has energy -1. The total energy of the system is therefore the number pointing up, ku, minus the number pointing down, kd; that is, ET = ku - kd. (Of course, ku + kd = M + N regardless of the total energy.)
The fundamental statistical assumptions describing this system are that the magnets are independent and that the probability that a subsystem (that is, the N magnets) has a particular energy is proportional to the number of configurations that have this energy.
(a) Consider the subsystem of N magnets, which has energy EN. Write an expression for the number of configurations K (N, EN) that have energy EN.
(b) As in part (a), write a general expression for the number of configurations in the subsystem of M magnets at energy EM, that is, K(M, EM).
(c) Because the two subsystems consist of independent magnets, the total number of ways the full system can have total energy ET = EN + EM is the product K(N, EN)K(M, EM). Write an analytic expression for this total number.
(d) In statistical physics, if M >> N, the M-magnet subsystem is called the heat reservoir or heat bath. Assume that M >> N, and write a series expansion for your answer to part (c).
(e) Use your answer in part (d) to show that the probability the N-unit system has energy EN has the form of a Boltzmann factor, e^(-EN).
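One consistent route through parts (a)-(e) can be sketched as follows (this is one possible bookkeeping, assuming ku of the N magnets point up, so EN = 2ku - N):

```latex
% (a) choose which k_u of the N magnets point up; E_N = k_u - k_d and
%     k_u + k_d = N give k_u = (N + E_N)/2:
K(N, E_N) = \binom{N}{\tfrac{1}{2}(N + E_N)}
% (c) independence gives the product, with E_M = E_T - E_N:
\Omega(E_N) = \binom{N}{\tfrac{1}{2}(N + E_N)}
              \binom{M}{\tfrac{1}{2}(M + E_T - E_N)}
% (d)-(e) for M \gg N, expand \ln K(M, E_T - E_N) to first order in E_N:
\ln K(M, E_T - E_N) \approx \ln K(M, E_T) - \beta E_N,
\qquad
\beta \equiv \left.\frac{\partial \ln K(M, E)}{\partial E}\right|_{E = E_T}
% so the probability of the subsystem having energy E_N satisfies
P(E_N) \propto K(N, E_N)\, e^{-\beta E_N}
```

The reservoir contributes only the exponential factor; the combinatorial prefactor K(N, EN) counts the subsystem's own configurations.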
Question 2-
Write a backpropagation program for a 2-2-1 network with bias to solve the XOR problem (cf. Fig. 6.1).
(a) Show the input-to-hidden weights and analyze the function of each hidden unit.
(b) Plot the representation of each pattern as well as the final decision boundary in the y1y2-space.
(c) Although it was not used as a training pattern, show the representation of x = 0 in your y1y2-space.
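A minimal sketch of such a program is shown below. It is not the book's implementation: it uses tanh activations as a differentiable stand-in for the sign units of Fig. 6.1, and the function name `train_xor` and its hyperparameters are assumptions chosen for illustration.

```python
import numpy as np

def train_xor(seed=0, eta=0.1, epochs=5000):
    """Backprop on a 2-2-1 net with bias; tanh replaces the hard
    threshold units of Fig. 6.1 so the error is differentiable."""
    rng = np.random.default_rng(seed)
    X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], float)
    T = np.array([-1.0, 1.0, 1.0, -1.0])             # XOR targets
    W1 = rng.normal(scale=0.5, size=(2, 2)); b1 = np.zeros(2)
    W2 = rng.normal(scale=0.5, size=2);      b2 = 0.0
    for _ in range(epochs):
        for x, t in zip(X, T):
            y = np.tanh(W1 @ x + b1)                 # hidden activations y1, y2
            z = np.tanh(W2 @ y + b2)                 # output
            dz = (z - t) * (1 - z * z)               # output delta
            dy = (1 - y * y) * W2 * dz               # hidden deltas
            W2 -= eta * dz * y;          b2 -= eta * dz
            W1 -= eta * np.outer(dy, x); b1 -= eta * dy
    correct = sum(np.sign(np.tanh(W2 @ np.tanh(W1 @ x + b1) + b2)) == t
                  for x, t in zip(X, T))
    return W1, b1, W2, b2, correct

W1, b1, W2, b2, n_correct = train_xor()
print("patterns correct:", n_correct)
```

For part (a), inspect `W1` and `b1` after training; for part (b), the hidden representation of each pattern is `np.tanh(W1 @ x + b1)`, which can be plotted directly in the y1y2-plane.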
Figure 6.1 for the above question:
FIGURE 6.1. The two-bit parity or exclusive-OR problem can be solved by a three-layer network. At the bottom is the two-dimensional feature x1x2-space, along with the four patterns to be classified. The three-layer network is shown in the middle. The input units are linear and merely distribute their feature values through multiplicative weights to the hidden units. The hidden and output units here are linear threshold units, each of which forms the linear sum of its inputs times their associated weights to yield net, and emits a +1 if this net is greater than or equal to 0, and -1 otherwise, as shown by the graphs. Positive or "excitatory" weights are denoted by solid lines, negative or "inhibitory" weights by dashed lines; each weight magnitude is indicated by the line's thickness and is labeled. The single output unit sums the weighted signals from the hidden units and bias to form its net, and emits a +1 if its net is greater than or equal to 0 and emits a -1 otherwise. Within each unit we show a graph of its input-output or activation function, f(net) versus net. This function is linear for the input units, a constant for the bias, and a step or sign function elsewhere. We say that this network has a 2-2-1 fully connected topology, describing the number of units (other than the bias) in successive layers.
Question 3-
Write a basic backpropagation program for a 3-3-1 network with bias to solve the three-bit parity problem where each input has value ±1. That is, the output should be +1 if the number of inputs that have value +1 is even, and should be -1 if the number of such inputs is odd.
(a) Show the input-to-hidden weights and analyze the function of each hidden unit.
(b) Retrain several times from new random points until you get a local (but not global) minimum. Analyze the function of the hidden units now.
(c) How many patterns are properly classified at your local minimum? Explain.
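The retraining experiment in parts (b) and (c) can be sketched as follows. As with the XOR sketch, this is a minimal illustration, not a prescribed implementation: tanh is assumed in place of threshold units, and `train_parity` and its hyperparameters are hypothetical choices.

```python
import numpy as np
from itertools import product

def train_parity(seed=0, eta=0.1, epochs=8000):
    """3-3-1 net with bias for three-bit parity; target is +1 when
    the number of +1 inputs is even, per the problem statement."""
    rng = np.random.default_rng(seed)
    X = np.array(list(product([-1.0, 1.0], repeat=3)))   # all 8 patterns
    T = np.where((X == 1).sum(axis=1) % 2 == 0, 1.0, -1.0)
    W1 = rng.normal(scale=0.5, size=(3, 3)); b1 = np.zeros(3)
    W2 = rng.normal(scale=0.5, size=3);      b2 = 0.0
    for _ in range(epochs):
        for x, t in zip(X, T):
            y = np.tanh(W1 @ x + b1)
            z = np.tanh(W2 @ y + b2)
            dz = (z - t) * (1 - z * z)
            dy = (1 - y * y) * W2 * dz
            W2 -= eta * dz * y;          b2 -= eta * dz
            W1 -= eta * np.outer(dy, x); b1 -= eta * dy
    correct = sum(np.sign(np.tanh(W2 @ np.tanh(W1 @ x + b1) + b2)) == t
                  for x, t in zip(X, T))
    return correct

# Retraining from several random initializations (part b): runs that
# end in a local but not global minimum classify fewer than 8 patterns.
counts = [train_parity(seed=s) for s in range(5)]
print("patterns correct per run:", counts)
```

Runs that score below 8 are candidates for the local-minimum analysis in parts (b) and (c); inspecting `W1` for those runs shows what each hidden unit ended up computing.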
Question 4-
Train a Boltzmann network consisting of eight input units and ten category units with the characters of a seven-segment display shown in Fig. 7.10.
(a) Use the network to classify each of the ten patterns, and thus verify that all have been learned.
(b) Explore pattern completion in your network in the following way. For each of the 2^8 possible patterns, do pattern completion for several characters. Add hidden units and show that better performance results for ambiguous characters.
Correction in the above question-
The network has 7 inputs instead of 8.
In (b) there would be 2^7 possible patterns instead of 2^8.
FIGURE 7.10. A Boltzmann network can be used for pattern completion, that is, filling in unknown features of a deficient pattern. Here, a 12-unit network with five hidden units has been trained with the 10 numeral patterns of a seven-segment digital display. The diagram at the lower left shows the correspondence between the display segments and nodes of the network. Along the top, a black segment is represented by a +1, and a light gray segment is represented by a -1. Consider the deficient pattern consisting of s2 = -1, s5 = +1, but with the other five inputs (shown as dotted lines in the pattern) unspecified. If these units are clamped and the full network is annealed, the remaining five visible units will assume the values most probable given the clamped ones, as shown at the right.
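The annealing step used for pattern completion in Fig. 7.10 can be sketched as follows. This is a minimal illustration under stated assumptions, not the trained seven-segment network: the symmetric weight matrix `W` is taken as already learned, and the toy weights, the `anneal` helper, and the geometric temperature schedule are all hypothetical.

```python
import numpy as np

def anneal(W, s, clamped, T_schedule, rng):
    """Gibbs-sample the unclamped units of a Boltzmann network with
    symmetric weights W (zero diagonal), lowering the temperature T
    step by step. `s` is the +/-1 state vector; `clamped` marks the
    units held fixed (the specified segments of a deficient pattern)."""
    s = s.copy()
    free = np.flatnonzero(~clamped)
    for T in T_schedule:
        for i in rng.permutation(free):
            net = W[i] @ s                           # net input to unit i
            p_up = 1.0 / (1.0 + np.exp(-2.0 * net / T))
            s[i] = 1.0 if rng.random() < p_up else -1.0
    return s

# Toy example with hand-set (hypothetical) couplings on 4 units:
rng = np.random.default_rng(0)
W = np.zeros((4, 4))
W[0, 1] = W[1, 0] = 1.0       # units 0 and 1 tend to agree
W[2, 3] = W[3, 2] = -1.0      # units 2 and 3 tend to disagree
s = np.array([1.0, -1.0, 1.0, 1.0])
clamped = np.array([True, False, True, False])
schedule = np.geomspace(4.0, 0.1, 40)                # slow cooling
s_final = anneal(W, s, clamped, schedule, rng)
print(s_final)
```

At the final low temperature the free units settle into the states most compatible with the clamped ones, which is exactly the completion behavior described in the figure caption.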