Part 1:
Perceptron
1. Write a computer program to classify letters from different fonts using perceptron learning. Use as many output units as you have different letters in your training set. Convert the letters to bipolar form. (You may wish to enter the letters as "2" if the pixel is on and "0" if it is off to facilitate testing with noisy patterns after training; your program should subtract 1 from each component of the input pattern to obtain the bipolar vector.) A sketch of one possible training loop is given after part b below.
a. Repeat Example 2.15 for several values of the threshold θ. After training with each value, test the ability of the net to classify noisy versions of the training patterns. Try 5, 10, 15, and 20 pixels wrong, and the same levels of missing data. Do higher values of θ have any effect on how often the net is "confused"? Do you reach a value of θ for which the net cannot learn all of the training patterns?
b. Experiment with other letters. Are some combinations harder to learn than others? Why?
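One possible shape for this program, a minimal sketch assuming NumPy and bipolar patterns; the names perceptron_train and add_noise are ours, and the activation uses the usual dead zone of width 2θ:

```python
import numpy as np

def perceptron_train(X, T, theta=0.2, alpha=1.0, max_epochs=1000):
    """One perceptron per output unit. X: (p, n) bipolar inputs,
    T: (p, m) bipolar targets. Returns weights (n, m) and biases (m,)."""
    W = np.zeros((X.shape[1], T.shape[1]))
    b = np.zeros(T.shape[1])
    for epoch in range(max_epochs):
        changed = False
        for x, t in zip(X, T):
            y_in = b + x @ W
            # activation with a dead zone: +1 above theta, -1 below -theta, else 0
            y = np.where(y_in > theta, 1, np.where(y_in < -theta, -1, 0))
            wrong = y != t
            if wrong.any():
                W[:, wrong] += alpha * np.outer(x, t[wrong])  # w_i += alpha * t * x_i
                b[wrong] += alpha * t[wrong]
                changed = True
        if not changed:          # a full epoch with no weight change: training is done
            break
    return W, b

def add_noise(x, n_pixels, rng, missing=False):
    """Corrupt a bipolar pattern: flip n_pixels randomly chosen entries, or zero
    them (zero represents missing data under the 2/0/1 entry convention)."""
    x = x.astype(float).copy()
    idx = rng.choice(len(x), size=n_pixels, replace=False)
    x[idx] = 0.0 if missing else -x[idx]
    return x
```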
2. Code a backpropagation network to store the following patterns. The input patterns are the "letters" given in the 5 x 3 arrays, and the associated target patterns are given below each input pattern.
Experiment with the number of hidden units (no more than 15), the learning rate, and the initial weight values. Compare your results with those from the DAM network project.
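A minimal backpropagation sketch that fits this and the later backpropagation exercises, assuming NumPy and the bipolar sigmoid (which Parts 3 and 5 specify explicitly); train_backprop, f, and f_prime_from_out are our names, and the updates are plain per-pattern gradient descent:

```python
import numpy as np

def f(x):
    """Bipolar sigmoid."""
    return 2.0 / (1.0 + np.exp(-x)) - 1.0

def f_prime_from_out(y):
    """Derivative of the bipolar sigmoid, written in terms of its output."""
    return 0.5 * (1.0 + y) * (1.0 - y)

def train_backprop(S, T, n_hidden, alpha, epochs, seed=0):
    """One-hidden-layer backprop with biases. S: (p, n_in) inputs,
    T: (p, n_out) bipolar targets. Returns (V, W), bias as the last row."""
    rng = np.random.default_rng(seed)
    V = rng.uniform(-0.5, 0.5, (S.shape[1] + 1, n_hidden))  # input -> hidden
    W = rng.uniform(-0.5, 0.5, (n_hidden + 1, T.shape[1]))  # hidden -> output
    for _ in range(epochs):
        for x, t in zip(S, T):
            xb = np.append(x, 1.0)                # forward pass, bias appended
            z = f(xb @ V)
            zb = np.append(z, 1.0)
            y = f(zb @ W)
            delta_out = (t - y) * f_prime_from_out(y)           # backward pass
            delta_hid = (W[:-1] @ delta_out) * f_prime_from_out(z)
            W += alpha * np.outer(zb, delta_out)                # weight updates
            V += alpha * np.outer(xb, delta_hid)
    return V, W
```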
Part 2:
Heteroassociative neural net
1. Write a computer program to implement a heteroassociative neural net using the Hebb rule to set the weights (by outer products). The program should read an x vector from a 9 x 7 array, followed by the corresponding y vector from a 5 x 3 array. Start by using the patterns in Example 3.9. Expand your training set to include more letters, taking the x patterns from one of the fonts in Figure 2.20 (or creating your own) and using the y patterns in Project 3.4 (or creating your own). Explore the number of pattern pairs that can be stored in the net, and the ability of the net to respond to noisy input. You may find it convenient to represent your training patterns by arrays with an entry of "2" if the pixel is on, "0" if it is off. Your program should then subtract 1 from each entry to form the bipolar pattern vector. This approach has the advantage that missing data (for testing) can be entered as a "1" which the program will convert to zero as desired.
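A sketch of the weight setting and recall under these conventions, assuming NumPy and flattened bipolar vectors; the names are ours, and np.sign leaves a unit at 0 when its net input is exactly 0 (a simplification of keeping the previous activation):

```python
import numpy as np

def to_bipolar(a):
    """2/0/1 entry convention -> bipolar: 2 -> 1, 0 -> -1, 1 -> 0 (missing)."""
    return np.asarray(a, dtype=float).reshape(-1) - 1.0

def hebb_weights(S, T):
    """Hebb rule by summed outer products: W = sum_p s(p)^T t(p).
    S: (p, 63) bipolar x patterns (9 x 7 flattened); T: (p, 15) y patterns (5 x 3)."""
    return np.asarray(S).T @ np.asarray(T)

def recall(W, x):
    """Feedforward recall of the y pattern from a (possibly noisy) x pattern."""
    return np.sign(x @ W)
```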
Discrete Hopfield net
2. Write a computer program to implement a discrete Hopfield net to store the letters from one of the fonts in Figure 2.20. Investigate the number of patterns that can be stored and recalled correctly, as well as the ability of the net to respond correctly to noisy input.
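One way this might look, a sketch assuming NumPy, bipolar patterns, and asynchronous updates in random order; the names are ours, and the external input x_i is kept in each unit's net input, following the text's formulation:

```python
import numpy as np

def hopfield_weights(patterns):
    """Summed outer products of the bipolar patterns, with a zeroed diagonal."""
    P = np.asarray(patterns, dtype=float)
    W = P.T @ P
    np.fill_diagonal(W, 0.0)
    return W

def hopfield_recall(W, x, rng, max_sweeps=100):
    """Asynchronous recall: update units one at a time in random order until a
    full sweep changes nothing. The external input x stays in each net input."""
    y = np.asarray(x, dtype=float).copy()
    for _ in range(max_sweeps):
        changed = False
        for i in rng.permutation(len(y)):
            y_in = x[i] + y @ W[:, i]
            new = 1.0 if y_in > 0 else (-1.0 if y_in < 0 else y[i])
            if new != y[i]:
                y[i] = new
                changed = True
        if not changed:
            break
    return y
```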
Part 3:
Autoassociative neural net
Write a computer program to implement an autoassociative neural net using the Hebb rule to set the weights (outer products). The program should read input from a 7 x 5 array into an x vector (a 35-tuple) and should have a training mode in which the weights are set (by outer products). The number of inputs may be specified in advance, but it will be more convenient to prompt the user interactively for more input. The program should also have a test mode in which the weights are not changed but the response of the net is determined. This response should be printed as a 7 x 5 array. It is good practice to display the input array, followed by the output array.
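A sketch of the two modes, assuming NumPy and flattened 35-component bipolar vectors; train and test are our names, and whether to zero the weight-matrix diagonal is one of the design choices the questions below probe:

```python
import numpy as np

def train(patterns, zero_diagonal=True):
    """Autoassociative Hebb rule: W = sum_p s(p)^T s(p). Zeroing the diagonal
    removes each unit's self-connection, which often improves recall."""
    P = np.asarray(patterns, dtype=float)      # shape (n_patterns, 35)
    W = P.T @ P
    if zero_diagonal:
        np.fill_diagonal(W, 0.0)
    return W

def test(W, x):
    """One feedforward pass; input and response displayed as 7 x 5 arrays."""
    y = np.sign(np.asarray(x, dtype=float) @ W)
    print(np.asarray(x).reshape(7, 5))         # input array
    print(y.astype(int).reshape(7, 5))         # output array
    return y
```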
The input patterns to be stored are as follows:
Try to answer the following questions:
1. How many patterns can you store (and recall successfully)?
Does it matter which ones you try to store? (Consider whether any of these input vectors are orthogonal or nearly so; how is that relevant?)
Does it matter whether you use binary or bipolar patterns?
2. How much noise can your net handle?
Does the amount of noise that can be tolerated (i.e., for which the net will still give the original stored pattern as its response) depend on how many patterns are stored?
3. Does the performance of the net improve if it is allowed to iterate, either in the "batch mode" iteration of Section 3.4.1-3 or the "one unit at a time" form of the discrete Hopfield net of Section 3.4.4?
Backpropagation neural net
Code a computer program to implement a backpropagation neural network with one hidden layer. Use a bias on each hidden unit and each output unit; use 10 input units, 4 hidden units, and 2 output units. Use the bipolar sigmoid activation function.
The input patterns are
s(1) = 0 1 2 3 4 5 6 7 8 9
s(2) = 9 8 7 6 5 4 3 2 1 0
s(3) = 0 9 1 8 2 7 3 6 4 5
s(4) = 4 5 6 3 2 7 1 8 0 9
s(5) = 3 8 2 7 1 6 0 5 9 4
s(6) = 1 6 0 7 4 8 3 9 2 5
s(7) = 2 1 3 0 4 9 5 8 6 7
s(8) = 9 4 0 5 1 6 2 7 3 8
The corresponding output (target) patterns are:
t(1) = -1 -1    t(5) =  1  1
t(2) =  1  1    t(6) =  1 -1
t(3) = -1  1    t(7) = -1  1
t(4) =  1 -1    t(8) = -1 -1
Use random initial weights distributed on (-0.5, 0.5) and a learning rate of (i) 0.05 and (ii) 0.5. For each learning rate, perform 5,000 and 50,000 epochs of training (using the same initial weights in each case).
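Assuming the hypothetical train_backprop sketch from Part 1, the four runs might be driven like this; the same seed reproduces the same initial weights for every run, and the raw 0-9 inputs are used as given (they may saturate the sigmoid unless scaled, which is worth observing):

```python
import numpy as np

# Input and target patterns exactly as given in the exercise.
S = np.array([[0,1,2,3,4,5,6,7,8,9],
              [9,8,7,6,5,4,3,2,1,0],
              [0,9,1,8,2,7,3,6,4,5],
              [4,5,6,3,2,7,1,8,0,9],
              [3,8,2,7,1,6,0,5,9,4],
              [1,6,0,7,4,8,3,9,2,5],
              [2,1,3,0,4,9,5,8,6,7],
              [9,4,0,5,1,6,2,7,3,8]], dtype=float)
T = np.array([[-1,-1], [1,1], [-1,1], [1,-1],
              [1,1], [1,-1], [-1,1], [-1,-1]], dtype=float)

for alpha in (0.05, 0.5):
    for epochs in (5_000, 50_000):
        # same seed -> same initial weights in each case, as the exercise asks
        V, W = train_backprop(S, T, n_hidden=4, alpha=alpha, epochs=epochs, seed=0)
```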
Part 4:
Bidirectional associative memory
Write a computer program to implement a bipolar BAM neural network. Allow for (at least) 15 units in the X-layer and 3 units in the Y-layer.
a. Use the program to store the association given in Example 3.23, i.e.,
A → (-1, 1)
C → (1, 1).
Try to illustrate the same cases as discussed in the text. Try some other cases of your own design. You might test the net on some noisy version of the letter C, for instance.
b. Use your program to store the following patterns (the X-layer vectors are the "letters" given in the 5 x 3 arrays; the associated Y-layer vectors are given below each X pattern):
Is it possible to store all eight patterns at once? If not, how many can be stored at the same time? Try some experiments with noisy data, as in part a.
The following table gives the Hamming distance between the letters denoted by the foregoing patterns:
Determine the Hamming distances between the Y-layer patterns associated with each of these letters. From the ratios of the Hamming distances, try to determine which pattern pairs are most likely to be stored successfully.
Since the upper limit on the number of arbitrary pattern pairs that can be stored is min(n, m) (where n and m are the numbers of components in the X- and Y-layer patterns, respectively), you should consider carefully any case in which more than that number of patterns are stored. (Test whether the net can respond correctly in both directions.)
When using noisy patterns, the net is not retrained on them; they are used simply for testing the response of the net.
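A sketch of bidirectional recall, assuming NumPy and bipolar vectors; bam_weights and bam_recall are our names, and a unit keeps its previous activation when its net input is exactly 0, as in the text:

```python
import numpy as np

def bam_weights(X, Y):
    """W = sum_p x(p)^T y(p). X: (p, n) bipolar X-layer patterns,
    Y: (p, m) bipolar Y-layer patterns."""
    return np.asarray(X, dtype=float).T @ np.asarray(Y, dtype=float)

def bam_recall(W, x=None, y=None, max_iters=50):
    """Bounce signals between the layers until both are stable. Either layer
    may start the recall; a unit keeps its previous value at net input 0."""
    def step(net, prev):
        out = prev.copy()
        out[net > 0] = 1.0
        out[net < 0] = -1.0
        return out
    x = np.zeros(W.shape[0]) if x is None else np.asarray(x, dtype=float)
    y = np.zeros(W.shape[1]) if y is None else np.asarray(y, dtype=float)
    for _ in range(max_iters):
        y_new = step(x @ W, y)        # X -> Y
        x_new = step(W @ y_new, x)    # Y -> X
        if np.array_equal(x_new, x) and np.array_equal(y_new, y):
            break
        x, y = x_new, y_new
    return x, y
```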
Part 5:
Code a computer program to implement a backpropagation neural network with one hidden layer. Use a bias on each hidden unit and each output unit. Use the bipolar sigmoid activation function. For each test case, print the initial weights, final weights, learning rate, number of training epochs, and network response to each input pattern at the end of training. The training data are given in the following table:
BIPOLAR XOR
s(1) = (1, -1)     t(1) =  1
s(2) = (-1, 1)     t(2) =  1
s(3) = (1, 1)      t(3) = -1
s(4) = (-1, -1)    t(4) = -1
Use initial weights distributed randomly on (-0.5, 0.5), and a learning rate of (i) 0.05, (ii) 0.25, and (iii) 0.5. For each learning rate, perform 1,000, 10,000, and 25,000 epochs of training (using the same initial weights in each case). Use two input units, two hidden units, and one output unit.
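Reusing the hypothetical train_backprop and f from the sketch in Part 1, the nine runs might be driven like this:

```python
import numpy as np

S = np.array([[1, -1], [-1, 1], [1, 1], [-1, -1]], dtype=float)
T = np.array([[1], [1], [-1], [-1]], dtype=float)

for alpha in (0.05, 0.25, 0.5):
    for epochs in (1_000, 10_000, 25_000):
        V, W = train_backprop(S, T, n_hidden=2, alpha=alpha, epochs=epochs, seed=0)
        for x in S:   # network response to each pattern at the end of training
            z = f(np.append(x, 1.0) @ V)
            print(alpha, epochs, x, f(np.append(z, 1.0) @ W))
```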
Kohonen self-organizing maps
1. Write a computer program to implement a Kohonen self-organizing map neural net. Use 2 input units, 50 cluster units, and a linear topology for the cluster units. Allow the winner and its nearest topological neighbor to learn. (In other words, if unit J is the winner, then units J - 1 and J + 1 also learn, unless J - 1 < 1 or J + 1 > 50.)
Use an initial learning rate of 0.5, and gradually reduce it to 0.01 (over 100 epochs). The initial weights on all cluster units are to be random numbers between -1 and 1 (for each component of the weight vector for each unit).
Generate a training data file as follows:
Choose two random numbers between -0.5 and 0.5, and call them x1 and x2. Put the point (x1, x2) in the training set if
x1^2 + x2^2 < 0.25.
Repeat until you have a set of 100 training points.
After every 10 epochs of training, graph the cluster units (by using their weight vector as a position in the two-dimensional Euclidean plane); draw a line connecting
C1 to C2, C2 to C3, etc., to show their topological relationships. You should start with a real mess for the initial positions of the weights, which will gradually smooth out to give a line that winds its way through the region from which the training points were chosen.
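A sketch of the whole experiment, assuming NumPy and matplotlib; the geometric learning-rate decay is one reading of "gradually reduce it to 0.01 over 100 epochs" (a linear schedule would fit equally well), and all names are ours:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Training set: 100 points with x1^2 + x2^2 < 0.25, components in (-0.5, 0.5).
data = []
while len(data) < 100:
    x = rng.uniform(-0.5, 0.5, 2)
    if x @ x < 0.25:
        data.append(x)
data = np.array(data)

n_units = 50
W = rng.uniform(-1.0, 1.0, (n_units, 2))   # random initial weights in (-1, 1)
alpha, epochs = 0.5, 100
decay = (0.01 / 0.5) ** (1.0 / epochs)     # geometric decay of alpha down to 0.01

for epoch in range(1, epochs + 1):
    for i in rng.permutation(len(data)):
        x = data[i]
        j = np.argmin(((W - x) ** 2).sum(axis=1))  # winner: closest weight vector
        for k in (j - 1, j, j + 1):                # winner and nearest neighbors learn
            if 0 <= k < n_units:
                W[k] += alpha * (x - W[k])
    alpha *= decay
    if epoch % 10 == 0:                            # graph the chain C1-C2-...-C50
        plt.plot(W[:, 0], W[:, 1], "o-")
        plt.title(f"epoch {epoch}")
        plt.show()
```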