(a) Write a program to fit a single hidden layer neural network (ten hidden units) via back-propagation and weight decay.
(b) Apply it to 100 observations from the model
    Y = σ(a1^T X) + (a2^T X)^2 + 0.30 · Z,
where σ is the sigmoid function, Z is standard normal, X = (X1, X2), each Xj being independent standard normal, and a1 = (3, 3), a2 = (3, -3). Generate a test sample of size 1000, and plot the training and test error curves as a function of the number of training epochs, for different values of the weight decay parameter. Discuss the overfitting behavior in each case.
(c) Vary the number of hidden units in the network, from 1 up to 10, and determine the minimum number needed to perform well for this task.
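
One possible starting point for parts (a) to (c) is sketched below in NumPy. It is a minimal sketch under several assumptions, not a reference solution. The response model Y = σ(a1^T X) + (a2^T X)^2 + 0.30 · Z used in `make_data` is assumed. The function names (`make_data`, `predict`, `fit_network`), the learning rate, the epoch count, the starting-weight range, and the decay values are all hypothetical choices. The network has sigmoid hidden units and a linear output, is trained by full-batch gradient descent on mean squared error, and applies the weight decay penalty to the weights only, not the biases.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def make_data(n, rng):
    # Assumed response model: Y = sigmoid(a1.X) + (a2.X)^2 + 0.30 * Z
    a1, a2 = np.array([3.0, 3.0]), np.array([3.0, -3.0])
    X = rng.standard_normal((n, 2))   # independent standard normal inputs
    Z = rng.standard_normal(n)        # standard normal noise
    return X, sigmoid(X @ a1) + (X @ a2) ** 2 + 0.30 * Z

def predict(params, X):
    # Forward pass: sigmoid hidden layer followed by a linear output unit.
    W1, b1, w2, b2 = params
    H = sigmoid(X @ W1 + b1)
    return H, H @ w2 + b2

def fit_network(X, y, X_test, y_test, n_hidden=10, decay=0.0,
                lr=0.005, epochs=300, seed=0):
    """Minimize mean squared error + decay * (sum of squared weights)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    W1 = rng.uniform(-0.7, 0.7, (p, n_hidden))  # small random starting weights
    b1 = np.zeros(n_hidden)
    w2 = rng.uniform(-0.7, 0.7, n_hidden)
    b2 = 0.0
    train_err, test_err = [], []
    for _ in range(epochs):
        H, pred = predict((W1, b1, w2, b2), X)
        # Record errors at the start of the epoch, before the update.
        train_err.append(np.mean((pred - y) ** 2))
        _, pred_test = predict((W1, b1, w2, b2), X_test)
        test_err.append(np.mean((pred_test - y_test) ** 2))
        # Back-propagation: output-layer error, then hidden-layer error.
        g_pred = 2.0 * (pred - y) / n
        g_A = np.outer(g_pred, w2) * H * (1.0 - H)
        g_W1 = X.T @ g_A + 2.0 * decay * W1
        g_b1 = g_A.sum(axis=0)
        g_w2 = H.T @ g_pred + 2.0 * decay * w2
        g_b2 = g_pred.sum()
        # Gradient descent step.
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        w2 -= lr * g_w2
        b2 -= lr * g_b2
    return (W1, b1, w2, b2), np.array(train_err), np.array(test_err)

rng = np.random.default_rng(1)
X_tr, y_tr = make_data(100, rng)    # training sample
X_te, y_te = make_data(1000, rng)   # test sample

# Part (b): error curves per epoch for several weight decay values.
curves = {}
for decay in (0.0, 0.01, 0.1):
    _, tr, te = fit_network(X_tr, y_tr, X_te, y_te, decay=decay)
    curves[decay] = (tr, te)

# Part (c): final test error as the number of hidden units varies.
final_test = {}
for m in range(1, 11):
    _, tr, te = fit_network(X_tr, y_tr, X_te, y_te, n_hidden=m, decay=0.01)
    final_test[m] = te[-1]
```

Each entry of `curves` holds a pair of per-epoch training and test error arrays that can be passed to any plotting library. The small step size and short run are chosen for stability rather than to expose overfitting, so longer runs and averaging over several random starts would likely be needed before drawing conclusions.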