Learning rate measures the intensity of weight adjustments applied to a network after detecting a faulty operation. The informal interpretation of learning rate is that high values represent a strict and demanding teacher and small values correspond to a gentle and tolerant one. Explain why the best training results are observed for moderate values of learning rate (neither too small nor too high). Try to confirm your explanation by experimenting with our program. Plot learning rate versus training performance after a set number of epochs (after 500, 1000, and 5000 steps).