Overfitting the data:
Moreover, time permitting, it is worth giving the training algorithm the benefit of the doubt for as long as possible. The error on the validation set can itself pass through local minima, so it is not wise to stop training as soon as the validation error starts to increase, since a better minimum may be reached later on. Of course, if that minimum is never bettered, the network finally returned by the learning algorithm should be rewound to the one that produced the minimum error on the validation set.
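The rewinding strategy above can be sketched as follows. This is a minimal illustration, not a specific library's API: `train_epoch` and `val_error` are hypothetical callables standing in for one epoch of training and a validation-set evaluation, and `patience` controls how long we give the algorithm the benefit of the doubt after the validation error starts to rise.

```python
import copy

def train_with_rewind(model, train_epoch, val_error, max_epochs, patience):
    """Keep training past upticks in validation error (they may be local
    minima), but remember the best model seen and rewind to it at the end."""
    best_err = float("inf")
    best_model = copy.deepcopy(model)
    epochs_since_best = 0
    for _ in range(max_epochs):
        model = train_epoch(model)          # one epoch of training
        err = val_error(model)              # error on the validation set
        if err < best_err:
            best_err = err
            best_model = copy.deepcopy(model)  # snapshot the best network
            epochs_since_best = 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:  # minimum never bettered
                break
    return best_model, best_err             # rewind to the best snapshot
```

With `patience` greater than 1, a temporary rise in validation error does not end training, so a later, deeper minimum can still be found.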
Another way around overfitting is to decrease each weight by a small weight decay factor during each epoch. Learned networks with large (positive or negative) weights tend to have overfitted the data, because large weights are required to accommodate outliers in the data.
Thus keeping the weights small via a weight decay factor may help steer the network away from overfitting.