Local minima - sigmoid units:
In addition to helping the search get over some local minima, adding momentum increases the size of the weight change after each epoch wherever the gradient stays constant in one direction, so the network may converge more quickly. Note, however, that it is possible to have cases where (a) the momentum is not enough to carry the search out of a local minimum and (b) the momentum carries the search out of the global minimum and into a local minimum. This is why the technique is a heuristic and should be used with some care, even though it is used a great deal in practice.
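The effect of a constant gradient can be seen in a small sketch. These notes do not state the update rule, so the code assumes the usual textbook form, weight change = -(learning rate x gradient) + (momentum x previous weight change). The names momentum_step, run_constant_gradient, lr and alpha are illustrative only.

```python
def momentum_step(w, grad, prev_delta, lr=0.1, alpha=0.5):
    """One weight update with momentum.

    Assumed (standard) form: delta(t) = -lr * grad + alpha * delta(t-1).
    With alpha = 0 this is plain gradient descent.
    """
    delta = -lr * grad + alpha * prev_delta
    return w + delta, delta


def run_constant_gradient(steps, grad=1.0, lr=0.1, alpha=0.5):
    """Apply `steps` updates on a slope whose gradient never changes."""
    w, delta = 0.0, 0.0
    deltas = []
    for _ in range(steps):
        w, delta = momentum_step(w, grad, delta, lr, alpha)
        deltas.append(delta)  # record the size of each weight change
    return w, deltas


if __name__ == "__main__":
    _, with_momentum = run_constant_gradient(5, alpha=0.5)
    _, without_momentum = run_constant_gradient(5, alpha=0.0)
    print("with momentum:   ", with_momentum)
    print("without momentum:", without_momentum)
```

Under these assumptions the weight change grows from step to step while the gradient keeps the same sign, and approaches lr * grad / (1 - alpha) in magnitude; with alpha = 0 every step has the same size. The same accumulated step is what may carry the search over a small bump in the error surface, or past the minimum it was looking for.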