Problem: Assume you implement early stopping with mini-batch gradient descent. Is it a good idea to stop training immediately when the validation error goes up? (Short answers)
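One way to explore this question is to compare stopping at the first rise with waiting a few epochs. With mini-batch gradient descent the validation curve is usually noisy, so a single increase may only be a temporary bump. The sketch below is a minimal illustration, not a prescribed method: the function name `early_stopping_epoch`, the `patience` parameter, and the validation errors are all hypothetical values chosen for the example.

```python
def early_stopping_epoch(val_errors, patience=3):
    """Return (stop_epoch, best_epoch) for a sequence of validation errors.

    Training stops once the error has failed to improve for `patience`
    consecutive epochs. `best_epoch` is the epoch whose weights you
    would roll back to.
    """
    best_error = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, error in enumerate(val_errors):
        if error < best_error:
            # New best model so far: remember it and reset the counter.
            best_error = error
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    # Never triggered: training ran through every recorded epoch.
    return len(val_errors) - 1, best_epoch


# Hypothetical noisy curve: a brief rise at epoch 2, then further improvement.
val_errors = [0.90, 0.70, 0.75, 0.60, 0.50, 0.55, 0.56, 0.58, 0.57]

impatient = early_stopping_epoch(val_errors, patience=1)  # stops at the first rise
patient = early_stopping_epoch(val_errors, patience=3)    # waits three bad epochs
```

On this made-up curve, stopping at the first rise keeps the model from epoch 1 (error 0.70), while waiting three epochs keeps the model from epoch 4 (error 0.50). Saving the best weights and rolling back to them is what makes the extra waiting safe.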
The book describes a process called regularization. What problem does regularization seek to avoid, and how does it accomplish this? (Short answers)
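Regularization is generally used to reduce overfitting by constraining the model, and one common form adds a penalty on the size of the weights. The sketch below assumes an L2 (ridge) penalty on a single-weight linear model with no intercept; the book's own formulation may differ. The function name `ridge_weight`, the penalty strength `lam`, and the data are hypothetical.

```python
def ridge_weight(xs, ys, lam):
    """Closed-form minimizer of sum((y - w*x)**2) + lam * w**2 for one weight w.

    Setting the derivative with respect to w to zero gives
    w = sum(x*y) / (sum(x*x) + lam).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


# Hypothetical training data, roughly y = 2x with a little noise.
xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]

w_unregularized = ridge_weight(xs, ys, lam=0.0)  # ordinary least squares
w_regularized = ridge_weight(xs, ys, lam=5.0)    # penalty shrinks the weight
```

A larger `lam` pulls the weight toward zero, trading a slightly worse fit on the training data for a simpler model that is less sensitive to noise in it.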