Problem
1. What is cross-validation? How might we pick the right value of k for k-fold cross-validation?
2. How might we know whether we have collected enough data to train a model?
3. Why do we have separate training, validation, and test data sets, and how is each used effectively?
4. Suppose we want to train a binary classifier where one class is very rare. Give an example of such a problem. How should we train this model? What metrics should we use to measure performance?
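
As a starting point for problems 1 and 4, here is a minimal sketch in plain Python, not a reference solution. It shows one way to split sample indices into k folds, and why accuracy alone can mislead when the positive class is rare. The helper names `k_fold_indices` and `precision_recall_f1`, the seed, and the toy labels (2 positives out of 100) are all assumptions made up for illustration.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) index lists for k-fold cross-validation.

    Hypothetical helper: each sample lands in exactly one validation fold.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy rare-class data (assumed for illustration): 2 positives out of 100.
y_true = [1] * 2 + [0] * 98
# A degenerate "classifier" that always predicts the majority class.
y_always_negative = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_always_negative)) / len(y_true)
print("accuracy:", accuracy)
print("precision, recall, F1:", precision_recall_f1(y_true, y_always_negative))

for train_idx, val_idx in k_fold_indices(n_samples=10, k=5):
    print(len(train_idx), "train /", len(val_idx), "validation")
```

The always-negative predictor scores high accuracy yet has zero recall on the rare class, which is the kind of gap problem 4 asks about. For a rare class, a stratified split (keeping the class ratio similar in every fold) is usually preferable to the plain shuffle sketched here.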