Problem
1. Explain what precision and recall are. How do they relate to the ROC curve?
2. Is it better to have too many false positives, or too many false negatives? Explain.
3. Explain what overfitting is, and how you would control for it.
4. Suppose f ≤ 1/2 is the fraction of positive elements in a classification problem, and a monkey classifies each element by guessing at random. With what probability p should the monkey guess positive, as a function of f, in order to maximize each evaluation metric below? For each metric, report both p and the expected evaluation score the monkey achieves.
(a) Accuracy.
(b) Precision.
(c) Recall.
(d) F-score.
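
One way to check an answer to problem 4 numerically is the minimal sketch below. It assumes the monkey guesses positive independently with probability p on every element, and it computes each metric from the expected confusion-matrix fractions, which is a ratio of expectations rather than an exact expectation of the ratio. The function name `monkey_scores` and the use of the balanced F1 as the F-score are assumptions made for illustration, not part of the problem statement.

```python
def monkey_scores(f, p):
    """Scores for a random guesser, from expected confusion-matrix fractions.

    f: fraction of elements that are truly positive (0 <= f <= 1)
    p: probability that the monkey guesses positive (0 <= p <= 1)
    Assumes guesses are independent of the true labels.
    """
    tp = f * p              # truly positive, guessed positive
    fp = (1 - f) * p        # truly negative, guessed positive
    fn = f * (1 - p)        # truly positive, guessed negative
    tn = (1 - f) * (1 - p)  # truly negative, guessed negative

    accuracy = tp + tn
    # Precision and recall are undefined when their denominators are zero.
    precision = tp / (tp + fp) if tp + fp > 0 else None
    recall = tp / (tp + fn) if tp + fn > 0 else None
    if precision is None or recall is None or precision + recall == 0:
        f_score = None
    else:
        # Balanced F1: harmonic mean of precision and recall (an assumption).
        f_score = 2 * precision * recall / (precision + recall)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


if __name__ == "__main__":
    # Sweep p on a coarse grid for one illustrative value of f.
    f = 0.25
    for step in range(0, 11):
        p = step / 10
        print(p, monkey_scores(f, p))
```

Sweeping p over a grid for a fixed f, as in the `__main__` block, shows how each metric responds to p and can be compared against a derived closed form.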