Problem
1. Build an ensemble of estimators, where the base estimator is a decision tree.
(a) How is this ensemble different from a random forest (RF)?
(b) Using sklearn, produce a bagging classifier that behaves like an RF. Which parameters did you have to set, and how? (A sketch follows below.)
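One possible approach to (b) is sketched below. It assumes a recent scikit-learn (in versions before 1.2 the keyword is `base_estimator` rather than `estimator`) and an arbitrary synthetic dataset. The essential choice is to put the per-split feature subsampling on the decision tree itself (`max_features="sqrt"`), because `BaggingClassifier`'s own `max_features` subsamples features once per estimator rather than at every split, which is what distinguishes plain bagging from an RF.

```python
# Minimal sketch: a bagging classifier configured to mimic a random forest.
# Dataset, seeds, and tree counts are illustrative assumptions only.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Plain bagging: each tree considers all features at every split.
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=100,
    bootstrap=True,
    random_state=0,
)

# RF-like bagging: each tree considers only sqrt(n_features) candidate
# features at every split, as a random forest does.
rf_like = BaggingClassifier(
    estimator=DecisionTreeClassifier(max_features="sqrt"),
    n_estimators=100,
    bootstrap=True,
    random_state=0,
)

rf = RandomForestClassifier(n_estimators=100, random_state=0)

for name, model in [("bagging", bagging), ("rf-like", rf_like), ("rf", rf)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())
```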
2. Consider the relationship between an RF, the number of trees it comprises, and the number of features it uses (an empirical sketch follows after these questions):
(a) Could you envision a relation between the minimum number of trees needed in an RF and the number of features utilized?
(b) Could the number of trees be too small for the number of features used?
(c) Could the number of trees be too high for the number of observations available?
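One hedged, empirical way to probe these questions (a sketch under assumed dataset sizes, not an answer) is to grow forests of increasing size on a dataset with many features and watch how the out-of-bag score behaves; with very few trees, scikit-learn may even warn that some observations never end up out of bag.

```python
# Sketch: OOB score as the number of trees grows, for a fixed feature count.
# The synthetic dataset and the tree counts are arbitrary assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=100, n_informative=20,
                           random_state=0)

for n_trees in (5, 25, 100, 400):
    rf = RandomForestClassifier(n_estimators=n_trees, max_features="sqrt",
                                oob_score=True, random_state=0)
    rf.fit(X, y)
    # With few trees, each observation is out of bag for only a handful of
    # them (or none), so the OOB estimate is noisy; it settles as trees grow.
    print(n_trees, round(rf.oob_score_, 3))
```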
3. How is out-of-bag accuracy different from stratified k-fold (with shuffling) cross-validation accuracy? (A comparison sketch follows.)
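The sketch below puts the two estimates side by side on an assumed synthetic dataset. The OOB estimate comes "for free" from the observations left out of each tree's bootstrap sample, whereas stratified k-fold refits the whole forest k times on folds that preserve the class proportions.

```python
# Sketch: OOB accuracy vs. stratified 5-fold CV accuracy for the same model.
# Dataset, seeds, and forest size are illustrative assumptions only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=1000, n_features=25, random_state=0)

# Out-of-bag estimate: one fit, scored on each tree's left-out observations.
rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X, y)
print("OOB accuracy:", round(rf.oob_score_, 3))

# Stratified k-fold estimate: k refits, each scored on a held-out fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(
    RandomForestClassifier(n_estimators=200, random_state=0), X, y, cv=cv
)
print("Stratified 5-fold accuracy:", round(scores.mean(), 3))
```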