1) Consider again the churn dataset. Using WEKA, create two learning curves of the out-of-sample AUC on the test set (churn_test.arff), one for logistic regression and one for the decision tree J48 (keep the default settings). Starting from the full training set, halve the training set after each iteration until it contains fewer than 100 examples. Provide a plot with both curves (copy the data into Excel and create the charts).
• You can cut the dataset in half easily in Weka. In the Preprocess tab, in the box marked Filter, click Choose; under weka->filters->unsupervised->instance you will find RemovePercentage. (Normally it is a good idea to run the Randomize filter first, so that you remove instances at random; real data are often sorted on some attribute, and removing a contiguous block can discard many items with similar values. Do not Randomize for this assignment, however; the data are already randomized.)
• The Undo button on the Preprocess tab undoes the last preprocessing step (Randomize, RemovePercentage, etc.). Keep an eye on the data statistics in the Preprocess tab (such as the number of instances) to verify each step.
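It can help to work out the halving schedule in advance, so you know how many iterations (i.e., how many applications of RemovePercentage at 50%) to expect before dropping below 100 examples. A minimal Python sketch; the starting size of 3,333 instances is an assumption — substitute the actual instance count shown in the Preprocess tab.

```python
def halving_schedule(n_full, floor=100):
    """Training-set sizes, halving from the full set until below `floor`.

    Each halving step corresponds to one application of WEKA's
    RemovePercentage filter with percentage = 50.
    """
    sizes = [n_full]
    while sizes[-1] >= floor:
        sizes.append(sizes[-1] // 2)  # integer halving of the instance count
    return sizes

# Example: an assumed training set of 3,333 instances (replace with yours)
print(halving_schedule(3333))  # [3333, 1666, 833, 416, 208, 104, 52]
```

The returned list also gives you the x-axis values for the Excel chart.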
2) Create a fitting curve of the generalization AUC for decision trees as a function of the minNumObj parameter. First set the option 'unpruned' to 'true'. Provide a plot of the parameter value versus the resulting out-of-sample performance, using either cross-validation or a training/test split. What does the parameter do? What is the optimal value of the parameter?
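If you prefer to script the parameter sweep rather than clicking through the Explorer for each value, WEKA classifiers can also be run from the command line. The sketch below only builds the java invocations for a range of minNumObj values; the J48 flags (-U for unpruned, -M for minNumObj, -t/-T for train/test files) are standard WEKA options, but the file names churn_train.arff / churn_test.arff and the weka.jar classpath are assumptions — adjust them to your setup.

```python
def j48_commands(min_num_obj_values,
                 train="churn_train.arff", test="churn_test.arff"):
    """Build J48 command lines sweeping minNumObj with pruning disabled.

    -U  grow an unpruned tree (unpruned=true)
    -M  minimum number of instances per leaf (minNumObj)
    -t  training file, -T  test file
    """
    base = "java -cp weka.jar weka.classifiers.trees.J48"
    return [f"{base} -U -M {m} -t {train} -T {test}"
            for m in min_num_obj_values]

# One command per parameter value; record the reported ROC Area for each
for cmd in j48_commands([2, 5, 10, 25, 50, 100, 200]):
    print(cmd)
```

Copy the ROC Area from each run's output into Excel to build the fitting curve.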
3) Repeat the experiment from step 1, but with minNumObj=100 and unpruned=true. How does the decision tree's learning curve change? What do you infer from this result?
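All three steps report out-of-sample AUC. If you want to sanity-check the ROC Area that WEKA prints, recall that AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one (the Mann-Whitney rank statistic). A minimal pure-Python sketch; the labels and scores shown are made-up illustrations, not values from the churn data.

```python
def auc(labels, scores):
    """AUC as P(score of random positive > score of random negative),
    counting ties as 1/2 (the Mann-Whitney U formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Count pairwise wins; a tied pair contributes 0.5
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative scores only: 3 of the 4 positive/negative pairs are ranked
# correctly, so the AUC is 0.75
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2]))  # 0.75
```

This pairwise form is quadratic in the number of examples, which is fine for a spot check; for large datasets a rank-based computation is preferable.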