Problem
Telecommunications companies providing cell-phone service are interested in customer retention. In particular, identifying customers who are about to churn (cancel their service) is potentially worth millions of dollars if the company can proactively address the reason each customer is considering cancellation and retain that customer. The DATAfile Cellphone contains customer data to be used to classify a customer as a churner or not. Using XLMiner's Partition with Oversampling procedure, partition the data with all the variables so that 50% of the training-set observations are successes (churners) and 40% of the validation data are set aside as a test set. Use 12345 as the seed in the randomized sampling. Fit a single classification tree using Churn as the output variable and all the other variables as input variables. In Step 2 of XLMiner's Classification Tree procedure, be sure to Normalize Input Data and to set the Minimum # records in a terminal node to 1. Generate the Full tree, Best pruned tree, and Minimum error tree.
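For readers who want to see the idea behind the partitioning step outside of XLMiner, the following is a minimal Python sketch. It does not reproduce XLMiner's internal sampling and will not give the same partition for seed 12345. The function name `partition_with_oversampling`, the choice to send half of the churners to training, and the toy label vector are all assumptions made for illustration.

```python
import random


def partition_with_oversampling(labels, seed=12345, test_share=0.4):
    """Return (train, validation, test) lists of row indices.

    labels: list of 0/1 values, where 1 marks a churner.
    This is an illustrative sketch, not XLMiner's algorithm.
    """
    rng = random.Random(seed)
    churners = [i for i, y in enumerate(labels) if y == 1]
    others = [i for i, y in enumerate(labels) if y == 0]
    rng.shuffle(churners)
    rng.shuffle(others)

    # Assumption: half of the churners go to training, matched by an
    # equal number of non-churners, so training is 50% successes.
    n_train_pos = len(churners) // 2
    train = churners[:n_train_pos] + others[:n_train_pos]

    # All remaining rows are held out, then 40% of them become the test set.
    holdout = churners[n_train_pos:] + others[n_train_pos:]
    rng.shuffle(holdout)
    n_test = round(test_share * len(holdout))
    test = holdout[:n_test]
    validation = holdout[n_test:]
    return train, validation, test


# Toy data (hypothetical): 20 churners among 100 customers.
toy_labels = [1] * 20 + [0] * 80
train_idx, valid_idx, test_idx = partition_with_oversampling(toy_labels)
```

With the toy labels, the training set holds 10 churners and 10 non-churners, while the validation and test sets keep the rarer churn rate of the leftover rows.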
A: Why is partitioning with oversampling advised in this case?
B: From the CT_Output worksheet, what is the overall error rate of the full tree on the training set? If required, round your answer to two decimal places. Explain why this is not necessarily an indication that the full tree should be used to classify future observations and the role of the best pruned tree.
C: Consider the minimum error tree in the CT_MinErrorTree worksheet. List and interpret the set of rules that characterize churners.
D: For the default cutoff value of 0.5, what are the overall error rate, Class 1 error rate, and Class 0 error rate of the best pruned tree on the test set? If required, round your answers to two decimal places.
Overall error rate ____ %
Class 1 error rate ____ %
Class 0 error rate ____ %
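The three error rates in part D all come from the same classification confusion matrix. The sketch below shows the arithmetic in Python on made-up numbers; it is not output from the Cellphone data. The function name `error_rates` is hypothetical, and the rule that a record is classified as Class 1 when its predicted probability is at least the cutoff is an assumption about how the cutoff is applied.

```python
def error_rates(actual, prob_churn, cutoff=0.5):
    """Overall, Class 1, and Class 0 error rates in percent."""
    # Assumption: predict churn (1) when the probability meets the cutoff.
    predicted = [1 if p >= cutoff else 0 for p in prob_churn]

    # False negatives: actual churners predicted as non-churners.
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    # False positives: actual non-churners predicted as churners.
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)

    n1 = sum(actual)          # number of actual Class 1 records
    n0 = len(actual) - n1     # number of actual Class 0 records
    return {
        "overall": round(100 * (fn + fp) / len(actual), 2),
        "class1": round(100 * fn / n1, 2),
        "class0": round(100 * fp / n0, 2),
    }


# Toy data (hypothetical): 4 churners and 6 non-churners.
toy_actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_probs = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.1, 0.3, 0.2]
rates = error_rates(toy_actual, toy_probs)
```

Here two of four churners are missed (Class 1 error 50%), one of six non-churners is flagged (Class 0 error 16.67%), and three of ten records are misclassified overall (30%).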
E: Examine the decile-wise lift chart for the best pruned tree on the test set. What is the first decile lift? Interpret this value. If required, round your answer to two decimal places.
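As a reminder of what the first decile lift measures, here is a small Python sketch on made-up numbers, not the Cellphone results. It assumes the usual definition: the churn rate among the 10% of records with the highest predicted churn probability, divided by the churn rate across all records. The function name `first_decile_lift` is hypothetical.

```python
def first_decile_lift(actual, prob_churn):
    """Lift of the top 10% of records ranked by predicted probability."""
    # Rank records from most to least likely to churn.
    ranked = sorted(zip(prob_churn, actual), key=lambda t: t[0], reverse=True)

    # The first decile is the top tenth of the ranked records (at least one).
    n_top = max(1, len(ranked) // 10)
    top_rate = sum(a for _, a in ranked[:n_top]) / n_top

    # Baseline: churn rate over every record.
    base_rate = sum(actual) / len(actual)
    return round(top_rate / base_rate, 2)


# Toy data (hypothetical): 20 records, 4 of them churners.
toy_actual = [1, 1, 0, 1, 1] + [0] * 15
toy_probs = [0.95, 0.90, 0.85, 0.40, 0.35] + [0.10] * 15
lift = first_decile_lift(toy_actual, toy_probs)
```

In this toy case the two highest-ranked records are both churners, so the first decile has a 100% churn rate against a 20% baseline, a lift of 5.0. A lift of 5 would mean that contacting the top-ranked 10% of customers reaches five times as many churners as contacting a random 10%.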