Classification Tree (6%) Plumbing Inc. has been selling plumbing supplies for the last 20 years. The owner, Joe, decides that it is time to diversify by adding gardening tools to the products.
Having had success using customer data to build predictive models to guide direct mail campaigns for special plumbing offers, he considers that data mining could help him to identify a subset of customers who should be good prospects for his new set of products.
Is Joe ready to solve this as a supervised learning problem?
Explain your reasoning. If yes - what would you suggest as the target variable?
If no - why not? Information Gain Consider the example on page 23 in the lecture note for Session 3 (the same figure is on page 49 in the textbook).
We calculated information gain from splitting the population based on Body Shape. The procedure is demonstrated in the lecture note and in the textbook. As an assignment, you will calculate the information gain from Body Color.
You already know the entropy of the population. Split the set on the values of Body Color and calculate the entropy of each subset. Using the entropy values of population and children (subsets), calculate the information gain from Body Color.
QWE case - Classification Tree Read the case document.
1) What is a target variable? In one or two sentences, please provide a definition for the target variable. Be as precise as possible.
2) Build a classification tree model using rpart for the problem you suggested in 1). Include all other variables in the dataset as explanatory variables in the model. Set cp=0.004. Plot a classification tree. Save and submit the image.
3) How will you interpret the model result? Describe your observation.