Decision Tree
One major issue for any decision tree algorithm is how to choose the attribute on which to split the data set so that a well-balanced tree can be created. The most traditional approach is the ID3 algorithm proposed by Quinlan in 1986. The detailed ID3 algorithm is shown in the slides, and the textbook discusses it in Section 18.3. For this problem, please follow the ID3 algorithm and manually calculate the values based on a data set similar to (but not the same as) the one in the slides (p. 147). This exercise should give you deep insight into the execution of the ID3 algorithm. Please note that the concepts discussed here (for example, entropy and information gain) are very important in the fields of information theory and signal processing. The new data set is shown below; in this example, row 10 is removed from the original set and all other rows remain the same.
Following the conventions used in the slides, please show a manual process and calculate the following values: Entropy(S), Entropy(S_weather=sunny), Entropy(S_weather=windy), Entropy(S_weather=rainy), Gain(S, weather), Gain(S, parents), and Gain(S, money). Based on the last three values, which attribute should be chosen to split on? Please show the detailed process of how you obtain these solutions.
Weekend | Weather | Parents | Money | Decision (Category)
W1      | Sunny   | Yes     | Rich  | Cinema
W2      | Sunny   | No      | Rich  | Tennis
W3      | Windy   | Yes     | Rich  | Cinema
W4      | Rainy   | Yes     | Poor  | Cinema
W5      | Rainy   | No      | Rich  | Stay in
W6      | Rainy   | Yes     | Poor  | Cinema
W7      | Windy   | No      | Poor  | Cinema
W8      | Windy   | No      | Rich  | Shopping
W9      | Windy   | Yes     | Rich  | Cinema
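For reference, the quantities above follow the standard ID3 definitions: Entropy(S) = -sum_i p_i * log2(p_i), where p_i is the fraction of examples in S with the i-th Decision category, and Gain(S, A) = Entropy(S) - sum_v (|S_v| / |S|) * Entropy(S_v), where S_v is the subset of S in which attribute A takes value v. The short Python sketch below is one possible way to check your manual calculations against this table; it is not part of the required manual solution, and the row encoding and helper names (rows, entropy, gain) are my own choices rather than anything defined in the slides.

# A minimal sketch for checking the manual ID3 calculations on the data set above.
# Row encoding and function names are illustrative assumptions, not from the slides.
from collections import Counter
from math import log2

# Each row: (weather, parents, money, decision) for weekends W1..W9.
rows = [
    ("Sunny", "Yes", "Rich", "Cinema"),   # W1
    ("Sunny", "No",  "Rich", "Tennis"),   # W2
    ("Windy", "Yes", "Rich", "Cinema"),   # W3
    ("Rainy", "Yes", "Poor", "Cinema"),   # W4
    ("Rainy", "No",  "Rich", "Stay in"),  # W5
    ("Rainy", "Yes", "Poor", "Cinema"),   # W6
    ("Windy", "No",  "Poor", "Cinema"),   # W7
    ("Windy", "No",  "Rich", "Shopping"), # W8
    ("Windy", "Yes", "Rich", "Cinema"),   # W9
]
ATTRS = {"weather": 0, "parents": 1, "money": 2}

def entropy(subset):
    # Entropy of the Decision column (index 3) over the given rows.
    counts = Counter(r[3] for r in subset)
    total = len(subset)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def gain(examples, attr):
    # Information gain of splitting `examples` on the named attribute.
    i = ATTRS[attr]
    values = set(r[i] for r in examples)
    remainder = sum(
        (len(sv) / len(examples)) * entropy(sv)
        for sv in ([r for r in examples if r[i] == v] for v in values)
    )
    return entropy(examples) - remainder

print("Entropy(S) =", round(entropy(rows), 4))
for v in ("Sunny", "Windy", "Rainy"):
    subset = [r for r in rows if r[0] == v]
    print(f"Entropy(S_weather={v}) =", round(entropy(subset), 4))
for a in ATTRS:
    print(f"Gain(S, {a}) =", round(gain(rows, a), 4))

The attribute with the largest of the three Gain values is the one ID3 would choose for the root split.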