Question on data mining
Your task is to predict the output variable "choice" from 16 input features: x1, x2, ..., x16. The output "choice" is a categorical variable that can take 5 possible values: "M", "B", "J", "P", and "O". The first 8 input features (x1, x2, ..., x8) are binary variables. The last 8 input features (x9, x10, ..., x16) are continuous variables.
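The feature layout described above can be checked while reading the data. The following is a minimal sketch, assuming the CSV files have a header row with columns named x1 to x16 plus a "choice" column in the labelled files, and that binary values are stored as the text 0 or 1; none of this is confirmed by the question. The helper names (parse_row, load) are hypothetical. The demo row is the first of the eight new cases listed in the question.

```python
import csv
import io

BINARY = [f"x{i}" for i in range(1, 9)]        # x1..x8, expected to be 0 or 1
CONTINUOUS = [f"x{i}" for i in range(9, 17)]   # x9..x16, real-valued
CLASSES = {"M", "B", "J", "P", "O"}            # allowed values of "choice"

def parse_row(record, labelled=True):
    """Convert one CSV record (dict of strings) into (features, label)."""
    features = []
    for name in BINARY:
        value = int(record[name])
        if value not in (0, 1):
            raise ValueError(f"{name} must be 0 or 1, got {value}")
        features.append(value)
    for name in CONTINUOUS:
        features.append(float(record[name]))
    label = None
    if labelled:
        label = record["choice"].strip()
        if label not in CLASSES:
            raise ValueError(f"unexpected class {label!r}")
    return features, label

def load(handle, labelled=True):
    """Read every row from an open CSV file object (assumed column names)."""
    return [parse_row(rec, labelled) for rec in csv.DictReader(handle)]

# Demo on an in-memory file holding the first unlabelled new case.
header = ",".join(BINARY + CONTINUOUS)
first_case = "1,1,1,1,1,0,1,0,0.0284,0.2196,0.5259,0.6206,0.0950,0.3350,0.2470,0.9676"
rows = load(io.StringIO(header + "\n" + first_case + "\n"), labelled=False)
print(rows[0])
```

For the real files, the same load function could be called on open("finalQ3Train.csv", newline=""), provided the column names match the assumption above.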
1. Train a decision tree inductive learning model on the data from the CSV file "finalQ3Train.csv" that contains 1500 examples.
2. Express your trained model in the form of IF ... THEN rules. Test your trained model on the 500 examples from the CSV file "finalQ3Test.csv" and present your confusion matrix.
3. Predict the value of "choice" for each of the 8 examples in the CSV file "finalQ3newCases.csv". The examples are shown below:
x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9     | x10    | x11    | x12    | x13    | x14    | x15    | x16
1  | 1  | 1  | 1  | 1  | 0  | 1  | 0  | 0.0284 | 0.2196 | 0.5259 | 0.6206 | 0.0950 | 0.3350 | 0.2470 | 0.9676
1  | 1  | 0  | 1  | 1  | 0  | 0  | 1  | 0.7419 | 0.9260 | 0.4711 | 0.8340 | 0.8770 | 0.1129 | 0.4805 | 0.7469
0  | 0  | 1  | 0  | 1  | 0  | 1  | 1  | 0.3867 | 0.9002 | 0.4240 | 0.6029 | 0.5547 | 0.6674 | 0.1499 | 0.4527
0  | 1  | 0  | 1  | 1  | 0  | 0  | 0  | 0.8848 | 0.0752 | 0.1195 | 0.3625 | 0.1565 | 0.1205 | 0.7666 | 0.4188
1  | 0  | 0  | 0  | 1  | 1  | 1  | 0  | 0.2893 | 0.0067 | 0.1855 | 0.6999 | 0.5777 | 0.5959 | 0.0324 | 0.8211
1  | 1  | 1  | 1  | 1  | 1  | 1  | 1  | 0.7549 | 0.3705 | 0.3349 | 0.8772 | 0.9453 | 0.2476 | 0.3782 | 0.1878
1  | 1  | 1  | 1  | 0  | 1  | 1  | 1  | 0.7921 | 0.1539 | 0.9011 | 0.5596 | 0.7125 | 0.1035 | 0.0587 | 0.2399
0  | 0  | 1  | 0  | 1  | 0  | 0  | 0  | 0.7190 | 0.8441 | 0.5841 | 0.8670 | 0.7620 | 0.8794 | 0.3351 | 0.4677
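The three tasks (train a tree, express it as IF ... THEN rules, build a confusion matrix and predict new cases) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: it uses a small CART-style tree with Gini impurity, which is one possible choice of decision tree learner and not necessarily the one the course expects, and the six training rows are invented toy data with two features, not values from "finalQ3Train.csv". All function names are hypothetical. In practice a library such as scikit-learn (DecisionTreeClassifier, export_text, confusion_matrix) would likely be more convenient.

```python
from collections import Counter

LABELS = ["M", "B", "J", "P", "O"]  # class order used for the confusion matrix

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Return the (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, gini(y)
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between consecutive observed values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score - 1e-12:  # keep only strict improvements
                best, best_score = (f, t), score
    return best

def build(X, y, depth=0, max_depth=5):
    """Grow a tree: a leaf is a label, a node is (feature, threshold, left, right)."""
    split = best_split(X, y) if depth < max_depth else None
    if split is None:
        return Counter(y).most_common(1)[0][0]  # majority class at the leaf
    f, t = split
    left = [(row, lab) for row, lab in zip(X, y) if row[f] <= t]
    right = [(row, lab) for row, lab in zip(X, y) if row[f] > t]
    return (f, t,
            build([r for r, _ in left], [l for _, l in left], depth + 1, max_depth),
            build([r for r, _ in right], [l for _, l in right], depth + 1, max_depth))

def predict(tree, row):
    """Follow the splits from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def rules(tree, conds=()):
    """Yield one IF ... THEN rule per leaf (features are numbered from x1)."""
    if not isinstance(tree, tuple):
        yield "IF " + (" AND ".join(conds) or "TRUE") + f" THEN choice = {tree}"
        return
    f, t, left, right = tree
    yield from rules(left, conds + (f"x{f + 1} <= {t:g}",))
    yield from rules(right, conds + (f"x{f + 1} > {t:g}",))

def confusion_matrix(actual, predicted, labels=LABELS):
    """Rows are actual classes, columns are predicted classes."""
    index = {lab: i for i, lab in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

# Invented toy data (one binary and one continuous feature), NOT from the CSV files.
X = [[0, 0.1], [0, 0.2], [1, 0.3], [1, 0.8], [1, 0.9], [0, 0.7]]
y = ["M", "M", "B", "J", "J", "M"]

tree = build(X, y)
for rule in rules(tree):
    print(rule)
for row in confusion_matrix(y, [predict(tree, r) for r in X]):
    print(row)
```

With the real data, build would be called on the 1500 training rows, confusion_matrix on the 500 test rows, and predict on each of the eight new cases in the table. A rule such as "x1 <= 0.5" on a binary feature simply means x1 = 0. The max_depth value is an arbitrary assumption and would normally be tuned to limit overfitting.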