1. In this part, we are going to build a decision tree classifier in MATLAB to predict the acceptability of cars. The dataset can be found in the CSV file car.csv. The first six columns are the attributes of the cars, and the last column is the class label denoting the evaluated car acceptability. The attribute information is shown in the following table:
| Attribute | Description | Possible values |
| --- | --- | --- |
| Buying | Buying price | Vhigh, high, med, low |
| Maint | Maintenance price | Vhigh, high, med, low |
| Doors | No. of doors | 2, 3, 4, 5more |
| Persons | Passenger capacity | 2, 4, more |
| Lug boot | Size of luggage boot | Small, big, med |
| Safety | Estimated safety of the car | Low, med, high |
| Accept (class label) | Acceptability of the car | Unacc, acc, good, vgood |
a. All six attributes (excluding the class label) require encoding (i.e. transforming them into integer values). Propose an encoding scheme, and implement it as a MATLAB function that performs the encoding. The answer should include the following:
i. Encoding scheme for each attribute (preferably presented as an encoding table)
ii. MATLAB function source code implementing the encoding scheme in (i)
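One possible encoding, given only as a sketch, is an ordinal scheme that maps each attribute's values to 1, 2, 3, ... in increasing order (for example low = 1, med = 2, high = 3, vhigh = 4). The function name encodeCarAttributes, the lowercase value strings (such as '5more') and the input layout (an n-by-6 cell array of text) are assumptions made for illustration; they should be checked against the actual contents of car.csv.

```matlab
function X = encodeCarAttributes(raw)
% ENCODECARATTRIBUTES  Ordinal-encode the six car attributes (illustrative sketch).
%   raw : n-by-6 cell array of char vectors, columns in the order
%         buying, maint, doors, persons, lug_boot, safety (assumed layout).
%   X   : n-by-6 numeric matrix of integer codes starting at 1.

    % Assumed value strings, listed from lowest to highest code.
    levels = { ...
        {'low', 'med', 'high', 'vhigh'}, ...   % buying
        {'low', 'med', 'high', 'vhigh'}, ...   % maint
        {'2', '3', '4', '5more'}, ...          % doors
        {'2', '4', 'more'}, ...                % persons
        {'small', 'med', 'big'}, ...           % lug_boot
        {'low', 'med', 'high'}};               % safety

    X = zeros(size(raw, 1), 6);
    for j = 1:6
        % Normalize case and whitespace, then look up each value's position.
        column = lower(strtrim(raw(:, j)));
        [found, code] = ismember(column, levels{j});
        if ~all(found)
            error('encodeCarAttributes:unknownValue', ...
                  'Unrecognized value in attribute column %d.', j);
        end
        X(:, j) = code;
    end
end
```

For example, encodeCarAttributes({'vhigh','med','5more','more','big','high'}) returns [4 2 4 3 3 3] under this scheme. An ordinal encoding is a natural fit here because every attribute has an obvious order, but other schemes are equally acceptable as long as the encoding table documents them.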
b. Using the first 75% of the tuples for training and the remaining 25% for testing, build a MATLAB decision tree classifier to predict the acceptability of the cars in the testing dataset.
The answer should include the following:
i. MATLAB source code that builds the classifier and predicts car acceptability using the testing dataset. Clear instructions must be given for the execution of your source code; otherwise, marks will be deducted.
ii. The decision tree built, and its accuracy, sensitivity, and specificity
Submit your MATLAB source code for (i) in separate MATLAB files.
c. Why is the accuracy in (b.ii) much lower than the sensitivity and specificity?
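The workflow in part (b) can be sketched as follows, assuming the attributes have already been encoded into a numeric matrix X and the class labels are held in a cell array y. The function name trainAndEvaluateTree is hypothetical, and the per-class (one-vs-rest) definition of sensitivity and specificity is one common convention for a four-class problem, not necessarily the one intended by the question. The calls fitctree, predict and confusionmat require the Statistics and Machine Learning Toolbox.

```matlab
function [tree, acc, sens, spec, classes] = trainAndEvaluateTree(X, y)
% TRAINANDEVALUATETREE  75/25 ordered split, decision tree, basic metrics (sketch).
%   X : n-by-6 numeric matrix of encoded attributes (assumed already encoded).
%   y : n-by-1 cell array of class labels, e.g. 'unacc', 'acc', 'good', 'vgood'.

    % First 75% of the tuples for training, remaining 25% for testing.
    n      = size(X, 1);
    nTrain = floor(0.75 * n);
    Xtrain = X(1:nTrain, :);       ytrain = y(1:nTrain);
    Xtest  = X(nTrain + 1:end, :); ytest  = y(nTrain + 1:end);

    % Fit the classification tree and predict the test labels.
    tree = fitctree(Xtrain, ytrain, 'PredictorNames', ...
        {'buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety'});
    ypred = predict(tree, Xtest);

    % Confusion matrix: rows are true classes, columns are predicted classes.
    [C, classes] = confusionmat(ytest, ypred);

    % Overall accuracy.
    acc = sum(diag(C)) / sum(C(:));

    % One-vs-rest counts for each class.
    TP = diag(C);
    FN = sum(C, 2) - TP;
    FP = sum(C, 1)' - TP;
    TN = sum(C(:)) - TP - FN - FP;

    sens = TP ./ (TP + FN);   % NaN for a class absent from the test set
    spec = TN ./ (TN + FP);
end
```

The fitted tree can then be displayed with view(tree, 'Mode', 'graph'). Because the split is by row order rather than random, the class mix of the test portion may differ from that of the training portion, which is worth keeping in mind when interpreting the metrics.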
2. Name two techniques in data pre-processing. Discuss their basic principles and give one example for each technique discussed.
3. The following table presents the results of a survey on preferred types of mobile phone game apps.
| Age | Gender | Occupation | Preferred type of game apps |
| --- | --- | --- | --- |
| 29 | F | Manager | Puzzle |
| 25 | M | Manager | Action |
| 27 | M | Student | Sports |
| 17 | M | Student | Action |
| 23 | M | Clerk | Puzzle |
| 30 | F | Clerk | Sports |
| 14 | M | Student | Puzzle |
| 28 | F | Clerk | Sports |
| 22 | M | Clerk | Action |
| 36 | M | Manager | Puzzle |
Using the above table as the training dataset, build a decision tree by applying Hunt's algorithm. The class label is the preferred type of game apps. Age, gender, and occupation are the attributes. Split the attributes using multi-way splits with the Gini index. Show your steps and calculations clearly.
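As a check on the hand calculations for the first level of the tree, the weighted Gini index of a multi-way split can be computed with a short script. This is only a sketch: the helper names giniIndex and splitGini are hypothetical, the labels are normalized to a single capitalization, and age is left out because a multi-way split on it first requires choosing age ranges, which the question leaves open. The script uses local functions, so it needs MATLAB R2016b or later.

```matlab
% Survey data copied from the table above (labels normalized to one capitalization).
gender     = {'F','M','M','M','M','F','M','F','M','M'}';
occupation = {'Manager','Manager','Student','Student','Clerk', ...
              'Clerk','Student','Clerk','Clerk','Manager'}';
game       = {'Puzzle','Action','Sports','Action','Puzzle', ...
              'Sports','Puzzle','Sports','Action','Puzzle'}';

giniRoot       = giniIndex(game);              % impurity before any split
giniGender     = splitGini(gender, game);      % weighted impurity after splitting on gender
giniOccupation = splitGini(occupation, game);  % weighted impurity after splitting on occupation

fprintf('root %.4f, gender %.4f, occupation %.4f\n', ...
        giniRoot, giniGender, giniOccupation);
% prints: root 0.6600, gender 0.5619, occupation 0.5833

function g = giniIndex(labels)
    % Gini index of a node: 1 - sum over classes of p_k^2.
    [~, ~, k] = unique(labels);
    p = accumarray(k(:), 1) / numel(labels);
    g = 1 - sum(p .^ 2);
end

function g = splitGini(attr, labels)
    % Weighted Gini of a multi-way split: one child node per distinct attribute value.
    values = unique(attr);
    g = 0;
    for i = 1:numel(values)
        inChild = strcmp(attr, values{i});
        g = g + nnz(inChild) / numel(labels) * giniIndex(labels(inChild));
    end
end
```

The same two helpers can be reused on each child node as Hunt's algorithm recurses, by passing only the rows that reach that node.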