Part A:
Introduction: K-Nearest Neighbor (KNN) is a supervised learning algorithm in which a new query instance is classified according to the majority category among its K nearest neighbors. The purpose of the algorithm is to classify a new object based on its attributes and a set of training samples. In other words, KNN uses the classification of the neighborhood as the prediction value for the new query instance.
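The rule described above can be written compactly as follows. This is a sketch in standard notation, assuming Euclidean distance over n numeric attributes. The symbols ($\mathbf{x}_q$ for the query, $N_k(\mathbf{x}_q)$ for its neighbor set, $\mathbb{1}[\cdot]$ for the indicator) are introduced here for illustration and are not part of the exercise.

```latex
d(\mathbf{x}_q, \mathbf{x}_i) = \sqrt{\sum_{j=1}^{n} \left(x_{qj} - x_{ij}\right)^2}
\qquad
\hat{y}_q = \arg\max_{c} \sum_{\mathbf{x}_i \in N_k(\mathbf{x}_q)} \mathbb{1}\left[y_i = c\right]
```

Here $N_k(\mathbf{x}_q)$ is the set of the k training samples with the smallest distance to $\mathbf{x}_q$, and $\mathbb{1}[y_i = c]$ equals 1 when neighbor i belongs to class c and 0 otherwise.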
The following data classify power-saving lights by their economic feasibility as Preserver or Wasteful.
We consider two factors for the classification:
X1: Lighting Duration
X2: Power Consumption
We use k = 2 nearest neighbors.
Table 1 presents six training samples. Using the KNN algorithm, classify the last sample (the query instance with X1 = 10 and X2 = 500) as Preserver or Wasteful.
| X1: Lighting Duration (hours) | X2: Power Consumption (watts) | Y: Classification |
|---|---|---|
| 6 | 900 | Wasteful |
| 2 | 150 | Wasteful |
| 5 | 600 | Wasteful |
| 3 | 80 | Preserver |
| 4 | 200 | Wasteful |
| 2 | 60 | Preserver |
| 10 | 500 | ? |
Table 1: Training data
1) Calculate the Euclidean distance between the query instance and each of the training samples.
2) Sort the distances and determine the nearest neighbors based on the k-th minimum distance.
3) Gather the category Y of the nearest neighbors.
4) Use the simple majority of the categories of the nearest neighbors as the prediction value for the query instance.
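The four steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the names `TRAINING` and `knn_classify` are hypothetical, the attributes are used unscaled exactly as they appear in Table 1, and a tie in the vote (possible with an even k) is broken in favor of the closer neighbor's class, an assumption the exercise does not specify.

```python
import math
from collections import Counter

# Hypothetical encoding of Table 1:
# (X1 lighting duration in hours, X2 power consumption in watts, Y class)
TRAINING = [
    (6, 900, "Wasteful"),
    (2, 150, "Wasteful"),
    (5, 600, "Wasteful"),
    (3, 80, "Preserver"),
    (4, 200, "Wasteful"),
    (2, 60, "Preserver"),
]


def knn_classify(query, samples, k):
    """Classify query = (x1, x2) by simple majority vote of its k nearest samples."""
    # Step 1: Euclidean distance from the query to every training sample.
    distances = [(math.dist(query, (x1, x2)), label) for x1, x2, label in samples]
    # Step 2: sort by distance (stable sort) and keep the k closest samples.
    neighbours = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 3: gather the category Y of each nearest neighbour, nearest first.
    labels = [label for _, label in neighbours]
    # Step 4: simple majority vote. On a tie, most_common(1) returns the
    # first-seen label, i.e. the class of the closer neighbour (an assumption).
    prediction = Counter(labels).most_common(1)[0][0]
    return prediction, neighbours


prediction, neighbours = knn_classify((10, 500), TRAINING, k=2)
for distance, label in neighbours:
    print(f"{distance:.2f} {label}")
print("Prediction:", prediction)
```

With the Table 1 data, the two nearest neighbors are the samples (5, 600) at a distance of about 100.12 and (4, 200) at about 300.06. Both are Wasteful, so the query instance is classified as Wasteful. Because the attributes are not rescaled, X2 (hundreds of watts) dominates the distance. This matches the plain Euclidean distance the exercise asks for.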
Part B:
Consider the training data below for the "eye disease" problem. We use it to learn a Naive Bayes classifier.
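For reference, the standard Naive Bayes decision rule is given below, assuming the four attributes (Age, Spectacle prescription, Astigmatic, Tear production rate) are conditionally independent given the class. It also assumes the probabilities are estimated by simple counting with no smoothing. The symbols $a_j$, $N$, $N_C$ and $N_{C,a_j}$ are introduced here for illustration.

```latex
P(C \mid a_1, \dots, a_4) \propto P(C) \prod_{j=1}^{4} P(a_j \mid C),
\qquad
\hat{C} = \arg\max_{C} \; P(C) \prod_{j=1}^{4} P(a_j \mid C)

P(C) = \frac{N_C}{N},
\qquad
P(a_j \mid C) = \frac{N_{C,a_j}}{N_C}
```

Here $N$ is the number of training records, $N_C$ is the number of records in class $C$, and $N_{C,a_j}$ is the number of those records whose j-th attribute equals $a_j$.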
| Record ID | Age | Spectacle prescription | Astigmatic | Tear production rate | Class label (Lenses) |
|---|---|---|---|---|---|
| 1 | Young | Myope | No | Reduced | Noncontact |
| 2 | Young | Myope | No | Normal | Soft contact |
| 3 | Young | Myope | Yes | Reduced | Noncontact |
| 4 | Young | Myope | Yes | Normal | Hard contact |
| 5 | Young | Hypermetrope | No | Reduced | Noncontact |
| 6 | Young | Hypermetrope | No | Normal | Soft contact |
| 7 | Young | Hypermetrope | Yes | Reduced | Noncontact |
| 8 | Young | Hypermetrope | Yes | Normal | Hard contact |
| 9 | Pre-presbyopic | Myope | No | Reduced | Noncontact |
| 10 | Pre-presbyopic | Myope | No | Normal | Soft contact |
The goal is to classify a new record as "Noncontact", "Soft contact", or "Hard contact": R11 = (Pre-presbyopic, Hypermetrope, Yes, Reduced).
For this purpose, you have to calculate P(Noncontact | R11), P(Hard contact | R11), and P(Soft contact | R11).
1. Compute the conditional probabilities and class priors for each class label in the training set.
2. Compute the probability of assigning each class label to the new record.
Class label = Soft contact:
Class label = Hard contact:
Class label = Noncontact:
3. Which class should be assigned to the new record? Justify your answer.
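The counting in steps 1 to 3 can be sketched in Python as follows. This is a minimal illustration using unsmoothed maximum-likelihood estimates. The names `RECORDS`, `naive_bayes_scores` and `R11` are hypothetical, and the lowercase "no" in record 10 is treated as "No".

```python
from collections import Counter

# Hypothetical encoding of the training table:
# (Age, Spectacle prescription, Astigmatic, Tear production rate, Lenses)
RECORDS = [
    ("Young", "Myope", "No", "Reduced", "Noncontact"),
    ("Young", "Myope", "No", "Normal", "Soft contact"),
    ("Young", "Myope", "Yes", "Reduced", "Noncontact"),
    ("Young", "Myope", "Yes", "Normal", "Hard contact"),
    ("Young", "Hypermetrope", "No", "Reduced", "Noncontact"),
    ("Young", "Hypermetrope", "No", "Normal", "Soft contact"),
    ("Young", "Hypermetrope", "Yes", "Reduced", "Noncontact"),
    ("Young", "Hypermetrope", "Yes", "Normal", "Hard contact"),
    ("Pre-presbyopic", "Myope", "No", "Reduced", "Noncontact"),
    ("Pre-presbyopic", "Myope", "No", "Normal", "Soft contact"),  # "no" normalised to "No"
]


def naive_bayes_scores(records, new_record):
    """Return the unnormalised score P(C) * prod_j P(a_j | C) for every class C."""
    class_counts = Counter(rec[-1] for rec in records)
    n = len(records)
    scores = {}
    for label, n_c in class_counts.items():
        in_class = [rec for rec in records if rec[-1] == label]
        score = n_c / n  # class prior P(C)
        for j, value in enumerate(new_record):
            # Conditional P(a_j | C): share of class-C records with this attribute value.
            matches = sum(1 for rec in in_class if rec[j] == value)
            score *= matches / n_c
        scores[label] = score
    return scores


R11 = ("Pre-presbyopic", "Hypermetrope", "Yes", "Reduced")
scores = naive_bayes_scores(RECORDS, R11)
total = sum(scores.values())
for label, score in scores.items():
    posterior = score / total if total else 0.0  # normalise so the scores sum to 1
    print(f"{label}: score={score:.4f}, normalised={posterior:.2f}")
print("Predicted class:", max(scores, key=scores.get))
```

With this data, the class priors are 5/10 (Noncontact), 3/10 (Soft contact) and 2/10 (Hard contact). Noncontact scores 0.5 × 1/5 × 2/5 × 2/5 × 5/5 = 0.016. Soft contact and Hard contact both score 0 because at least one of their conditional probabilities is zero. For example, no Soft contact or Hard contact record has a Reduced tear production rate. R11 is therefore assigned to Noncontact. A single zero count wipes out the whole product. Laplace (add-one) smoothing is a common remedy, but the exercise as stated does not call for it.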