(a) The K nearest neighbour (KNN) algorithm uses a distance metric to order the training data in relation to a given test example. Given a problem with data in the form (x1,.....,xn,Y), where are independent variables, and y is the dependent variable for prediction, describe and explain an approach to weighting the k nearest neighbours so that nearer neighbours are more important when producing the final predicted y value for a test example.
(b) Outline the k-means clustering algorithm for a set of data defined as vectors xi. Include a diagram to support your algorithm description.
(c) Explain why the k-means clustering algorithm does not guarantee finding the optimal cluster locations for any given application of the algorithm. Given this non-optimal clustering, what does this imply in terms of how k-means should be used in practice to ensure a good clustering?