Suppose that the p predictors X arise from sampling relatively smooth analog curves at p uniformly spaced abscissa values. Denote by Cov(X|Y) = Σ the conditional covariance matrix of the predictors, and assume this does not change much with Y. Discuss the nature of the Mahalanobis choice A = Σ^{-1} for the metric in (6.14). How does this compare with A = I? How might you construct a kernel A that
(a) downweights high-frequency components in the distance metric;
(b) ignores them completely?
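
One way to experiment with these choices numerically is sketched below. It assumes the metric in (6.14) is the quadratic form (x - x0)^T A (x - x0), and it swaps in an orthonormal cosine (DCT-II) basis as a stand-in for "frequency components"; that basis, the helper names dct_basis, frequency_metric and quad_dist, and the particular weights are all illustrative assumptions, not part of the exercise. Writing A = C^T diag(w) C, decreasing weights w downweight high frequencies, and zero weights above a cutoff ignore them. A = I corresponds to w = 1 everywhere.

```python
import numpy as np

def dct_basis(p):
    """Orthonormal DCT-II basis: row k is a sampled cosine with frequency index k."""
    n = np.arange(p)
    k = n[:, None]
    C = np.cos(np.pi * (n[None, :] + 0.5) * k / p)
    C *= np.sqrt(2.0 / p)      # scale every row to unit norm ...
    C[0] /= np.sqrt(2.0)       # ... with the constant row needing an extra factor
    return C

def frequency_metric(p, weights):
    """Build A = C^T diag(weights) C, a metric that weights each frequency separately."""
    C = dct_basis(p)
    return C.T @ np.diag(weights) @ C

def quad_dist(x, x0, A):
    """Squared distance (x - x0)^T A (x - x0) under the metric A (assumed form of 6.14)."""
    d = x - x0
    return float(d @ A @ d)

p = 32
k = np.arange(p)

# (a) Downweight: weights decay smoothly with frequency (the scale 4.0 is arbitrary).
A_down = frequency_metric(p, 1.0 / (1.0 + (k / 4.0) ** 2))

# (b) Ignore: keep only the 8 lowest frequencies (arbitrary cutoff); A is then
# a rank-8 projection, so it is only positive semi-definite.
A_cut = frequency_metric(p, (k < 8).astype(float))
```

Comparing quad_dist for a low-frequency and a high-frequency perturbation of the same Euclidean length shows the effect: A = I treats them identically, A_down shrinks the high-frequency one, and A_cut gives it zero distance. An estimated Σ^{-1} could be passed to quad_dist in the same way to see which directions it emphasizes.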