Question: Suppose we know exactly (i.e., with no estimation error) two arbitrary class-conditional distributions p(x|ωi) and their priors P(ωi) in a d-dimensional feature space.
(a) Prove that the true (Bayes) error cannot decrease if we first project the distributions onto a lower-dimensional space and then classify in that space (a notation sketch follows the question).
(b) Despite this fact, suggest why, in an actual pattern-recognition application, we might nevertheless not want to include an arbitrarily high number of feature dimensions (a small simulation sketch follows below).
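For part (a), a minimal notation sketch (the symbols P*(e), T, g, and y are assumptions introduced here, not part of the original problem). The true (Bayes) error in the full d-dimensional space is

$$P^*_d(e) = \int \min\bigl[P(\omega_1)\,p(\mathbf{x}\mid\omega_1),\; P(\omega_2)\,p(\mathbf{x}\mid\omega_2)\bigr]\,d\mathbf{x},$$

and the key observation is that for any projection $T:\mathbb{R}^d \to \mathbb{R}^k$ with $k < d$, every decision rule $g$ acting on the projected feature $\mathbf{y} = T\mathbf{x}$ induces the rule $g \circ T$ acting on $\mathbf{x}$ itself. The minimization defining $P^*_d(e)$ therefore ranges over a superset of the rules available after projection:

$$P^*_d(e) = \min_{\text{all rules on }\mathbf{x}} P(e) \;\le\; \min_{g} P(e;\, g \circ T) = P^*_k(e).$$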
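For part (b), one standard practical consideration is that with a finite training set, parameter-estimation error grows with the number of dimensions even when the extra dimensions carry no class information. The sketch below is a minimal illustration of this effect, not part of the original problem: it assumes two spherical Gaussian classes whose means differ only in the first few coordinates, and all names (run_trial, n_train, informative, sep) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_trial(d, n_train=20, n_test=2000, informative=2, sep=2.0):
    """Test error of a plug-in nearest-mean classifier in d dimensions.

    The class means differ only in the first `informative` coordinates, so
    the Bayes error is the same for every d >= informative; any change in
    the measured error comes purely from estimating the means on n_train
    samples per class.
    """
    mu = np.zeros(d)
    mu[:informative] = sep / np.sqrt(informative)  # ||mu|| = sep for all d
    # Training and test samples for both classes (identity covariance).
    X0 = rng.normal(size=(n_train, d))
    X1 = rng.normal(size=(n_train, d)) + mu
    T0 = rng.normal(size=(n_test, d))
    T1 = rng.normal(size=(n_test, d)) + mu
    # Plug-in linear rule: estimate the class means, classify by the nearer one.
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    w = m1 - m0
    b = 0.5 * (m0 + m1) @ w          # midpoint threshold
    err0 = np.mean(T0 @ w > b)       # class-0 points called class 1
    err1 = np.mean(T1 @ w <= b)      # class-1 points called class 0
    return 0.5 * (err0 + err1)

for d in (2, 5, 10, 50, 200):
    errs = [run_trial(d) for _ in range(50)]
    print(f"d={d:4d}  mean test error ~ {np.mean(errs):.3f}")
```

Under these assumptions the Bayes error stays fixed at Φ(−sep/2) ≈ 0.159 for every d, yet the measured test error of the estimated classifier climbs toward chance as pure-noise dimensions are added: one concrete reason not to include arbitrarily many features in practice.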