Question: Consider a classification problem with K classes for which the feature vector φ has M components, each of which can take L discrete states. Let the values of the components be represented by a 1-of-L binary coding scheme. Further suppose that, conditioned on the class C_k, the M components of φ are independent, so that the class-conditional density factorizes with respect to the feature vector components. Show that the quantities a_k given by (4.63), which appear in the argument to the softmax function describing the posterior class probabilities, are linear functions of the components of φ. Note that this represents an example of the naive Bayes model.
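
One way to approach this is sketched below. It is an outline under stated assumptions, not a definitive solution. It takes (4.63) to be the log of the class-conditional density times the class prior, and it introduces two symbols that do not appear in the question: φ_ml for the l-th binary entry of the 1-of-L code of component m, and μ_kml for the probability that component m is in state l given class C_k.

```latex
% Assumed notation (not in the question):
%   \phi_{ml} \in \{0,1\} is the l-th entry of the 1-of-L code for component m,
%   so \sum_{l=1}^{L} \phi_{ml} = 1 for every m.
%   \mu_{kml} = p(\phi_{ml} = 1 \mid C_k), with \sum_{l=1}^{L} \mu_{kml} = 1.
\begin{align}
  % Independence of the M components given C_k, each component multinomial:
  p(\boldsymbol{\phi} \mid C_k)
    &= \prod_{m=1}^{M} \prod_{l=1}^{L} \mu_{kml}^{\phi_{ml}}, \\
  % (4.63), assumed to read a_k = \ln( p(\phi | C_k) p(C_k) ):
  a_k
    &= \ln \bigl( p(\boldsymbol{\phi} \mid C_k)\, p(C_k) \bigr) \\
    &= \ln p(C_k) + \sum_{m=1}^{M} \sum_{l=1}^{L} \phi_{ml} \ln \mu_{kml}.
\end{align}
```

Under these assumptions, a_k is a linear (strictly, affine) function of the M·L binary entries φ_ml, with weights ln μ_kml and constant term ln p(C_k). The sketch also assumes every μ_kml is strictly positive so that the logarithms are finite.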