Dealing with missing data
Suppose that you use a Gaussian discriminant classifier, in which you explicitly model P(y = 1) (using a Bernoulli distribution) as well as P(x|y = 0) and P(x|y = 1). The two class-conditional densities are Gaussian with distinct means µ_0 and µ_1 and a shared covariance matrix Σ (a frequent assumption in practice). Suppose that you are asked to classify an example for which you know the inputs x_1, ..., x_{n-1}, but the value of x_n is missing. In practice, a common approach in this case is to "fill in" the value of x_n with its class-conditional means, E(x_n|y = 0) and E(x_n|y = 1), using the first when evaluating P(x|y = 0) and the second when evaluating P(x|y = 1). Using the log-odds ratio, log [P(y = 1|x) / P(y = 0|x)], give a mathematical justification for this approach.
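Before writing out the derivation, it can help to check the claim numerically. The sketch below is not a solution, only a sanity check under stated assumptions: it takes the two-input case (first input observed, second input missing), and every function name and numeric value in it is made up for illustration. It compares the log-odds obtained by marginalizing out the missing input against the log-odds obtained by filling it in with a class-conditional mean. Two readings of "class-conditional mean" are tried: the plain class mean of the missing input, and the mean of the missing input given both the class and the observed input.

```python
import math


def gauss1d_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * math.log(2 * math.pi * var) - 0.5 * (x - mean) ** 2 / var


def gauss2d_logpdf(x, mean, cov):
    """Log density of a 2-D Gaussian with symmetric covariance [[a, b], [b, d]]."""
    a, b, d = cov[0][0], cov[0][1], cov[1][1]
    det = a * d - b * b
    dx0, dx1 = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form dx^T cov^{-1} dx, using the closed-form 2x2 inverse.
    quad = (d * dx0 * dx0 - 2 * b * dx0 * dx1 + a * dx1 * dx1) / det
    return -math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * quad


def log_odds_marginal(x1, prior1, mu0, mu1, cov):
    """Log-odds after integrating the missing second input out of each class density."""
    log_prior = math.log(prior1 / (1 - prior1))
    return (log_prior
            + gauss1d_logpdf(x1, mu1[0], cov[0][0])
            - gauss1d_logpdf(x1, mu0[0], cov[0][0]))


def log_odds_filled(x1, prior1, mu0, mu1, cov, use_observed):
    """Log-odds after filling in the missing second input separately per class.

    use_observed=False: fill with the plain class mean mu_y[1].
    use_observed=True:  fill with the mean given the class AND the observed input.
    """
    def fill(mu):
        if use_observed:
            return mu[1] + cov[0][1] / cov[0][0] * (x1 - mu[0])
        return mu[1]

    log_prior = math.log(prior1 / (1 - prior1))
    return (log_prior
            + gauss2d_logpdf((x1, fill(mu1)), mu1, cov)
            - gauss2d_logpdf((x1, fill(mu0)), mu0, cov))


# Illustrative values only (assumptions, not taken from the exercise).
MU0, MU1 = (0.0, 0.0), (2.0, 1.0)
COV_CORRELATED = [[1.0, 0.5], [0.5, 2.0]]
COV_DIAGONAL = [[1.0, 0.0], [0.0, 2.0]]
PRIOR1, X1 = 0.3, 1.5

for label, cov in (("correlated", COV_CORRELATED), ("diagonal", COV_DIAGONAL)):
    print(label,
          "marginal:", log_odds_marginal(X1, PRIOR1, MU0, MU1, cov),
          "plain fill:", log_odds_filled(X1, PRIOR1, MU0, MU1, cov, False),
          "conditional fill:", log_odds_filled(X1, PRIOR1, MU0, MU1, cov, True))
```

In this sketch the conditional fill reproduces the marginalized log-odds in both settings, because the factor contributed by the filled-in input depends only on the shared covariance and cancels in the ratio. The plain class-mean fill agrees with marginalization only when the missing input is uncorrelated with the observed one. A written justification should state which of the two readings it relies on.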