Problem
Machine learning has now permeated many disciplines, including politics. The current landscape in the US is rife with data scientists and other quantitative experts making predictions about ongoing and upcoming elections. Consider the Congressional Voting Records dataset from the UCI Machine Learning Repository.
The dataset contains two files: one with a ".names" suffix and one with a ".data" suffix. The ".data" file holds the actual data, and the ".names" file describes the metadata (i.e., what the different columns mean). Each row of the ".data" file is one instance and contains both the features and the class label; take care to note the order of the columns. The machine learning problem is to take a member of Congress's votes as input and predict whether that member is a Republican or a Democrat. Our goal is to solve this problem with both a decision tree and a Naive Bayes classifier.
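A minimal Python sketch of reading the ".data" file follows. It assumes the layout of the UCI copy of the file (house-votes-84.data): the class label in the first column, followed by 16 comma-separated votes, each encoded as "y", "n", or "?". Check this against the ".names" file before relying on it; the function names `parse_rows` and `load_data` are hypothetical.

```python
from typing import List, Tuple


def parse_rows(lines: List[str]) -> Tuple[List[List[str]], List[str]]:
    """Split each non-empty line into its features and its class label.

    Assumed layout (verify against the ".names" file): label first, then the votes.
    """
    X: List[List[str]] = []
    y: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline at the end of the file
        fields = line.split(",")
        y.append(fields[0])   # class label, e.g. "democrat" or "republican"
        X.append(fields[1:])  # the votes, kept as raw strings ("y", "n", "?")
    return X, y


def load_data(path: str) -> Tuple[List[List[str]], List[str]]:
    """Read the ".data" file at `path` and parse it with parse_rows."""
    with open(path) as f:
        return parse_rows(f.readlines())
```

For example, `X, y = load_data("house-votes-84.data")`, assuming that is the name of the file in your local download. Keeping the votes as strings lets "?" survive parsing, so each missing-value strategy can decide what to do with it.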
Some feature values in the ".data" file are missing (marked "?"). There are three ways to handle them: i) discard instances that have missing feature values; ii) treat "missing" as a value in its own right (so each binary feature becomes a ternary, or three-valued, feature); iii) impute missing values, i.e., replace each missing value with the most common value of that feature, so that no values remain missing or unknown. The ".names" file explains why some values are missing and what they mean.
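The three strategies can be sketched as plain functions over parsed rows (a list of vote lists `X` and a parallel label list `y`). The sketch assumes missing votes are encoded as "?", and the function names `discard_missing`, `missing_as_value`, and `impute_mode` are illustrative only.

```python
from collections import Counter
from typing import List, Tuple

MISSING = "?"  # assumed encoding of a missing vote in the ".data" file

Data = Tuple[List[List[str]], List[str]]


def discard_missing(X: List[List[str]], y: List[str]) -> Data:
    """Strategy i: keep only the instances that have no missing votes."""
    keep = [i for i, row in enumerate(X) if MISSING not in row]
    return [X[i] for i in keep], [y[i] for i in keep]


def missing_as_value(X: List[List[str]], y: List[str]) -> Data:
    """Strategy ii: leave "?" in place as a third category next to "y" and "n"."""
    return [list(row) for row in X], list(y)


def impute_mode(X: List[List[str]], y: List[str]) -> Data:
    """Strategy iii: replace "?" with the most common observed value of its column."""
    modes = []
    for j in range(len(X[0])):
        observed = Counter(row[j] for row in X if row[j] != MISSING)
        modes.append(observed.most_common(1)[0][0])  # assumes each column has some observed value
    imputed = [[modes[j] if v == MISSING else v for j, v in enumerate(row)] for row in X]
    return imputed, list(y)
```

Computing the modes on the full dataset before cross-validation lets the held-out fold influence the imputation; computing them on each training fold and applying them to the matching test fold avoids that leak.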
A. Implement a decision tree classifier and a Naive Bayes classifier, and combine each of them with each of the three ways of handling missing values above. This gives six scenarios (two classifiers times three missing-value strategies).
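One way to implement both classifiers from scratch for categorical features is sketched below: a Naive Bayes classifier with Laplace (add-alpha) smoothing, and an ID3-style decision tree that makes a multiway split on the feature with the highest information gain. The smoothing, the ID3 split criterion, and the class names `CategoricalNaiveBayes` and `ID3Tree` are illustrative choices; the problem does not prescribe them. Because both classes treat every distinct string as a category, the same code runs whether "?" has been removed, imputed, or kept as a third value.

```python
import math
from collections import Counter, defaultdict
from typing import List


class CategoricalNaiveBayes:
    """Naive Bayes over categorical features with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, X: List[List[str]], y: List[str]) -> "CategoricalNaiveBayes":
        self.classes = sorted(set(y))
        self.class_counts = Counter(y)
        self.n = len(y)
        n_features = len(X[0])
        # Distinct values per feature: 2 for {"y", "n"}, 3 if "?" is kept as a value.
        self.n_values = [len({row[j] for row in X}) for j in range(n_features)]
        # counts[c][j][v] = number of training rows of class c whose feature j equals v
        self.counts = {c: [Counter() for _ in range(n_features)] for c in self.classes}
        for row, label in zip(X, y):
            for j, v in enumerate(row):
                self.counts[label][j][v] += 1
        return self

    def predict_one(self, row: List[str]) -> str:
        best, best_score = None, -math.inf
        for c in self.classes:
            nc = self.class_counts[c]
            score = math.log(nc / self.n)  # log prior P(c)
            for j, v in enumerate(row):
                # Smoothed log P(x_j = v | c); a value never seen with class c keeps nonzero mass.
                num = self.counts[c][j][v] + self.alpha
                score += math.log(num / (nc + self.alpha * self.n_values[j]))
            if score > best_score:
                best, best_score = c, score
        return best

    def predict(self, X: List[List[str]]) -> List[str]:
        return [self.predict_one(row) for row in X]


def entropy(labels: List[str]) -> float:
    n = len(labels)
    return -sum((m / n) * math.log2(m / n) for m in Counter(labels).values())


class ID3Tree:
    """Multiway decision tree that splits on the feature with the highest information gain."""

    def __init__(self, max_depth=None):
        self.max_depth = max_depth  # None means grow until the leaves are pure

    def fit(self, X: List[List[str]], y: List[str]) -> "ID3Tree":
        self.root = self._build(X, y, list(range(len(X[0]))), 0)
        return self

    def _gain(self, X, y, j) -> float:
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[row[j]].append(label)
        return entropy(y) - sum(len(g) / len(y) * entropy(g) for g in groups.values())

    def _build(self, X, y, features, depth):
        majority = Counter(y).most_common(1)[0][0]
        if len(set(y)) == 1 or not features or depth == self.max_depth:
            return majority  # leaf node: just a class label
        best = max(features, key=lambda j: self._gain(X, y, j))
        if self._gain(X, y, best) <= 0:
            return majority  # no remaining feature separates the classes
        rest = [f for f in features if f != best]
        children = {}
        for v in {row[best] for row in X}:
            idx = [i for i, row in enumerate(X) if row[best] == v]
            children[v] = self._build([X[i] for i in idx], [y[i] for i in idx], rest, depth + 1)
        return (best, children, majority)  # internal node

    def predict_one(self, row: List[str]) -> str:
        node = self.root
        while isinstance(node, tuple):
            feature, children, majority = node
            if row[feature] not in children:
                return majority  # value never seen at this node during training
            node = children[row[feature]]
        return node

    def predict(self, X: List[List[str]]) -> List[str]:
        return [self.predict_one(row) for row in X]
```

A pruned tree or a different split criterion (e.g., Gini impurity) would be an equally reasonable choice; `max_depth` is the only complexity control in this sketch.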
B. Perform 5-fold cross-validation and report precision, recall, and F1 score for each of the six scenarios.
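A sketch of 5-fold cross-validation with per-fold precision, recall, and F1 follows. Precision and recall need a positive class; treating "democrat" as positive is an assumption here (reporting both classes, or a macro average, would also be reasonable). `make_model` is assumed to be any zero-argument factory returning an object with `fit(X, y)` (which returns the fitted object) and `predict(X)`; `MajorityClass` is a hypothetical baseline included only so the sketch runs on its own.

```python
import random
from typing import List, Tuple


def precision_recall_f1(y_true: List[str], y_pred: List[str], positive: str) -> Tuple[float, float, float]:
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]  # k disjoint folds whose sizes differ by at most 1


def cross_validate(X, y, make_model, k: int = 5, positive: str = "democrat", seed: int = 0):
    """Return per-fold (precision, recall, F1) tuples and their mean over the k folds."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in test_set]
        model = make_model().fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = model.predict([X[i] for i in test_idx])
        scores.append(precision_recall_f1([y[i] for i in test_idx], preds, positive))
    mean = tuple(sum(s[m] for s in scores) / len(scores) for m in range(3))
    return scores, mean


class MajorityClass:
    """Trivial baseline: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label] * len(X)
```

For example, `scores, mean = cross_validate(X, y, MajorityClass)` gives a baseline to compare each of the six scenarios against; any classifier with the same `fit`/`predict` interface can be passed in its place. The folds here come from a plain shuffle and are not stratified by class.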