Supervised Learning - OneR models
Part 1: Because we all know that more data is better, I have merged your survey data with that of the previous DM_2018 cohort.
Using the HW5_survey1820.xls dataset:
- Draw a 1R tree for each student Q-attribute (i.e. Q2 through Q8) to predict the student's rating of the 'ethical' descriptor as applied to the Wired article's subject. What is each tree's error rate?
- Draw a 1R tree to predict each student Q-attribute from the answers for the descriptor 'deceitful'. What is each tree's error rate?
- For each of the two approaches above (predicting 'ethical' from a Q-attribute, and predicting a Q-attribute from 'deceitful'), which tree(s) seem to give the "best" (i.e. most trustworthy) results? What is the second-best tree for each? Why might you tend to prefer the second-best tree's results to the "best" tree's results?
- Work up and describe the general profile of the person who rates the 'deceitful' descriptor as a 1. As a 2. As a 3. How confident are you of these resulting profiles' accuracy, as applied to CS majors generally?
- Which do you feel to be the most accurate, best-predictive trees?
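For reference, the 1R procedure asked for in Part 1 can be sketched in a few lines of Python. This is only a minimal illustration, not a required implementation: the function name `one_r`, the column names, and the six toy rows are made up here and do not come from HW5_survey1820.xls. For each value of the chosen attribute, the rule predicts the majority class, and the error rate is the fraction of rows outside those majorities.

```python
from collections import Counter, defaultdict

def one_r(rows, attribute, target):
    """Build a 1R rule for one attribute.

    Returns (rule, error_rate), where rule maps each attribute value to the
    majority target class among the rows that have that value.
    """
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[attribute]][row[target]] += 1

    rule = {}
    errors = 0
    for value, tally in counts.items():
        majority_class, majority_count = tally.most_common(1)[0]
        rule[value] = majority_class
        # Every row in this branch that is not in the majority is an error.
        errors += sum(tally.values()) - majority_count
    return rule, errors / len(rows)

# Hypothetical toy rows (NOT the real survey data).
rows = [
    {"Q4": "yes", "ethical": 1},
    {"Q4": "yes", "ethical": 1},
    {"Q4": "yes", "ethical": 2},
    {"Q4": "no", "ethical": 3},
    {"Q4": "no", "ethical": 3},
    {"Q4": "no", "ethical": 1},
]

rule, error_rate = one_r(rows, "Q4", "ethical")
print(rule, error_rate)
```

Running `one_r` once per Q-attribute and comparing the error rates gives the ranking of trees asked for above. Swapping the `attribute` and `target` arguments covers the second approach (predicting a Q-attribute from 'deceitful').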
Part 2: Using the Iris data in HW3_train.csv, discretize each of the numeric ranges for attributes A through D. Draw the resulting 1R trees and report their accuracy ratings according to your training data.
- Use 6 as your minimum-majority value; that is, the minimum size of the majority class in each discrete subset of your numeric data.
- Determine the accuracy of each according to the testing set HW3_test.csv.
- Incorporate these results into a table including your accuracy-results from HW3 (K-Means and Fuzzy Classification models). Discuss these results and how they compare.
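One way to carry out the discretization in Part 2 is sketched below. It is a sketch under assumptions, not the official procedure for this assignment: it follows the common 1R approach of sorting the values, growing each bin until its majority class has at least `min_majority` members, extending the bin while the next instance has the same class or the same value, and then merging neighbouring bins that share a majority class. The function names and the twenty toy measurements are hypothetical and are not taken from HW3_train.csv. The input is assumed to be non-empty.

```python
from bisect import bisect_right
from collections import Counter

def majority(bin_):
    """Most frequent class label in a bin of (value, label) pairs."""
    return Counter(label for _, label in bin_).most_common(1)[0][0]

def discretize_1r(values, labels, min_majority=6):
    """Return (cuts, classes): cut points and the class predicted per interval."""
    pairs = sorted(zip(values, labels))
    bins, current, i = [], [], 0
    while i < len(pairs):
        current.append(pairs[i])
        i += 1
        label, count = Counter(l for _, l in current).most_common(1)[0]
        if count >= min_majority:
            # Keep extending while the next instance agrees with the majority
            # class or shares the last value (a cut cannot split equal values).
            while i < len(pairs) and (pairs[i][1] == label
                                      or pairs[i][0] == current[-1][0]):
                current.append(pairs[i])
                i += 1
            bins.append(current)
            current = []
    if current:
        bins.append(current)  # leftover instances form the final bin

    # Merge adjacent bins that predict the same class.
    merged = [bins[0]]
    for b in bins[1:]:
        if majority(b) == majority(merged[-1]):
            merged[-1] = merged[-1] + b
        else:
            merged.append(b)

    # Place each cut halfway between neighbouring bins.
    cuts = [(left[-1][0] + right[0][0]) / 2
            for left, right in zip(merged, merged[1:])]
    return cuts, [majority(b) for b in merged]

def predict(cuts, classes, x):
    """Classify a value by locating its interval among the cut points."""
    return classes[bisect_right(cuts, x)]

def accuracy(cuts, classes, values, labels):
    hits = sum(predict(cuts, classes, v) == l for v, l in zip(values, labels))
    return hits / len(values)

# Hypothetical toy measurements (NOT the real Iris files).
values = [1.3, 1.4, 1.4, 1.5, 1.5, 1.6, 1.7,
          4.0, 4.2, 4.5, 4.5, 4.7, 4.9, 5.0,
          4.8, 5.1, 5.6, 5.8, 6.0, 6.1]
labels = ["setosa"] * 7 + ["versicolor"] * 7 + ["virginica"] * 6

cuts, classes = discretize_1r(values, labels, min_majority=6)
train_accuracy = accuracy(cuts, classes, values, labels)
print(cuts, classes, train_accuracy)
```

The same `accuracy` function can be applied to the values and labels read from the testing file, which gives the test-set figure for the comparison table. Only the training data should be used to choose the cut points.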
Part 3: Download the WEKA application to your computer. Use its Explorer module to select the full Iris dataset (provided by WEKA as an ARFF file). Use the entire dataset as a training-set; do not worry about using a test-file. Use WEKA's OneR classifier to determine the best single-attribute predictor for irises.
Add this result to your model accuracy-comparison table from Part 2 above, and discuss its placement among the previous three.
Survey Questions:
For your reference, these were the survey questions in the Qualtrics survey you took earlier this semester:
Question 1: Rate the following attributes on a scale of 1 (least applicable) to 3 (most applicable)
:
:
Question 2: Are you currently (or have you been) in a long-term relationship?
Question 3: What is your gender?
Question 4: Are you a CS major?
Question 5: Are you 22 years old (or older)?
Question 6: Is/Was your hometown community of population 20,000 or less?
Question 7: Is your most recent cumulative GPA 3.0 or above?
Question 8: Will you have graduated by Summer '20?
Attachment: Supervised Learning.rar