General assignment: Predictive and Prescriptive Data Analytics. Develop and validate predictive models (regression, classification, or clustering, using one or more of the methods covered in class to date or one of your choosing) for two of the five datasets below, and apply them for decision purposes. Please use the section numbering below for your written submission.
References (websites, papers, packages, data refs, etc.):
https://archive.ics.uci.edu/ml/datasets/Bank+Marketing,
https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29,
https://archive.ics.uci.edu/ml/datasets/Wine+Quality,
https://archive.ics.uci.edu/ml/datasets/Communities+and+Crime.
1. Exploratory Data Analysis
Explore the statistical aspects of both datasets. Analyze the distributions and summarize the relevant statistics. Perform any cleaning, transformation, interpolation, smoothing, outlier detection/removal, etc. that the data require. Include figures and descriptions of this exploration, plus a short description of what you concluded (e.g., the nature of the distributions and which modeling approaches look suitable to try). Min. 3/4 page text + graphics (required).
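As an illustration of the summary-statistics and outlier-detection steps described above, here is a minimal sketch using only the Python standard library. The function names (summarize, iqr_outliers), the 1.5 x IQR cutoff (Tukey's rule), and the toy values are assumptions chosen for illustration; they are not part of the assignment or taken from any of the datasets. In practice you would likely load the data with a package of your choice and add plots.

```python
import statistics

def summarize(values):
    """Return basic summary statistics for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical toy column, not taken from any of the assignment datasets.
sample = [7.4, 7.8, 7.8, 11.2, 7.4, 7.9, 7.3, 7.8, 7.5, 25.0]
stats = summarize(sample)
outliers = iqr_outliers(sample)
print(stats)
print("outliers:", outliers)
```

Whether flagged points should be removed, transformed, or kept is a judgment call that belongs in your written description; the rule only identifies candidates.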
2. Model Development, Validation, Optimization and Tuning
Choose one or more models and explain why you chose them. Construct the models, then test and validate them, and explain your validation approach. You can use any method(s) covered in the course. Compare model results if applicable. Report the results of the model fits (coefficients, graphs, trees, etc.), the predictors, and the fit statistics. Min. 3 pages text + graphics (required).
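One possible validation approach is k-fold cross-validation. The sketch below, again standard library only, fits a one-predictor least-squares line and reports the held-out RMSE of each fold. The helper names (fit_line, k_fold_rmse), the choice of k = 5, and the synthetic data are assumptions for illustration; any model and validation scheme covered in the course could take their place.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def k_fold_rmse(xs, ys, k=5, seed=0):
    """Return the held-out RMSE of each of k folds."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    # Every k-th shuffled index goes to the same fold, so folds are disjoint.
    folds = [indices[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        sq_errors = [(ys[i] - (a + b * xs[i])) ** 2 for i in fold]
        scores.append((sum(sq_errors) / len(sq_errors)) ** 0.5)
    return scores

# Synthetic data (an assumption for illustration): y = 2 + 3x plus Gaussian noise.
rng = random.Random(42)
xs = [i / 10 for i in range(100)]
ys = [2 + 3 * x + rng.gauss(0, 0.5) for x in xs]
scores = k_fold_rmse(xs, ys, k=5)
print("per-fold RMSE:", [round(s, 3) for s in scores])
print("mean RMSE:", round(sum(scores) / len(scores), 3))
```

Reporting the spread of the per-fold scores alongside their mean gives a rough sense of how stable the fit is, which is useful when comparing models.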
3. Decisions
Describe your conclusions regarding model fit and prediction, how well (or not) the models could be used for decisions, and why. Min. 3/4 page text + graphics.