Assignment 1
Establish the beginnings of an Enterprise Data Analytics and Machine Learning Strategy with examples. Provide the background for an organization, including the type of business, major data types, and the business processes that use these data. Describe the maturity of the data, the role of analysis versus analytics, and how the R language can be used to improve business.
During Part 1, you will establish the foundation and shell document for your final assignment, which will be your Enterprise Data Analytics and Machine Learning Strategy. In each subsequent Part, you will revise the document and complete an additional section.
First, you will select an organization (real or hypothetical) and apply your research to the development of an Enterprise Data Analytics and Machine Learning Strategy that would be appropriate for statistical data mining within the organization.
The project deliverables include the following:
• Organizational Background
o Provide a brief description of the organization (real or hypothetical), type of business, major data types, and the business processes that use these data.
• Data Maturity Within the Organization
o Describe the maturity of data within the organization, including data quality, master data management, use of data warehouses, and the importance of data in making business decisions.
o Describe the analyst role and his or her use of data. Elaborate on how analytics may augment or replace the common analyst role.
o Discuss how the R platform may be useful to the organization.
The draft paper should be 10-12 pages, including empty sections. It should be formatted using APA style and include at least two references. The new material should be 3-4 pages of original content.
Assignment 2
For Part 2, you will extend your Enterprise Data Analytics and Machine Learning Strategy to include the appropriate use of regression and classification methods within your organization.
During this Part, you will utilize the Iris dataset provided with R, or locate or create an example dataset that meets the assumptions of either a regression or classification model, and illustrate the application with R or RStudio. The example is intended to illustrate how these techniques may be applied within the organization. Additional discussion should address similar approaches with organization-specific data.
The project deliverables include the following:
• Describe regression and classification techniques, their uses, and when and why they are used.
• Utilize example data, such as the Iris data provided with R, or locate or create an example dataset that meets the assumptions for regression or classification models. Describe the data, and provide code examples for utilization of either a regression or classification approach in R or RStudio. Include screenshots where appropriate, and discuss the steps utilized.
• Discuss how these results would be communicated to a technical and nontechnical audience.
• Discuss how these techniques would be used within the organization. Use examples to reinforce your ideas.
Using the partially completed template you created in Part 1, add 3-4 pages of new content. It should be formatted using APA style and include at least two references.
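As an illustration of the kind of code example Part 2 asks for, the following is a minimal sketch of a classification model on the Iris data using only base R. Restricting the data to two species (so that the outcome is binary), the choice of predictors, and the 0.5 probability cutoff are assumptions made for simplicity; they are not requirements of the assignment.

```r
# Keep two species so the outcome is binary (an assumption for simplicity).
two_species <- droplevels(subset(iris, Species != "setosa"))

# Logistic regression: model the probability that a flower is virginica.
# With a factor response, glm() treats the first level (versicolor) as 0.
clf <- glm(Species ~ Petal.Length + Petal.Width,
           data = two_species, family = binomial)

# Coefficients, standard errors, and significance tests.
print(summary(clf))

# Predicted probabilities and class labels (0.5 cutoff is an assumption).
prob <- predict(clf, type = "response")
pred <- ifelse(prob > 0.5, "virginica", "versicolor")

# Cross-tabulate predictions against the actual species.
print(table(Predicted = pred, Actual = two_species$Species))
```

A regression example would follow the same pattern with lm(), for instance modeling Petal.Width as a function of Petal.Length.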
Assignment 3
Extend the Enterprise Data Analytics and Machine Learning Strategy plan to include model performance evaluation techniques. Building upon the regression or classification technique discussed via an analysis with code examples, provide additional code that evaluates the performance of the model.
For Part 3, you will extend your Enterprise Data Analytics and Machine Learning Strategy to include the appropriate use of performance evaluation.
During this Part, you will continue to utilize either the Iris dataset or an example dataset of your choosing to evaluate the performance of the model built in Part 2. Screenshots of R or RStudio should be provided to support the research and analysis. Additional discussion should address similar approaches with organization-specific data.
The project deliverables must include the following:
• Describe performance evaluation for regression and classification.
• Expanding upon the modeling example in Part 2, discuss specific performance evaluation considerations for the modeling technique.
• Provide code examples and a discussion for how the model will be evaluated.
• Discuss the algorithm's overall fitness for use in making a data-driven decision, as well as any risks of using it.
• Discuss how these techniques would be used within the organization, specific to the available data and desired outcomes. Use examples to reinforce your ideas.
Using the partially completed template created in Part 1 and extended in Part 2, add 3-4 pages of new content. It should be formatted using APA style and include at least two references.
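One way the evaluation code for Part 3 might look is sketched below: a hold-out split followed by a confusion matrix, accuracy, precision, and recall for a logistic regression on the Iris data. The 70/30 split, the seed, the two-species subset, and the 0.5 cutoff are all assumptions chosen for illustration, and only base R functions are used.

```r
set.seed(42)  # arbitrary seed, used only so the split is reproducible

# Binary problem: versicolor vs. virginica (an assumption for simplicity).
two_species <- droplevels(subset(iris, Species != "setosa"))

# 70/30 hold-out split (the ratio is an assumption).
train_idx <- sample(nrow(two_species), size = round(0.7 * nrow(two_species)))
train <- two_species[train_idx, ]
test  <- two_species[-train_idx, ]

# Fit on the training rows only.
clf <- glm(Species ~ Petal.Length + Petal.Width,
           data = train, family = binomial)

# Score the unseen test rows.
prob <- predict(clf, newdata = test, type = "response")
pred <- factor(ifelse(prob > 0.5, "virginica", "versicolor"),
               levels = levels(test$Species))

# Confusion matrix: rows are predictions, columns are actual classes.
conf_mat <- table(Predicted = pred, Actual = test$Species)
print(conf_mat)

# Summary metrics, treating virginica as the positive class.
accuracy  <- sum(diag(conf_mat)) / sum(conf_mat)
precision <- conf_mat["virginica", "virginica"] / sum(conf_mat["virginica", ])
recall    <- conf_mat["virginica", "virginica"] / sum(conf_mat[, "virginica"])
print(c(accuracy = accuracy, precision = precision, recall = recall))
```

For a regression model, the analogous hold-out metrics would be error measures such as root mean squared error rather than a confusion matrix.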
Assignment 4
Building upon the modeling, optimization, and validation, you will now explore how visualization can assist with these activities as well as communicate the findings of the analytics project.
For Part 4, you will extend your Enterprise Data Analytics and Machine Learning Strategy to include visualization techniques.
During this Part, you will continue to utilize either the Iris dataset or an example dataset of your choosing to apply analytics visualization techniques. Screenshots of R or RStudio should be provided to support the research and analysis. Additional discussion should address similar approaches with organization-specific data.
The project deliverables include the following:
• Describe the benefits of visualization in an analytics project from two perspectives: interpreting models and communicating results.
• From the following visualization techniques, or others if you choose to research additional techniques, choose three, then compare and contrast their benefits and when each should be used. In this discussion, include the role of static and interactive visualizations.
o Histogram, box plot, bar or line chart, scatter plot, heat map, mosaic map, geolocation map, three-dimensional (3-D) graphs, correlogram, bubble chart, or arc graph
• Provide code examples and output examples for one of the chosen models using the Iris or example dataset.
• Discuss how these techniques would be used within the organization, specific to the available data and desired outcomes. Use examples to reinforce your ideas.
Add 3-4 pages of new content to the plan you developed over the length of this course. It should be formatted using APA style and include at least two references.
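As a starting point for the Part 4 code examples, the sketch below draws three of the listed techniques (histogram, box plot, and scatter plot) for the Iris data with base R graphics. The choice of these three, the variables plotted, and the output path are assumptions; the plots are written to a PDF in a temporary directory so the script also runs outside an interactive session.

```r
# Hypothetical output location; change it to wherever figures should be saved.
out_file <- file.path(tempdir(), "iris_plots.pdf")

pdf(out_file, width = 9, height = 3.5)
par(mfrow = c(1, 3))  # three panels side by side

# Histogram: distribution of a single numeric variable.
hist(iris$Petal.Length, main = "Histogram",
     xlab = "Petal length (cm)")

# Box plot: compare the same variable across groups.
boxplot(Petal.Length ~ Species, data = iris, main = "Box plot",
        ylab = "Petal length (cm)")

# Scatter plot: relationship between two numeric variables, colored by species.
plot(iris$Petal.Length, iris$Petal.Width,
     col = as.integer(iris$Species), pch = 19, main = "Scatter plot",
     xlab = "Petal length (cm)", ylab = "Petal width (cm)")
legend("topleft", legend = levels(iris$Species),
       col = seq_along(levels(iris$Species)), pch = 19)

invisible(dev.off())  # close the device so the file is written
```

In RStudio, omitting the pdf() and dev.off() calls sends the same plots to the Plots pane, which is convenient for screenshots.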
Assignment 5
Extend your Enterprise Data Analytics and Machine Learning Strategy plan to include the end-to-end predictive modeling process with the R language. Emphasis will be placed on good data management practices, automation, predictive modeling competency, and communicating the results.
The first four Parts consisted of identifying an organizational opportunity, utilizing regression and classification techniques, evaluating model performance, and creating visualizations.
For Part 5, you will extend your Enterprise Data Analytics and Machine Learning Strategy to include both flowcharts and the use of an industry process, such as the Cross Industry Standard Process for Data Mining (CRISP-DM).
During this Part, you will continue to utilize either the Iris dataset or an example dataset of your choosing to illustrate the end-to-end use of R that aligns with the identified flowchart and industry data mining process.
The project deliverables include the following:
• Identify an industry process upon which the organization may choose to standardize. Examples include CRISP-DM; Sample, Explore, Modify, Model, and Assess (SEMMA); and Knowledge Discovery in Databases (KDD).
• Create a workflow that illustrates the flow of data through the identified process. This should include data source origination through organizational use of the findings of modeling.
• Provide R code examples or screenshots of the end-to-end process. Previous code from prior sections should be utilized; however, the end-to-end code with supporting screenshots, plots, and visualizations should be provided.
• Describe how the model would be deployed and used.
• Describe how the results of the identified modeling activity would be communicated to a technical and nontechnical audience.
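To make the end-to-end deliverable more concrete, here is a minimal sketch of an R script organized as one function per stage, loosely following the CRISP-DM phases of data preparation, modeling, evaluation, and deployment. All function names, the linear model, the split ratio, and the file path are hypothetical choices made for illustration, and saving the fitted model with saveRDS() is only one simple way to hand a model off for later use.

```r
# Data preparation: drop incomplete rows (iris has none; shown for the pattern).
prepare_data <- function(df) {
  df[complete.cases(df), ]
}

# Split into training and test sets (70/30 ratio and seed are assumptions).
split_data <- function(df, train_frac = 0.7, seed = 1) {
  set.seed(seed)
  idx <- sample(nrow(df), size = round(train_frac * nrow(df)))
  list(train = df[idx, ], test = df[-idx, ])
}

# Modeling: simple linear regression of petal width on petal length.
fit_model <- function(train) {
  lm(Petal.Width ~ Petal.Length, data = train)
}

# Evaluation: root mean squared error on held-out data.
evaluate_model <- function(model, test) {
  pred <- predict(model, newdata = test)
  sqrt(mean((test$Petal.Width - pred)^2))
}

# Deployment: persist the fitted model so another script can load it.
deploy_model <- function(model, path) {
  saveRDS(model, path)
  path
}

# Run the stages in order.
parts      <- split_data(prepare_data(iris))
model      <- fit_model(parts$train)
rmse       <- evaluate_model(model, parts$test)
model_path <- deploy_model(model, file.path(tempdir(), "petal_width_lm.rds"))

# Downstream use: reload the saved model and score a new observation.
reloaded  <- readRDS(model_path)
new_score <- predict(reloaded, newdata = data.frame(Petal.Length = 4.5))
print(c(rmse = rmse, predicted_width = unname(new_score)))
```

Keeping each stage in its own function makes it easier to map the code onto the boxes of a workflow diagram and to rerun the whole process automatically when the data change.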
Format your assignment according to the following formatting requirements:
1. The answer should be typed and double-spaced, using Times New Roman font (size 12), with one-inch margins on all sides.
2. Include a cover page containing the title of the assignment, the student's name, the course title, and the date. The cover page is not included in the required page length.
3. Include a reference page. Citations and references should follow APA format. The reference page is not included in the required page length.