Assignment -
Depending on your analysis, select the appropriate data analytics method. If, for instance, you select association rules, you will need to investigate and analyze the relationships between rules. If you select clustering or classification, you need to define the input and output variables.
To undertake this analysis and answer the above questions, follow the data analytics life cycle below. Explore and visualize the data set intensively so that you can select a method (for example association rules or regression) and the related algorithm in R, e.g. Apriori, linear or logistic regression, k-means, or a decision tree:
1. Discovery: Identify essential information and state why analytics is appropriate for solving the problem
a. State the analytics problem, why it is important, and to whom
b. Determine the general analytic problem type (such as association rules or regression)
c. If you don't know, conduct initial research to learn about the domain area you'll be analyzing
d. Show comprehensive understanding of the issue/problem at hand
e. Provide evidence that data analytics is the most appropriate problem-solving approach for this problem
2. Data Preparation: Organize into usable data/information and review the raw data
a. Explore, preprocess, and condition the data into the format most appropriate to the context
b. Use R code to import/export data sets and to deal with missing or dirty data
c. Use R descriptive statistics to examine variables
d. Plot data to explore and graphically visualize data before analysis
e. Determine needed transformations
f. Assess data quality and structure
g. Determine and establish data connections for the raw data, and clean and normalize that data if required
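Steps b to e can be sketched in base R as follows. This is a minimal sketch under stated assumptions, not the required solution: the built-in `airquality` data frame stands in for the assignment data set (which is not shown here), the temporary CSV file exists only to demonstrate export and import, and median imputation is just one possible way to handle missing values.

```r
# Stand-in data: the built-in airquality data set (an assumption;
# replace it with the assignment data set).
raw <- airquality

# Export to CSV and import again (round trip through a temporary file).
csv_path <- tempfile(fileext = ".csv")
write.csv(raw, csv_path, row.names = FALSE)
dat <- read.csv(csv_path)

# Count missing values per variable.
na_counts <- colSums(is.na(dat))
print(na_counts)

# One simple way to handle missing data (an assumption, not the only
# option): impute each numeric column with its median.
clean <- dat
for (col in names(clean)) {
  if (is.numeric(clean[[col]])) {
    clean[[col]][is.na(clean[[col]])] <- median(clean[[col]], na.rm = TRUE)
  }
}

# Descriptive statistics for every variable.
print(summary(clean))

# Plots for visual exploration before analysis.
hist(clean$Ozone, main = "Ozone distribution", xlab = "Ozone (ppb)")
plot(clean$Temp, clean$Ozone, xlab = "Temperature", ylab = "Ozone")

# A possible transformation: standardize variables to mean 0, sd 1.
scaled <- as.data.frame(scale(clean))
```

Whether to impute, drop, or flag missing values depends on the data set and the chosen method, so the choice should be justified in the report.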
3. Model Planning: Problem Analysis
a. Show understanding of the requirements of the problem at hand and its related dataset
b. Select the most appropriate analytical model or method to apply to the given dataset to meet the business objective
c. Use R code and statistical methods to evaluate and test hypotheses and to visualize data for analysis; also use R code to run data analytics algorithms such as k-means and Apriori, and to plot and analyze the results of running these algorithms
d. Select methods based on hypotheses, data structure and volume
e. Ensure techniques and approach will meet business objectives
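As an illustration of testing a hypothesis and running one of the named algorithms, here is a hedged base R sketch. The built-in `iris` data set is a stand-in (an assumption), and the choice of a t-test and of three clusters is only for demonstration.

```r
# Stand-in data: the built-in iris data set (an assumption;
# replace it with the assignment data set).
data(iris)

# Hypothesis test: do setosa and versicolor differ in mean petal length?
two_species <- droplevels(subset(iris, Species %in% c("setosa", "versicolor")))
tt <- t.test(Petal.Length ~ Species, data = two_species)
print(tt$p.value)

# k-means on the standardized numeric variables.
set.seed(42)  # k-means starts from random centers, so fix the seed
features <- scale(iris[, 1:4])
km <- kmeans(features, centers = 3, nstart = 25)

# Compare the clusters with the known species labels.
print(table(cluster = km$cluster, species = iris$Species))

# Plot the cluster assignments on two of the variables.
plot(iris$Petal.Length, iris$Petal.Width, col = km$cluster,
     xlab = "Petal length", ylab = "Petal width", main = "k-means, k = 3")
```

Apriori is not part of base R; it is commonly run through the `apriori()` function of the `arules` package, which would need to be installed separately.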
4. Model Building: Formulate reasoned conclusions
a. Ensure that the data is sufficiently robust for the model and analytical techniques
b. Use smaller test sets to validate the approach and a training set for initial experiments
c. Partition the dataset thoughtfully to form separate samples for training, testing, and production
d. Train and test the developed analytical model extensively to arrive at a solid analytical solution
e. Use R code and statistical methods confidently to evaluate and test the developed analytical model, and to train, test, and deploy it
f. Visualize results for different uses of the analytical model
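The partitioning, training, and testing steps might look like the following in base R. This is a sketch under assumptions: the built-in `mtcars` data set stands in for the assignment data, the 70/30 split is an arbitrary choice, and linear regression is used only because it is one of the methods named in the brief.

```r
# Stand-in data: the built-in mtcars data set (an assumption).
data(mtcars)

# Partition the rows at random into training (70%) and test (30%) sets.
set.seed(1)
n <- nrow(mtcars)
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

# Fit the model on the training set only.
model <- lm(mpg ~ wt + hp, data = train)

# Evaluate on the held-out test set with root mean squared error.
pred <- predict(model, newdata = test)
rmse <- sqrt(mean((test$mpg - pred)^2))
print(rmse)

# Visualize predicted against actual values; points near the line
# y = x indicate good predictions.
plot(test$mpg, pred, xlab = "Actual mpg", ylab = "Predicted mpg")
abline(0, 1)
```

With a data set this small the error estimate depends strongly on the random split, so repeating the split or using cross-validation would give a more stable picture.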
5. Communicating/Interpreting the Results: Formulate reasoned conclusions
a. Interpret the results
b. Summarize findings, depending on the audience
c. Compare with confidence the outcome of the modelling to the assumptions and criteria set in the model planning phase
d. Use R code and statistical methods confidently to present and interpret the analytical results
e. Use different R plotting techniques to visualize results and adapt them to different users
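One way to present the same result to different audiences is sketched below in base R. The `mtcars` data set and the `mpg ~ wt + hp` model are stand-ins (assumptions), chosen only to have something to report.

```r
# Stand-in data and model (assumptions; use the model built earlier).
data(mtcars)
model <- lm(mpg ~ wt + hp, data = mtcars)

# For a technical audience: coefficient estimates with 95% confidence
# intervals, plus the share of variance explained.
coef_table <- cbind(estimate = coef(model), confint(model, level = 0.95))
print(round(coef_table, 3))
r_squared <- summary(model)$r.squared
print(r_squared)

# For a non-technical audience: a simple chart of the main pattern.
boxplot(mpg ~ cyl, data = mtcars,
        xlab = "Number of cylinders", ylab = "Miles per gallon",
        main = "Fuel economy by engine size")
```

The table supports a precise comparison with the criteria set during model planning, while the chart conveys the headline finding without statistical vocabulary.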
6. Operationalize: Suggest how to deploy and operationalize the solution
a. Demonstrate confidently how the developed analytical solution could be deployed to solve the problem set in the case study
b. Assess the benefits
c. Provide final deliverables
d. Implement the model in the production environment
e. Define a process to update, retrain, and retire the model as needed
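A minimal base R sketch of deploying, retraining, and retiring a model follows. Everything named here is hypothetical: the file name `mpg_model.rds`, the helper `retrain()`, and the stand-in `mtcars` model are assumptions for illustration, and a real production environment would add scheduling, monitoring, and versioning.

```r
# Stand-in model (an assumption; use the model built earlier).
data(mtcars)
model <- lm(mpg ~ wt + hp, data = mtcars)

# Deploy: persist the trained model so a production script can load it.
# The file name is hypothetical.
model_path <- file.path(tempdir(), "mpg_model.rds")
saveRDS(model, model_path)

# In production: load the stored model and score new records.
deployed <- readRDS(model_path)
new_cars <- data.frame(wt = c(2.5, 3.5), hp = c(100, 150))
scores <- predict(deployed, newdata = new_cars)
print(scores)

# Update/retrain: refit on refreshed data and overwrite the stored model.
# retrain() is a hypothetical helper, not a library function.
retrain <- function(data, path) {
  refit <- lm(mpg ~ wt + hp, data = data)
  saveRDS(refit, path)
  invisible(refit)
}
retrain(mtcars, model_path)

# Retire: remove the stored model when it is no longer needed.
unlink(model_path)
```

Keeping the fitting formula in one function means the retraining step always produces a model compatible with the scoring code.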
Attachment:- Assignment Files.rar