Project Tasks
Task 1: Analytic Objective
Each group is expected to come up with an analytic objective that applies pattern discovery and predictive modelling using the SAS Enterprise Miner software. A well-drafted business case will help you understand your data set, identify variable roles and measurement levels, and ultimately choose the method for your analytics.
An analytic objective could take the following form:
"A radio station wants to analyze the use of Web services such as simulcasts, podcasts, news streams, music streams, archives, and live Web music to see whether any unusual patterns exist in the combinations of services selected by its Web users. In this case study, you perform an association analysis."
Note: Groups are encouraged to come up with different analytic objectives; no two groups should have the same one. Each group should attempt pattern discovery and predictive modelling using the data set assigned to it for this exercise.
Task 2: Data Analysis and Definition
Prepare, in tabulated form, a data dictionary that defines the variables as they appear in your data set, together with their model roles and measurement levels. An example is shown below.
Name | Model Role | Measurement Level | Description
STOREID | ID | Nominal | Identification number of the store
Tip 1: Execute the following steps in SAS Enterprise Miner
(i). Create a project and use your group name and group number as its name.
(ii). Create a library.
(iii). Create a data source from the data set assigned to your group.
(iv). Determine whether the roles and measurement levels assigned to the variables are appropriate; they should match those in your data dictionary table. Examine the distribution of each variable.
2.1 Answer the following Questions.
1. Are there any unusual data values in your assigned input variables? Support your answer with an appropriate argument.
2. List two possible strategies for handling cases with unusual values before attaching your chosen analysis node. Explain the scenarios in which each strategy is appropriate.
3. Are there missing values in any of the input variables?
4. If you assigned any variable the Rejected role, explain why.
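Outside SAS Enterprise Miner, the checks behind questions 1 and 3 can be sketched in a few lines of Python. This is an illustration only, not what the StatExplore tool or the data source wizard computes internally: it assumes a numeric input has been loaded into a list with `None` or NaN marking missing values, counts the missing entries, and flags values lying more than 1.5 times the interquartile range beyond the quartiles (one common rule of thumb for "unusual" values). The function name `profile_input` is hypothetical.

```python
import math
from statistics import quantiles


def profile_input(values):
    """Count missing values and flag IQR outliers in one numeric input.

    Assumption: missing values are stored as None or float('nan').
    """
    # Keep only non-missing values (the None check runs first, so
    # math.isnan is never called on None).
    present = [v for v in values if v is not None and not math.isnan(v)]
    n_missing = len(values) - len(present)

    # First and third quartiles (statistics.quantiles, default method).
    q1, _, q3 = quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    # Values outside the fences are candidates for "unusual" values.
    outliers = [v for v in present if v < low or v > high]
    return {"missing": n_missing, "low": low, "high": high, "outliers": outliers}


if __name__ == "__main__":
    # Hypothetical values for one input variable.
    sample = [1, 2, 3, 4, 5, 6, 7, 8, None, 100]
    print(profile_input(sample))
```

A flagged value is only a candidate for review; whether it is a data error or a genuine extreme depends on the business meaning of the variable.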
Task 3: Cluster and Association Analysis
Groups running cluster or association analysis should follow the tips below and answer the accompanying questions.
Tip 2: Execute the following steps in SAS Enterprise Miner
(v). Add your data source to the diagram workspace.
(vi). Add a Cluster node to the diagram workspace and connect it to the data source node.
(vii). Select the Cluster node and, in the Properties panel, set Internal Standardization to Standardization.
(viii). Specify a maximum of six clusters and run the diagram from the Cluster node.
(ix). Add a Segment Profile node to the diagram workspace and connect it to the Cluster node.
(x). Run the diagram from the Segment Profile node.
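The effect of the Internal Standardization setting can be illustrated with a small Python sketch. It assumes two hypothetical inputs on very different scales (income in dollars and age in years) and rescales each to z-scores, (x - mean) / standard deviation, using the population standard deviation; SAS Enterprise Miner's exact formula may differ in that detail. The helper names and data are made up for illustration.

```python
import math
from statistics import mean, pstdev


def standardize(column):
    """Rescale a column to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma for x in column]


def euclidean(a, b):
    """Euclidean distance between two equal-length rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical inputs: income (dollars) and age (years) for three customers.
income = [30000, 31000, 60000]
age = [25, 60, 26]

raw_rows = list(zip(income, age))
std_rows = list(zip(standardize(income), standardize(age)))

if __name__ == "__main__":
    # Raw scale: the 35-year age gap barely moves the distance,
    # because income differences are measured in thousands.
    print(euclidean(raw_rows[0], raw_rows[1]), euclidean(raw_rows[0], raw_rows[2]))
    # Standardized scale: both inputs contribute on comparable terms.
    print(euclidean(std_rows[0], std_rows[1]), euclidean(std_rows[0], std_rows[2]))
```

Because distance-based clustering groups rows by these distances, the scale of each input directly affects which rows end up in the same cluster.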
3.1 Answer the following Questions.
5. What would happen if you did not standardize your inputs?
6. Using the results of the Segment Profile node, interpret the characteristics of the three largest clusters.
7. Why was cluster analysis chosen?
Tip 3: Execute the following steps in SAS Enterprise Miner
(i). Create a new diagram and name it after your data set.
(ii). Create a new data source using the data set.
(iii). Assign the appropriate roles to the variables.
(iv). Add the data source node and an Association node to the diagram, and connect the two nodes.
(v). Change the setting for Export Rule by ID to Yes.
(vi). Leave the remaining default settings for the Association node and run the analysis.
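The lift values reported by the Association node can be reproduced by hand. As we understand SAS Enterprise Miner's definitions, lift is the confidence of a rule divided by its expected confidence, where expected confidence is the share of IDs containing the right-hand-side items alone. The Python sketch below computes support, confidence, and lift for a single rule over hypothetical transaction data echoing the radio station example; `rule_stats` and the item names are made up for illustration.

```python
def rule_stats(transactions, lhs, rhs):
    """Support, confidence and lift of the rule lhs ==> rhs.

    transactions: list of sets, one set of items per ID (e.g. per Web user).
    """
    lhs, rhs = set(lhs), set(rhs)
    n = len(transactions)
    n_lhs = sum(1 for t in transactions if lhs <= t)
    n_rhs = sum(1 for t in transactions if rhs <= t)
    n_both = sum(1 for t in transactions if (lhs | rhs) <= t)
    if n_lhs == 0 or n_rhs == 0:
        raise ValueError("rule items never occur; statistics are undefined")

    support = n_both / n                  # share of IDs containing all items
    confidence = n_both / n_lhs           # P(rhs | lhs)
    expected_confidence = n_rhs / n       # P(rhs) if lhs were irrelevant
    lift = confidence / expected_confidence
    return support, confidence, lift


# Hypothetical Web-service selections, one set per user.
users = [
    {"simulcast", "news stream"},
    {"simulcast", "news stream", "podcast"},
    {"music stream"},
    {"podcast", "music stream"},
]

if __name__ == "__main__":
    print(rule_stats(users, {"simulcast"}, {"news stream"}))
```

Here the rule simulcast ==> news stream has support 0.5, confidence 1.0, and lift 2.0. A lift above 1 means the items occur together more often than they would if they were independent.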
3.2 Answer the following Questions.
1. What is the highest lift value for the resulting rules?
2. Which rule has this value?
3. Why was an Association Analysis run?
Task 4: Predictive Modeling
Groups running predictive models (decision trees, regression, and neural networks) should follow the tips below and answer the accompanying questions.
Tip 4: Decision trees - Execute the following steps in SAS Enterprise Miner
(i). Create a new diagram named Predictive Analysis in your project.
(ii). Define the data set as a data source for the project. Set the roles of the analysis variables as defined in your data dictionary.
(iii). Add the data set to the diagram workspace.
(iv). Add a Data Partition node to the diagram and connect it to the Data Source node. Assign 50% of the data for training and 50% for validation.
(v). Add a Decision Tree node to the workspace and connect it to the Data Partition node.
(vi). Create a decision tree model autonomously using average squared error as the model assessment statistic.
(vii). Add a second Decision Tree node to the diagram and connect it to the Data Partition node.
(viii). In the Properties panel of the new Decision Tree node, change the maximum number of branches from a node to 3 to allow for three-way splits.
(ix). Create a second decision tree model autonomously using average squared error as the model assessment statistic.
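The 50/50 partition and the average squared error used to assess both trees can be sketched in Python. This is an illustration under stated assumptions, not the Data Partition node's algorithm: it performs a simple random split with an arbitrary fixed seed, and for a binary target it assumes the actual values are coded 0/1 and the predictions are event probabilities. The function names are hypothetical.

```python
import random


def partition(rows, train_fraction=0.5, seed=42):
    """Simple random split into training and validation sets.

    Unlike the Data Partition node's stratified options, this sketch
    does not balance target levels across the two sets.
    """
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def average_squared_error(actual, predicted):
    """Mean of (actual - predicted)^2; for a binary target, predicted holds
    the estimated probability of the event (actual is coded 0/1)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need equal-length, non-empty sequences")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    train, valid = partition(range(10))
    print(len(train), len(valid))
    print(average_squared_error([1, 0, 1], [0.8, 0.2, 0.6]))
```

Computing the same statistic on the training and the validation rows is what allows the comparison asked for in question 5 (a and b).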
4.1 Answer the following Questions.
1. Why did you assign the Target role to your target variable?
3. How many leaves are there in the optimal tree created in step (vi)? Which variable was used for the first split? Explain why this variable was chosen over the others.
4. How many leaves are there in the optimal tree created in step (ix)?
5. Which of the decision tree models appears to be better:
a. based on average squared error on training data?
b. based on average squared error on validation data?
Tip 5: Regression - Execute the following steps in SAS Enterprise Miner
(x). Attach the StatExplore tool to the data source and run it. View the results of the StatExplore tool and determine if any of the variables have missing values.
(xi). Add an Impute node to the diagram and connect it to the Data Partition node. Set the node to impute U for unknown class variable values and the overall mean for unknown interval variable values. Create imputation indicators for all imputed inputs.
(xii). Add a Regression node to the diagram and connect it to the Impute node. Choose stepwise selection, with average squared error as the selection criterion. Run the Regression node and view the results.
(xiii). Disconnect the Impute node from the Data Partition node. Add a Transform Variables node to the diagram and connect it to the Data Partition node. Connect the Transform Variables node to the Impute node.
(xiv). Apply a log transformation to the DemAffl and PromTime inputs and run the Transform Variables node.
(xv). Rerun the Regression node.
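The Transform Variables and Impute settings in steps (xi) to (xiv) can be sketched in Python to make the logic concrete. This is an illustration, not the nodes' implementation: missing values are represented as `None`, the log transformation uses an offset of 1 (an assumption chosen so that zero values are allowed), and the function names are hypothetical. In a partitioned flow the replacement mean would normally be computed from the training data only; here it is computed from whatever list is passed in.

```python
import math
from statistics import mean


def impute_column(values, kind):
    """Fill missing entries (None) and return (filled values, indicator flags).

    kind="class": replace missing with "U" (unknown).
    kind="interval": replace missing with the mean of the non-missing values.
    """
    flags = [1 if v is None else 0 for v in values]   # imputation indicator
    if kind == "class":
        fill = "U"
    elif kind == "interval":
        fill = mean(v for v in values if v is not None)
    else:
        raise ValueError("kind must be 'class' or 'interval'")
    return [fill if v is None else v for v in values], flags


def log_transform(values):
    """Apply log(x + 1), passing missing values (None) through unchanged.

    The +1 offset is an assumption made here so that zeros are allowed.
    """
    return [None if v is None else math.log(v + 1) for v in values]


if __name__ == "__main__":
    # Mirrors the flow Transform Variables -> Impute: transform first, then impute.
    prom_time = [0, None, 7]          # hypothetical PromTime-like values
    logged = log_transform(prom_time)
    print(impute_column(logged, "interval"))
    print(impute_column(["A", None, "B"], "class"))
```

The indicator flags become extra inputs, so a model can learn whether the fact that a value was missing is itself predictive.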
4.2 Answer the following Questions.
6. In preparation for regression, is any imputation of missing values needed? If so, should you do this imputation before generating the decision tree models? Why or why not?
7. Which variables are included in the final regression model generated in step (xii)? List the variables in the descending order of importance to the model.
8. Which variables are included in the final regression model generated in the last step?
9. Based on average squared error on the validation data, which of the two regression models appears to be better?
Tip 6: Neural Networks - Execute the following steps in SAS Enterprise Miner
(xvi). Add a Neural Network tool to the diagram. Connect the Impute node to the Neural Network node.
(xvii). Set the model selection criterion to average squared error. Run the Neural Network node.
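To check the weight count in question 10, you can compute it from the network's architecture. The Python sketch below assumes a multilayer perceptron with a single hidden layer in which every hidden and output unit has a bias term; to our knowledge three hidden units is the Neural Network node's default, but confirm the architecture and the number of output units in your own node's settings and results. The function name is hypothetical.

```python
def mlp_weight_count(n_inputs, n_hidden, n_outputs, direct_connections=False):
    """Number of weights, including bias terms, in a one-hidden-layer MLP.

    n_inputs counts model input columns after any dummy coding of class
    inputs and after imputation indicators are added.
    """
    hidden_layer = n_hidden * (n_inputs + 1)    # each hidden unit: inputs + bias
    output_layer = n_outputs * (n_hidden + 1)   # each output unit: hidden units + bias
    skip_layer = n_inputs * n_outputs if direct_connections else 0
    return hidden_layer + output_layer + skip_layer


if __name__ == "__main__":
    # Hypothetical sizes: 10 input columns, 3 hidden units, 1 output unit.
    print(mlp_weight_count(10, 3, 1))   # 3*11 + 1*4 = 37
```

The count grows with every input column, which is why dummy-coded class inputs and imputation indicators can make a network much larger than the raw variable list suggests.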
4.3 Answer the following Questions.
10. How many weights does the neural network model generated in step (xvii) include?
11. Examine the validation average squared error of the neural network model. How does it compare to the two decision tree models and the regression model generated after applying log transformation?
Task 5: Compare your models
Execute the following steps in SAS Enterprise Miner
(xviii). Add a Model Comparison node to the diagram. Connect it to all the predictive models generated in the earlier steps.
(xix). Run the Model Comparison node.
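The selection rule described in question 13, choosing the model with the lowest average squared error on the validation data, amounts to taking a minimum over the models' scores. The Python sketch below shows that rule in isolation; the model names and scores are placeholders, not results from any data set, and the function name is hypothetical. The Model Comparison node's default selection statistic may differ from this rule.

```python
def select_model(validation_ase):
    """Return the name of the model with the smallest validation ASE.

    validation_ase: dict mapping model name -> average squared error on
    the validation partition. Ties go to the first model listed.
    """
    if not validation_ase:
        raise ValueError("no models to compare")
    return min(validation_ase, key=validation_ase.get)


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not real results.
    scores = {
        "Decision Tree (2-way)": 0.25,
        "Decision Tree (3-way)": 0.26,
        "Regression (log inputs)": 0.20,
        "Neural Network": 0.22,
    }
    print(select_model(scores))
```

Judging on validation rather than training data favours the model that generalizes best to cases it was not fitted on.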
5.1 Answer the following Questions.
12. Examine the results of the Model Comparison node. Which of the compared predictive models has the Model Comparison node selected, and on what selection criterion was it selected?
13. Change the default values of the Model Comparison node properties so that it selects the model having the least average squared error on the validation data. Run the Model Comparison node again. Which model has been selected now?
14. Why are the models compared?
Task 6: Business Implication
1. From the outcome of your analysis and the business case you developed, what can you deduce, recommend, and conclude?
2. What business implications can be drawn from the process of building and comparing these models? Has this practice helped resolve the business issue? Why or why not?