A major retail chain that specialises in electronic goods has hired you as their Data Scientist. Over the past 12 months they have been rolling out a Loyalty/Affinity card to their customers.
The retail chain would like to complete the rollout of the Loyalty/Affinity card to the remainder of their stores and customers. You have been assigned to the project, and your role is to build a predictive model that can be used to determine which customers in their database are most likely to sign up for the Loyalty/Affinity card.
You have been provided with a file containing the customers who have been involved in the Loyalty/Affinity card project so far. The data set contains both the customers who have joined the Loyalty/Affinity card scheme and those who have not joined.
The target variable in the data set, which indicates whether the customer has taken up a Loyalty/Affinity card, is AFFINITY_CARD.
You will need to load the customer file, which is in CSV format, into your SAS Enterprise Miner workspace. From there you will analyse the data, develop a number of predictive models, evaluate these models to determine which one gives the best results, and then make your recommendations.
Data Description
Electronics Data

| Name | Description |
| --- | --- |
| CUST_ID | Unique identifier of each customer |
| CUST_GENDER | The gender of the customer, M or F |
| AGE | The current age of the customer. You can assume this is correct and up to date |
| CUST_MARITAL_STATUS | Marital status of the customer |
| COUNTRY_NAME | The country where the customer lives |
| CUST_INCOME_LEVEL | The salary range for the customer |
| EDUCATION | The highest level of education the customer has completed |
| OCCUPATION | The current occupation category for the customer |
| HOUSEHOLD_SIZE | The number of people in the household of the customer. This number includes the customer |
| YRS_RESIDENCE | How long the customer has lived at their current residence |
| AFFINITY_CARD | Target variable. 0 = no affinity card, 1 = has taken an affinity card |
| BULK_PACK_DISKETTES | Indicator for this item purchased. 0 = no purchase, 1 = purchased |
| FLAT_PANEL_MONITOR | Indicator for this item purchased. 0 = no purchase, 1 = purchased |
| HOME_THEATER_PACKAGE | Indicator for this item purchased. 0 = no purchase, 1 = purchased |
| BOOKKEEPING_APPLICATION | Indicator for this item purchased. 0 = no purchase, 1 = purchased |
| PRINTER_SUPPLIES | Indicator for this item purchased. 0 = no purchase, 1 = purchased |
| OS_DOC_SET_KANJI | Indicator for this item purchased. 0 = no purchase, 1 = purchased |
| Y_BOX_GAMES | Indicator for this item purchased. 0 = no purchase, 1 = purchased |
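If you want to sanity-check the file against the data description before loading it into the tool, a small script along the following lines may help. It is only a sketch in Python (not part of the SAS Enterprise Miner workflow). The function name `check_rows` and the sample rows are hypothetical and are not taken from Assignment_Data.csv; the column names and allowed values come from the table above.

```python
import csv
import io

# Columns documented above as 0/1 indicators (including the target).
BINARY_COLUMNS = [
    "AFFINITY_CARD", "BULK_PACK_DISKETTES", "FLAT_PANEL_MONITOR",
    "HOME_THEATER_PACKAGE", "BOOKKEEPING_APPLICATION", "PRINTER_SUPPLIES",
    "OS_DOC_SET_KANJI", "Y_BOX_GAMES",
]

# Hypothetical rows for illustration only; NOT real assignment data.
SAMPLE_CSV = """CUST_ID,CUST_GENDER,AFFINITY_CARD,Y_BOX_GAMES
101,M,0,1
102,F,1,0
102,X,2,1
"""

def check_rows(rows):
    """Return a list of (file_line, column, message) for values that
    contradict the data description."""
    problems = []
    seen_ids = set()
    # The header is line 1 of the file, so data rows start at line 2.
    for line_no, row in enumerate(rows, start=2):
        cust_id = row.get("CUST_ID", "")
        if cust_id in seen_ids:
            problems.append((line_no, "CUST_ID", f"duplicate value {cust_id!r}"))
        seen_ids.add(cust_id)
        if row.get("CUST_GENDER") not in ("M", "F"):
            problems.append((line_no, "CUST_GENDER", "expected M or F"))
        for column in BINARY_COLUMNS:
            # Only check indicator columns that are present in this file.
            if column in row and row[column] not in ("0", "1"):
                problems.append((line_no, column, "expected 0 or 1"))
    return problems

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
problems = check_rows(rows)
for problem in problems:
    print(problem)
```

To run this against the real file, you would replace `io.StringIO(SAMPLE_CSV)` with an opened file handle, for example `open("Assignment_Data.csv", newline="")`.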
See the separate instructions on the Notes webpage for how you can load external data into your account on the SAS OnDemand server.
Required Tasks
You are required to produce a report (following the CRISP-DM report structure as far as possible) detailing your investigation of the data and your classification of the provided data.
The first task you should complete is a data investigation exercise, where you will document the characteristics and other information that you can determine about each Feature.
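As a rough illustration of what a per-feature investigation might record, the sketch below profiles each column for row count, missing values, distinct values, the most common value and, where every value is numeric, the minimum, maximum and mean. It is written in Python as an assumption; the data mining tool used in class offers its own exploration facilities. The `profile` helper and the sample rows are hypothetical and are not taken from Assignment_Data.csv.

```python
import csv
import io
from collections import Counter
from statistics import mean

# Hypothetical rows for illustration only; NOT real assignment data.
# Only a few of the documented columns are shown.
SAMPLE_CSV = """CUST_ID,CUST_GENDER,AGE,HOUSEHOLD_SIZE,AFFINITY_CARD
1,M,41,3,0
2,F,27,2,1
3,M,,1,0
4,F,35,2,1
"""

def is_number(text):
    """True if the text can be parsed as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False

def profile(rows):
    """Return {column: summary dict} for a non-empty list of dict rows."""
    summary = {}
    for column in rows[0].keys():
        values = [row[column].strip() for row in rows]
        present = [v for v in values if v != ""]  # empty string = missing
        info = {
            "count": len(values),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "most_common": Counter(present).most_common(1)[0] if present else None,
        }
        # Add numeric statistics only when every present value is numeric.
        if present and all(is_number(v) for v in present):
            numbers = [float(v) for v in present]
            info.update(min=min(numbers), max=max(numbers), mean=mean(numbers))
        summary[column] = info
    return summary

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
report = profile(rows)
for column, info in report.items():
    print(column, info)
```

Columns whose values are not all numeric are summarised as categorical only, so the numeric statistics are simply omitted for them.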
You will need to work through and develop a number of classification models using the data mining tool used in class. The tool offers a number of different classification techniques, and within each of these you can modify the various parameter settings.
When you have developed all of your models (using the appropriate classification techniques available in the tool), you will have to evaluate them and identify the classification model and configuration that gives the best or most appropriate answer.
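One way to think about the comparison is sketched below: each model's predictions on the same held-out customers are scored with accuracy, precision, recall and F1, and the models are then ranked. This is an illustrative Python sketch, not the tool's own model comparison. The model names, the predictions and the choice of F1 as the ranking metric are all assumptions; the brief does not prescribe a metric, and "most appropriate" may well mean something different for your business case.

```python
def confusion_counts(actual, predicted):
    """Return (tp, fp, tn, fn), treating 1 (has taken a card) as positive."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, fp, tn, fn

def metrics(actual, predicted):
    """Accuracy, precision, recall and F1 for one model's predictions."""
    tp, fp, tn, fn = confusion_counts(actual, predicted)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {
        "accuracy": (tp + tn) / len(actual),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical AFFINITY_CARD labels and predictions, for illustration only.
actual = [1, 0, 0, 1, 0, 1, 0, 0]
candidate_predictions = {
    "decision_tree": [1, 0, 0, 0, 0, 1, 0, 1],
    "regression": [1, 0, 1, 1, 0, 1, 0, 0],
    "baseline_all_zero": [0, 0, 0, 0, 0, 0, 0, 0],
}

results = {name: metrics(actual, preds)
           for name, preds in candidate_predictions.items()}
# Assumption: rank by F1; swap in whichever metric suits the business goal.
best = max(results, key=lambda name: results[name]["f1"])

for name, scores in results.items():
    print(name, scores)
print("best by F1:", best)
```

Including a trivial baseline, such as predicting that nobody takes a card, makes it easier to see whether a model's accuracy reflects real learning or just the class balance.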
Attachment:- Assignment_Data.csv