Enterprise Business Intelligence: SAS Enterprise Miner Assignment
Task
Business Case: A financial services company offers a home equity line of credit to its clients. The company has extended several thousand lines of credit in the past, and many of these accepted applicants (approximately 20%) have defaulted on their loans. By using geographic, demographic, and financial variables, the company wants to build a model to predict whether an applicant will default.
After analyzing the data, the company selected a subset of 12 predictor (or input) variables to model whether each applicant defaulted. The response (or target) variable BAD indicates whether an applicant defaulted on the home equity line of credit. These variables, along with their model role, measurement level, and description, are shown in the following table.
The SAMPSIO.HMEQ data set contains 5,960 observations for building and comparing competing models. The data set is split into training, validation, and test data sets for analysis.
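SAS Enterprise Miner's Data Partition node performs the split for you. As a hedged illustration of what the 40/30/30 training/validation/test split requested in the tasks means, the Python sketch below assigns observation indices to partitions while stratifying on the binary target BAD, so each partition keeps roughly the same default rate. The function name, the seed, and the synthetic target list (5,960 observations with about 20% defaults, mirroring the figures above) are assumptions for illustration, not the node's actual algorithm or the real HMEQ data.

```python
import random
from collections import defaultdict

def stratified_partition(targets, fractions=(0.4, 0.3, 0.3), seed=12345):
    """Map each observation index to 'train', 'validate' or 'test'.

    Indices are grouped by target value (stratification), shuffled, and then
    cut so each group is split in the requested proportions; the last
    partition takes whatever remains after rounding.
    """
    names = ("train", "validate", "test")
    by_class = defaultdict(list)
    for i, y in enumerate(targets):
        by_class[y].append(i)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    assignment = {}
    for indices in by_class.values():
        rng.shuffle(indices)
        n = len(indices)
        n_train = round(fractions[0] * n)
        n_valid = round(fractions[1] * n)
        for pos, i in enumerate(indices):
            if pos < n_train:
                assignment[i] = names[0]
            elif pos < n_train + n_valid:
                assignment[i] = names[1]
            else:
                assignment[i] = names[2]
    return assignment

# Synthetic stand-in for BAD: 5,960 observations, about 20% defaults.
targets = [1] * 1192 + [0] * 4768
parts = stratified_partition(targets)
for name in ("train", "validate", "test"):
    members = [i for i, p in parts.items() if p == name]
    rate = sum(targets[i] for i in members) / len(members)
    print(f"{name:8s} n={len(members):5d} default rate={rate:.3f}")
```

Stratifying on the target is one common choice for a binary outcome; a simple random split would also work but can leave the partitions with slightly different default rates.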
Tasks
1. Create a project and a library, and identify the data source
2. Identify your input data using the table above
3. Partition the data by adding a Data Partition node (Training: 40%, Validation: 30%, Test: 30%)
4. Inspect the distribution of the values in the input data for each variable
5. Fit and evaluate a regression model with data replacement by adding a Regression node to the diagram
6. Is this model useful? (Hint: If the model is useful, the proportion of individuals who defaulted on a loan is relatively high in the top deciles, and the plotted curve is decreasing.)
7. Explore the DEBTINC variable. Are there missing values, and what are the implications for a regression model?
8. Do you think imputation is required? If so, for what reason?
9. If imputation is required to fit the regression model properly, use the Impute node to impute missing values in the input data set
10. In the Impute node, the Type property controls indicator variables for imputed values: setting Type to Single creates one indicator variable that flags whether any variable was imputed for each observation, and setting Type to Unique creates an indicator variable for each original variable that flags whether that specific variable was imputed. For this example, set Type to None
11. Add a new Regression node, connect it to the Impute node, and run it
12. Comment on the new statistics for the first decile:
a. What is the cumulative % response?
b. What is the % response?
c. What is the cumulative lift?
13. Finally, relate the results to the business problem and comment on the business implications.
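Tasks 7 to 10 hinge on how missing inputs such as DEBTINC affect a regression fit. A regression model can only use observations whose inputs are all present, so rows with a missing value are effectively discarded unless they are imputed. The Python sketch below illustrates that row loss and a simple mean imputation with the three indicator settings (None, Single, Unique) described in task 10. It is an analogy to the Impute node, not its implementation: mean imputation is just one possible method, the toy rows are invented, and the flag names M_ANY and M_<variable> are hypothetical.

```python
from statistics import mean

def complete_cases(rows, inputs):
    """Rows left after dropping every observation with a missing input,
    i.e. what a regression fit without imputation can actually use."""
    return [r for r in rows if all(r[v] is not None for v in inputs)]

def fit_means(train_rows, inputs):
    """Mean of each interval input over its non-missing training values."""
    return {v: mean(r[v] for r in train_rows if r[v] is not None) for v in inputs}

def apply_imputation(rows, means, indicator="none"):
    """Replace missing inputs with the fitted means.

    indicator="none"   -> add no flag variables
    indicator="single" -> add one flag, M_ANY, equal to 1 if any input was imputed
    indicator="unique" -> add one flag M_<var> per input (hypothetical names)
    """
    out = []
    for r in rows:
        new = dict(r)
        imputed = [v for v in means if r[v] is None]
        for v in imputed:
            new[v] = means[v]
        if indicator == "single":
            new["M_ANY"] = int(bool(imputed))
        elif indicator == "unique":
            for v in means:
                new["M_" + v] = int(v in imputed)
        out.append(new)
    return out

# Invented toy rows using two HMEQ input names; None marks a missing value.
rows = [
    {"BAD": 0, "DEBTINC": 30.0, "VALUE": 100000.0},
    {"BAD": 1, "DEBTINC": None, "VALUE": 80000.0},
    {"BAD": 0, "DEBTINC": 40.0, "VALUE": None},
    {"BAD": 0, "DEBTINC": None, "VALUE": 60000.0},
]
inputs = ["DEBTINC", "VALUE"]
print(len(complete_cases(rows, inputs)), "of", len(rows), "rows usable without imputation")

means = fit_means(rows, inputs)  # typically fitted on the training partition only
imputed = apply_imputation(rows, means, indicator="none")  # Type = None, as in task 10
print(imputed[1])
```

Computing the replacement values from training data and reusing them on validation and test data keeps the evaluation partitions from influencing the fitted model.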
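The first-decile statistics in task 12 follow from ranking scored observations by predicted probability of default. In the sketch below, % response is the share of defaulters within a decile, cumulative % response is the share within all deciles up to and including it, and cumulative lift is cumulative % response divided by the overall default rate. Cumulative % captured response (the share of all defaulters found so far) is included because it maps directly onto the business question. The scores and targets are invented toy values, and the function assumes at least as many observations as deciles and at least one defaulter.

```python
def decile_table(scores, targets, n_bins=10):
    """Rank observations by predicted P(BAD=1), highest first, and compute
    lift-chart statistics per decile. Assumes len(scores) >= n_bins and
    at least one defaulter in targets."""
    ranked = sorted(zip(scores, targets), key=lambda st: st[0], reverse=True)
    n = len(ranked)
    total_hits = sum(t for _, t in ranked)
    overall_rate = total_hits / n  # baseline default rate

    table, cum_hits, cum_n = [], 0, 0
    for d in range(n_bins):
        lo, hi = n * d // n_bins, n * (d + 1) // n_bins
        hits = sum(t for _, t in ranked[lo:hi])
        cum_hits += hits
        cum_n += hi - lo
        table.append({
            "decile": d + 1,
            "pct_response": 100 * hits / (hi - lo),
            "cum_pct_response": 100 * cum_hits / cum_n,
            "cum_lift": (cum_hits / cum_n) / overall_rate,
            "cum_pct_captured": 100 * cum_hits / total_hits,
        })
    return table

# Invented scores (already descending) and outcomes: 4 defaulters out of 20.
scores = [1 - i / 20 for i in range(20)]
targets = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0] + [0] * 10
for row in decile_table(scores, targets)[:3]:
    print(row)
```

A useful model shows cumulative lift well above 1 in the first deciles and a % response that generally decreases down the ranking, which is the pattern the hint in task 6 describes.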
Attachment: Data.rar