Problem
Using the training data set, create a C5.0 model (Model 1) to predict a loan applicant's Approval from Debt-to-Income Ratio, FICO Score, and Request Amount. Obtain the predicted responses.
Evaluate Model 1 using the test data set. Construct a contingency table to compare the actual and predicted values of Approval.
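A minimal sketch of the contingency table in plain Python. It assumes the actual and predicted Approval values are already available as two equal-length lists coded 1 = approved and 0 = not approved. The coding and the sample values are hypothetical and do not come from the loan data.

```python
from collections import Counter

def contingency_table(actual, predicted, labels=(0, 1)):
    """Return a nested dict: table[actual][predicted] -> count."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = Counter(zip(actual, predicted))
    return {a: {p: counts[(a, p)] for p in labels} for a in labels}

# Hypothetical test-set values; 1 = approved, 0 = not approved.
actual    = [1, 1, 0, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

table = contingency_table(actual, predicted)
for a in (0, 1):
    print(f"actual={a}: predicted=0 -> {table[a][0]}, predicted=1 -> {table[a][1]}")
```

Rows are actual values and columns are predicted values, so the diagonal cells hold the correct predictions.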
For Model 1, calculate all of the model evaluation measures from the contingency table above, and record them in a table named the Model Evaluation Table. Leave space for Model 2.
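The measures below are a commonly used set for a 2x2 table. The list expected in a given course may differ. This sketch assumes Approval = 1 is the positive class and that no denominator is zero. The counts in the example are hypothetical.

```python
def evaluation_measures(tn, fp, fn, tp):
    """Common evaluation measures computed from the four cells of a 2x2 table.

    Assumes every denominator is nonzero (at least one actual and one
    predicted member of each class).
    """
    total = tn + fp + fn + tp
    accuracy = (tn + tp) / total
    sensitivity = tp / (tp + fn)  # recall: share of actual approvals predicted as approvals
    specificity = tn / (tn + fp)  # share of actual denials predicted as denials
    precision = tp / (tp + fp)    # share of predicted approvals that were actual approvals

    def f_beta(beta):
        b2 = beta ** 2
        return (1 + b2) * precision * sensitivity / (b2 * precision + sensitivity)

    return {
        "Accuracy": accuracy,
        "Error rate": 1 - accuracy,
        "Sensitivity": sensitivity,
        "Specificity": specificity,
        "Precision": precision,
        "F1": f_beta(1.0),
        "F2": f_beta(2.0),
        "F0.5": f_beta(0.5),
    }

# Hypothetical counts, not results from the loan data.
measures = evaluation_measures(tn=40, fp=10, fn=20, tp=30)
for name, value in measures.items():
    print(f"{name:<12}{value:.4f}")
```

F2 weights sensitivity more heavily than precision, and F0.5 does the reverse. That is why the two can rank models differently.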
Interpret each of the Model 1 evaluation measures from the Model Evaluation Table.
Construct the simplified data-driven cost matrix based on the following.
• Compute the mean Interest per loan applicant in the training data set. Set the cost of a true positive to the negative of this value.
• Compute the mean Request Amount per loan applicant in the training data set. Set the cost of a false positive to this value.
• Obtain the simplified data-driven cost matrix.
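The following sketches the cost-matrix steps. It assumes the training records are dictionaries keyed by the variable names used above. It also assumes that true negatives and false negatives both cost 0, which is one interpretation of "simplified" here. The three records are made up for illustration.

```python
from statistics import mean

# Hypothetical training records; field names mirror the variables named above.
train = [
    {"Interest": 4000.0, "Request Amount": 12000.0},
    {"Interest": 6000.0, "Request Amount": 18000.0},
    {"Interest": 5000.0, "Request Amount": 15000.0},
]

mean_interest = mean(r["Interest"] for r in train)
mean_request = mean(r["Request Amount"] for r in train)

# Keys are (actual, predicted) with 0 = deny, 1 = approve.
# Assumption: true negatives and false negatives cost 0 in the simplified matrix.
cost_matrix = {
    (0, 0): 0.0,             # true negative
    (0, 1): mean_request,    # false positive: loan approved that should have been denied
    (1, 0): 0.0,             # false negative
    (1, 1): -mean_interest,  # true positive: negative cost = interest earned
}
print(cost_matrix)
```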
Using the training data set and the simplified data-driven cost matrix, build a C5.0 model (Model 2) to predict a loan applicant's Approval from Debt-to-Income Ratio, FICO Score, and Request Amount.
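C5.0 applies a cost matrix through its own internal mechanism, which is not reproduced here. As a separate, generic illustration of how such a matrix can change decisions, this sketch applies a minimum-expected-cost rule to a hypothetical predicted probability of approval. The function name and the cost values are assumptions.

```python
def min_cost_decision(p_approve, cost):
    """Return the prediction (0 = deny, 1 = approve) with the lower expected cost.

    cost[(actual, predicted)] is the cost of each cell; p_approve is a
    model's estimated probability that the applicant's actual class is 1.
    Ties go to 0 (deny) because min() returns the first minimal key.
    """
    expected = {
        pred: p_approve * cost[(1, pred)] + (1 - p_approve) * cost[(0, pred)]
        for pred in (0, 1)
    }
    return min(expected, key=expected.get)

# Hypothetical cost matrix with the simplified structure (TN = FN = 0).
cost = {(0, 0): 0.0, (0, 1): 15000.0, (1, 0): 0.0, (1, 1): -5000.0}

for p in (0.6, 0.8, 0.9):
    print(p, min_cost_decision(p, cost))
```

With TN = FN = 0, this rule approves exactly when p exceeds FP cost / (FP cost + |TP cost|), here 15000 / 20000 = 0.75. A large false-positive cost therefore makes approvals much harder to earn.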
Populate the Model Evaluation Table with the evaluation measures for Model 2, using the data-driven cost matrix.
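One way to add cost-based measures to the table is to multiply each cell count by its cost, then report the total and the cost per applicant. This sketch assumes the contingency table and the cost matrix are both keyed by (actual, predicted) pairs. All numbers are hypothetical.

```python
def model_cost(counts, cost):
    """Total and per-applicant cost of a model's predictions.

    counts[(actual, predicted)] is a cell count from the test-set contingency
    table; cost[(actual, predicted)] is the matching cell of the cost matrix.
    """
    n = sum(counts.values())
    total = sum(counts[cell] * cost[cell] for cell in counts)
    return total, total / n

cost = {(0, 0): 0.0, (0, 1): 15000.0, (1, 0): 0.0, (1, 1): -5000.0}
counts = {(0, 0): 40, (0, 1): 10, (1, 0): 10, (1, 1): 40}

total, per_applicant = model_cost(counts, cost)
print(f"Total cost: {total:,.2f}; cost per applicant: {per_applicant:,.2f}")
```

A negative total cost means the model's decisions earn more in interest than they lose on bad approvals.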
Interpret each of the Model 2 evaluation measures from the Model Evaluation Table.
Compare Model 1 and Model 2 using the Model Evaluation Table. Discuss the strengths and weaknesses of each model.
How much money did we make for our bank by using data-driven error costs to evaluate our models?
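One reading of this question is the difference in total cost between the two models on the test data set, where a negative cost represents profit. The following is a minimal sketch under that assumption, with hypothetical counts and costs.

```python
def total_cost(counts, cost):
    """Sum of count * cost over the cells of a 2x2 contingency table."""
    return sum(n * cost[cell] for cell, n in counts.items())

cost = {(0, 0): 0.0, (0, 1): 15000.0, (1, 0): 0.0, (1, 1): -5000.0}

# Hypothetical test-set counts for each model, keyed by (actual, predicted).
model1 = {(0, 0): 30, (0, 1): 20, (1, 0): 5, (1, 1): 45}
model2 = {(0, 0): 45, (0, 1): 5, (1, 0): 15, (1, 1): 35}

cost1 = total_cost(model1, cost)
cost2 = total_cost(model2, cost)
# Lower (more negative) cost means more profit, so the gain from Model 2 is:
gain = cost1 - cost2
print(f"Model 1 cost: {cost1:,.0f}  Model 2 cost: {cost2:,.0f}  gain: {gain:,.0f}")
```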