Universal Bank is a young bank that is growing rapidly in terms of customer acquisition. The majority of these customers are depositors whose relationships with the bank vary in size. Its base of borrowers is quite small, and the bank is interested in expanding this base to bring in more loan business. In particular, it wants to explore ways of converting depositors to borrowers while retaining them as depositors.
A campaign that the bank ran last year for depositors achieved a healthy conversion rate of over 9%. This has encouraged the retail marketing department to devise smarter campaigns with better targeted marketing. The goal of this assignment is to model customer behavior in the previous campaign and analyze which combination of factors makes a customer more likely to accept a personal loan. This will serve as the basis for the design of a new campaign.
The file UniversalBank contains data on 5000 customers. The data include customer demographic information, account information, and the customer response to the last personal loan campaign. The layout of the file is described below. The last five columns are Yes/No responses. 0 = No; 1 = Yes.
Data Description:
Column | Description
ID | Customer ID
Age | Customer's age in completed years
Experience | Number of years of professional experience
Income | Annual income of the customer ($000)
Family | Family size of the customer
CCAvg | Average spending on credit cards per month ($000)
Mortgage | Value of house mortgage, if any ($000)
Securities Account | Does the customer have a securities account with the bank?
CD Account | Does the customer have a certificate of deposit (CD) account with the bank?
Online | Does the customer use internet banking facilities?
CreditCard | Does the customer use a credit card issued by UniversalBank?
Personal Loan | Did this customer accept the personal loan offered in the last campaign?
In this assignment, you will use a set of R scripts that I wrote to train and test a k-nearest neighbors (KNN) classifier for the UniversalBank data set.
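The R scripts themselves are not reproduced here. As a rough illustration of what a KNN classifier does, here is a minimal Python sketch, assuming Euclidean distance and a simple majority vote among the k closest training rows. It is not the code in BillsKNNtrain.R, and the function name `knn_predict` and the (Income, CCAvg) values and labels below are made up for illustration. Because KNN works on raw distances, predictors on large scales (such as Income) can dominate; whether the course scripts rescale the predictors is not stated here.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k):
    """Predict the class of `query` by majority vote among its k nearest training rows."""
    # Sort training row indices by Euclidean distance to the query point.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    # Count labels in distance order, so a tied vote goes to the nearer neighbour's label.
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training rows: (Income, CCAvg) with made-up loan labels (1 = accepted).
train_X = [(40, 1.0), (45, 0.5), (150, 5.0), (160, 6.0), (50, 1.2)]
train_y = [0, 0, 1, 1, 0]
print(knn_predict(train_X, train_y, (155, 5.5), k=3))   # → 1
```

With k = 3, the query's two closest rows are labelled 1 and the third is labelled 0, so the vote is 2 to 1 in favour of acceptance.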
The script UB_tr_vl_ts.R partitions UniversalBank into a training set (50% of the cases), a validation set (30% of the cases) and a test set (20% of the cases). This process corresponds to slides 6 and 7 in this week's slide deck.
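The contents of UB_tr_vl_ts.R are not shown, so whether it fixes a random seed is not stated. The sketch below shows one common way to draw a random 50/30/20 split of the 5000 customer IDs in Python; the function name `partition` and the seed values are hypothetical. Reusing a seed reproduces the same partition, while a different seed gives a different but equally valid one.

```python
import random

def partition(ids, seed, train_frac=0.5, valid_frac=0.3):
    """Randomly split ids into training, validation and test lists (test gets the rest)."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)   # private generator: same seed -> same split
    n_train = round(train_frac * len(shuffled))
    n_valid = round(valid_frac * len(shuffled))
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]     # the remaining 20%
    return train, valid, test

train, valid, test = partition(range(1, 5001), seed=1)
print(len(train), len(valid), len(test))    # → 2500 1500 1000

# A different seed gives a different (equally valid) partition.
other_train, _, _ = partition(range(1, 5001), seed=2)
print(set(train) == set(other_train))       # almost surely False
```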
With the script BillsKNNtrain.R, you supply k, the number of neighbors to use in the analysis, and R calculates the training error and the validation set results, including the confusion matrix, the error rate, the true positive rate and the true negative rate.
With the script BillsKNNtest.R, you supply k, the number of neighbors to use in the analysis, and R calculates the test set results, including the confusion matrix, the error rate, the true positive rate and the true negative rate.
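The exact layout the scripts print is not shown here, so the following Python sketch only illustrates the standard definitions, treating 1 (accepted the loan) as the positive class. The function name `confusion_metrics` and the validation set are hypothetical; the 9.6% acceptance share is an assumption chosen to be in line with the "over 9%" conversion rate mentioned above. The example shows why the error rate alone can mislead when positives are rare: a rule that always predicts No has a low error rate but a true positive rate of zero.

```python
def confusion_metrics(actual, predicted):
    """Counts and rates from a 2x2 confusion matrix; 1 = accepted the loan (positive)."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == 1 and p == 1 for a, p in pairs)
    tn = sum(a == 0 and p == 0 for a, p in pairs)
    fp = sum(a == 0 and p == 1 for a, p in pairs)
    fn = sum(a == 1 and p == 0 for a, p in pairs)
    nan = float("nan")   # returned when a class is absent, avoiding division by zero
    return {
        "TP": tp, "TN": tn, "FP": fp, "FN": fn,
        "error_rate": (fp + fn) / len(pairs),
        "TPR": tp / (tp + fn) if tp + fn else nan,   # share of actual Yes predicted Yes
        "TNR": tn / (tn + fp) if tn + fp else nan,   # share of actual No predicted No
    }

# Hypothetical validation set: 1,500 customers, 144 (9.6%) accepted the loan.
actual = [1] * 144 + [0] * 1356
always_no = [0] * 1500    # a "classifier" that never predicts acceptance
m = confusion_metrics(actual, always_no)
print(m["error_rate"], m["TPR"], m["TNR"])   # → 0.096 0.0 1.0
```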
To complete this assignment, answer the questions below in a Word document and submit the document by the due date.
1) Produce a table similar to the one shown in this week's slide 15. Investigate k values from 1 through 20 and report the training error and the validation error.
2) From your results in question 1, choose the best value of k for this analysis and explain your choice.
3) Run BillsKNNtest.R for your chosen value of k.
4) From your results in questions 1, 2 and 3, what error rate can you expect on new data if you use your chosen value of k? Explain how you arrived at your answer.
5) For your chosen value of k, explain why the Validation Confusion Percentages and the Test Confusion Percentages are different.
6) Explain how we avoid overfitting in the development of this KNN classifier.
7) Explain why the training error for a 1-nearest-neighbor (1-NN) classifier is always 0.
8) What do the True Positive Rate and the True Negative Rate tell us about the performance of the classifier? Why might this information be useful to someone using the classifier?
9) Evaluate the following statement: "Since every student uses the same UniversalBank.txt file, every student's confusion matrices should be exactly the same."
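The table in slide 15 is not reproduced here. As a sketch of the kind of table question 1 asks for, the Python code below sweeps k from 1 to 20 and reports training and validation error. It uses synthetic two-feature data generated for illustration (not the UniversalBank file), and the helper names `knn_predict`, `error_rate` and `make_rows` are hypothetical; with the course materials you would instead run BillsKNNtrain.R once per k. One common rule picks the k with the lowest validation error. Note that k = 1 gives a training error of 0 here: each training row is at distance 0 from itself, so it is its own nearest neighbor (this holds as long as no two training rows share identical predictor values with different labels).

```python
import math
import random
from collections import Counter

def knn_predict(train_X, train_y, query, k):
    # Majority vote among the k training rows closest (Euclidean) to the query.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]

def error_rate(train_X, train_y, X, y, k):
    # Fraction of rows in (X, y) misclassified by KNN fitted on the training rows.
    wrong = sum(knn_predict(train_X, train_y, x, k) != label for x, label in zip(X, y))
    return wrong / len(y)

def make_rows(rng, n):
    # Synthetic stand-in data: about 10% positives, with shifted feature means.
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.1 else 0
        centre = 2.0 if label else 0.0
        X.append((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)))
        y.append(label)
    return X, y

rng = random.Random(0)
train_X, train_y = make_rows(rng, 200)
valid_X, valid_y = make_rows(rng, 120)

results = {}
print(" k  train_err  valid_err")
for k in range(1, 21):
    tr = error_rate(train_X, train_y, train_X, train_y, k)   # training (resubstitution) error
    vl = error_rate(train_X, train_y, valid_X, valid_y, k)   # held-out validation error
    results[k] = (tr, vl)
    print(f"{k:2d}  {tr:9.3f}  {vl:9.3f}")

# Lowest validation error; on ties, min() keeps the smallest k.
best_k = min(results, key=lambda k: results[k][1])
print("k with lowest validation error:", best_k)
```

Choosing k on the validation set rather than the training set is what keeps the choice from simply favoring k = 1, whose perfect training error says nothing about new data.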
https://wikisend.com/download/761982/BillsKNNtrain.R
https://wikisend.com/download/616584/UB_tr_vl_ts.R
https://wikisend.com/download/139866/UniversalBank.txt.docx
https://wikisend.com/download/640962/BillsKNNtest.R