Comparing Methods Assignment

This assignment will be completed in teams of students.

Introduction

The purpose of this assignment is to demonstrate your knowledge and understanding of the analytical techniques and tools learned in the course and to show how they relate to a business scenario. This assignment is somewhat different from previous ones: I do not give you detailed instructions on how to build your analytical process in RapidMiner. Instead, you are expected to do the modeling, validation and performance analysis on the given dataset so that you can answer the questions below and make recommendations for the business situation.

Submission Instructions

Perform the necessary tasks using RapidMiner, answer the questions below and prepare the required screenshots. Submit the following:

- A Word document with the answers and screenshots for the lettered questions. (Make sure that the lettering of the questions stays the same!) Place the team member names at the top of the document. Name your file Comparing Methods Assignment LastName1-LastName2... .docx. (Warning: for full points, make sure that you name the documents correctly and keep the answers correctly numbered and lettered.)

- The RapidMiner project file, named Comparing Methods Assignment LastName1-LastName2... .rmp. (The project file can be generated from RapidMiner by going to File -> Export Process. Select the destination folder and the name for the file. It will be saved as a .rmp file.)

Instructions

Download the mobile-churn.csv file posted on Canvas. The file contains a dataset collected by a phone company about attrition, in other words, about customers who cancelled their services and possibly signed up with another company. The company is interested in what it could do to keep customers and prevent their defection. Look at the data and make some recommendations based on the findings of your analysis.

Here is the explanation of the variables in the dataset:

a. Gender_Female: female or not
b. PhoneService_Yes: whether the customer has phone service with the company
c. MultipleLines_Yes: whether the customer has multiple line service
d. InternetService_DSL: whether the customer has DSL internet
e. InternetService_Fiber optic: whether the customer has Fiber optic internet
f. StreamingTV_Yes: whether the customer streams TV
g. StreamingMovies_Yes: whether the customer streams movies
h. Contract_One year: whether the customer has a one-year contract
i. Contract_Two year: whether the customer has a two-year contract
j. PaperlessBilling_Yes: whether the customer signed up for paperless billing
k. PaymentMethod_ Automatic: whether payment is set up to be automatic
l. Retired: 0 for not, 1 for yes
m. Tenure (months): how long the customer has been with the company
n. MonthlyCharges: $ amount of monthly payments for the subscribed services
o. Churn: Whether the customer churned (i.e. is not a customer any more)

1. As a first step, build 3 models using different classification techniques (Neural Net: use the default settings; Decision Tree: use gini_index as the criterion; and Logistic Regression: use the default settings) that are capable of classifying customers into 2 categories (churn/no churn). Use the X-validation operator right away for each technique. Set the number of folds to 3 (this results in shorter process runtimes). To measure the performance of the 3 models, look at the following performance measures: Accuracy, Kappa, Lift, F-measure, AUC (NOT the optimistic or pessimistic). (Hint: use the binomial classification performance operator to obtain all of these measures.)

Make readable screenshots of the following for each of the 3 models (9 screenshots; 9 pts):

- Top-level processes
- Parameter settings for the 3 different techniques that are inside the cross-validation operator
- Appropriate model results (Network, Tree, Weights)
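
Apart from AUC, which needs the confidence scores, the measures listed in step 1 can be derived from the four counts of a confusion matrix. The following is a minimal Python sketch, not part of the RapidMiner process, that may help you sanity-check the numbers RapidMiner reports. The function name `binary_metrics` and the counts are made up for illustration. Lift is assumed here to be precision divided by the share of churners in the data; RapidMiner may display it on a different scale.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, kappa, lift and F-measure from confusion-matrix counts.

    tp/fp/fn/tn are counts for the positive class (assumed: churn).
    No guard against empty rows or columns; this is only a sketch.
    """
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n

    # Agreement expected by chance, from the row and column marginals.
    p_pos = ((tp + fp) / n) * ((tp + fn) / n)
    p_neg = ((fn + tn) / n) * ((fp + tn) / n)
    expected = p_pos + p_neg
    kappa = (accuracy - expected) / (1 - expected)

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)

    # Assumed definition: precision relative to the share of positives.
    lift = precision / ((tp + fn) / n)

    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "lift": lift,
        "f_measure": f_measure,
    }


# Toy counts, invented for illustration only (not from mobile-churn.csv).
metrics = binary_metrics(tp=30, fp=10, fn=20, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Replacing the toy counts with the values from one of your own confusion matrices gives a quick cross-check of the figures you enter in the table for question 2b.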

2.

a. Make a screenshot of the confusion matrix output for each of the 3 methods.

b. Prepare a table to report the 5 performance measures for the 3 models. Put the different models in the rows and have 5 columns for the 5 measures.

Model | Accuracy | Kappa | Lift | F | AUC
NN    |          |       |      |   |
DT    |          |       |      |   |
LR    |          |       |      |   |

c. Discuss the performance of each of the three models based on the performance measures. Relate the performances to the baseline model (calculate the a priori probabilities first!).

Prepare a visual evaluation of the 3 models by including a screenshot of the ROC comparison chart. (Hint: Use the Compare ROC operator. Have the same models with the same parameters as in the other runs above.)

d. Using the observed performance measures, compare the performance of the 3 models. Do they perform the same? Which one is better or worse, and why?

e. Are the 3 models giving you more or less the same suggestions regarding the important factors/variables? If there are differences, what are they?
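
For questions c and d, two quick checks can be done outside RapidMiner. The Python sketch below is only an illustration under stated assumptions: the labels and confidence scores are invented toy values, the function names are hypothetical, and the baseline is assumed to be a model that always predicts the most frequent class. AUC is computed as the fraction of (churner, non-churner) pairs in which the churner receives the higher confidence, with ties counted as one half.

```python
from collections import Counter


def majority_baseline(labels):
    """Return (a priori probabilities, majority label, baseline accuracy)."""
    counts = Counter(labels)
    n = len(labels)
    priors = {label: count / n for label, count in counts.items()}
    majority_label, majority_count = counts.most_common(1)[0]
    return priors, majority_label, majority_count / n


def auc_from_scores(labels, scores, positive="Yes"):
    """AUC as the share of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == positive]
    neg = [s for y, s in zip(labels, scores) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # a tie counts as half a correctly ranked pair
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not from mobile-churn.csv).
toy_labels = ["No"] * 7 + ["Yes"] * 3
priors, majority, baseline_accuracy = majority_baseline(toy_labels)
print(priors, majority, baseline_accuracy)

toy_y = ["Yes", "Yes", "No", "Yes", "No", "No"]
toy_confidence = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(auc_from_scores(toy_y, toy_confidence))
```

A model is only useful if its accuracy clearly exceeds the baseline accuracy, and an AUC near 0.5 indicates a ranking no better than chance.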

3. Choose one of the models (possibly the best performing one) and address the following questions: How can you interpret the results of the model? Which attributes seem to matter the most? How do you know? Discuss their importance and/or effect sizes.

4. How could the results of the model be useful for the telecommunications company? What business recommendations can be suggested based on the results?
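
If the model chosen in question 3 is the logistic regression, one common way to express effect sizes is as odds ratios. The sketch below is a hedged illustration in Python: it assumes the coefficients are unstandardized and on the log-odds scale, the function name `odds_ratio` is hypothetical, and the coefficient value is invented, not a result from the dataset.

```python
import math


def odds_ratio(coefficient, delta=1.0):
    """Multiplicative change in the odds of churn when the attribute
    increases by `delta` units, all other attributes held fixed."""
    return math.exp(coefficient * delta)


# Hypothetical coefficient for Tenure (months); NOT taken from the data.
tenure_coefficient = -0.03

per_month = odds_ratio(tenure_coefficient)
per_year = odds_ratio(tenure_coefficient, delta=12)
print(f"odds ratio per extra month: {per_month:.3f}")
print(f"odds ratio per extra year:  {per_year:.3f}")
```

An odds ratio below 1 means the attribute is associated with lower odds of churn, above 1 with higher odds. For the 0/1 attributes in the dataset, the ratio compares customers with the property to customers without it.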
