Task - Clustering (To be completed in R/RStudio):
Dataset for the task: East West Airlines (please refer to the assigned dataset)
The East West Airlines dataset contains information on 1000 passengers who belong to an airline's frequent flier program. For each passenger, the data include information on their mileage history and on the different ways they accrued or spent miles in the last year. The goal is to identify clusters of passengers with similar characteristics, so that different segments can be targeted with different types of mileage offers.
Dataset Description:
Field Name | Data Type | Max Data Length | Raw Data or Telcom Created Field? | Description
ID# | NUMBER | | Telcom | Unique ID
Balance | NUMBER | 8 | Raw | Number of miles eligible for award travel
Qual miles | NUMBER | 8 | Raw | Number of miles counted as qualifying for Topflight status
cc1_miles | CHAR | 1 | Raw | Number of miles earned with freq. flyer credit card in the past 12 months
cc2_miles | CHAR | 1 | Raw | Number of miles earned with Rewards credit card in the past 12 months
cc3_miles | CHAR | 1 | Raw | Number of miles earned with Small Business credit card in the past 12 months
Bonus_miles | NUMBER | | Raw | Number of miles earned from non-flight bonus transactions in the past 12 months
Bonus_trans | NUMBER | | Raw | Number of non-flight bonus transactions in the past 12 months
Flight_miles_12mo | NUMBER | | Raw | Number of flight miles in the past 12 months
Flight_trans_12 | NUMBER | | Raw | Number of flight transactions in the past 12 months
Days_since_enroll | NUMBER | | Telcom | Number of days since Enroll_date
Award? | NUMBER | | Telcom | Dummy variable for Last_award (1=not null,

Note: miles bins (used by the credit card miles fields):
1 = under 5,000
2 = 5,000 - 10,000
3 = 10,001 - 25,000
4 = 25,001 - 50,000
5 = over 50,000
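The field descriptions above suggest a few concrete anomaly checks. The following is only a minimal sketch in base R. It assumes a hypothetical data frame named `airlines` filled with invented values (the real file name and contents are not given here), and it assumes that the bin codes 1 to 5 apply to the credit card miles fields.

```r
# Toy stand-in for the assigned dataset; every value below is invented.
airlines <- data.frame(
  Balance     = c(28143, 19244, NA, 14776, 97752),
  cc1_miles   = c(1, 1, 4, 1, 7),               # 7 is outside the documented bins 1-5
  Bonus_miles = c(174, 215, 4123, -500, 43300)  # a negative mileage is suspicious
)

# Number of missing values in each column.
missing_per_column <- colSums(is.na(airlines))

# Rows whose bin code is not one of the documented values 1..5.
bad_bin_rows <- which(!(airlines$cc1_miles %in% 1:5))

# Rows with a negative mileage count.
negative_rows <- which(airlines$Bonus_miles < 0)

print(missing_per_column)
print(bad_bin_rows)
print(negative_rows)
```

On the real data, `summary(airlines)` gives a quick first overview of ranges and `NA` counts before running targeted checks like these.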
For the dataset, complete the following tasks:
• Missing Values:
o Are there any anomalies (unusual data/missing values) in the given dataset? Support your answer with an appropriate argument.
o List possible strategies for handling cases with unusual or missing values in the data (if applicable).
• Clustering:
o Perform k-means clustering on the dataset.
o What would be the optimal value of 'k' in this case? Explain how you came to this conclusion.
o Which cluster(s) would you target for offers, and what type of offers would you make to customers in that cluster? Include proper reasoning in support of your choice of cluster(s) and the corresponding offer(s).
o State the business proposition for the largest cluster. What offers could you make to this cluster to increase ticket sales?
o If applicable, mention the business proposition for the second largest cluster.
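
One common way to approach the clustering tasks above is to standardise the numeric fields, run k-means for a range of k, and look for an "elbow" in the total within-cluster sum of squares. The sketch below uses base R only and a synthetic stand-in for the data: the data frame `airlines`, its values, the three invented groups, and the choice of k = 3 are all assumptions for illustration, not results for the assigned dataset.

```r
set.seed(42)  # k-means starts from random centres, so fix the seed for repeatability

# Synthetic stand-in for the assigned dataset: three invented groups of passengers.
n <- 50
airlines <- data.frame(
  Balance           = c(rnorm(n, 20000, 3000), rnorm(n, 80000, 5000), rnorm(n, 150000, 8000)),
  Bonus_miles       = c(rnorm(n, 2000, 500),   rnorm(n, 20000, 2000), rnorm(n, 60000, 4000)),
  Flight_miles_12mo = c(rnorm(n, 200, 50),     rnorm(n, 1500, 200),   rnorm(n, 6000, 500))
)

# Standardise so that large-valued columns do not dominate the Euclidean distance.
scaled <- scale(airlines)

# Total within-cluster sum of squares for k = 1..8 (the values behind an elbow plot).
k_values <- 1:8
wss <- sapply(k_values, function(k) {
  kmeans(scaled, centers = k, nstart = 25)$tot.withinss
})
print(data.frame(k = k_values, wss = round(wss, 1)))

# Fit with the k suggested by the elbow (3 is an assumption that suits this toy data).
fit <- kmeans(scaled, centers = 3, nstart = 25)

# Cluster sizes, largest first, and per-cluster means on the original scale.
cluster_sizes <- sort(table(fit$cluster), decreasing = TRUE)
cluster_means <- aggregate(airlines, by = list(cluster = fit$cluster), FUN = mean)
print(cluster_sizes)
print(cluster_means)
```

Plotting the curve with `plot(k_values, wss, type = "b")` makes the elbow easier to see. The sorted cluster sizes identify the largest and second largest clusters, and the per-cluster means are a starting point for describing each segment and reasoning about suitable offers.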