PROBLEM STATEMENT
Airbnb is a trusted and growing community for people who would like to list, find and rent accommodations. The website is an online marketplace that helps "Hosts" connect with "Guests" and vice versa. Airbnb does not focus on scaling its own inventory; instead, it concentrates on increasing the number of hosts and guests by matching them with each other. In 2008, the company went a step further by pulling data from social networking websites like Facebook to improve these connections.
To enhance the user experience, the company now aims to predict the destination a new user is most likely to travel to. New users on Airbnb can book accommodation in more than 34,000 cities across more than 190 countries.
REQUIRED DATASET
Our dataset is a collection of 1,047,891 instances and 14 fields, including the class attribute. All of these attributes are nominal.
The following table contains the description of these attributes:
Attribute Name | Description
Gender | Male, Female, Others
Age | <25, 25-40, 41-55, >56
Signup method | Basic, Facebook
Signup flow | The page the user signed up from
Language | English, Others
Affiliate_channel | The kind of paid marketing (Direct, Others)
Affiliate_provider | Where the marketing is (Direct, Search and Other)
First_affiliate_tracked | The first marketing the user interacted with before signing up
Signup_app | Android, iOS, Moweb, Web
First_device_type | Desktop, Phone, Tablet and Unknown
First_Browser | Chrome, Safari etc.
Action_type | Web session log data (click, view, data, submit and others)
Device_type | Desktop, Phone, Tablet and Unknown
Country_destination | North America and Other
Country_destination is the class attribute, i.e. the final output to be predicted. It may also be worth considering the "similarity" between different cities based on geography, weather, environment, etc.
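Since all attributes are nominal, a numeric field such as age has to be mapped onto the bins listed in the table. The following is a minimal sketch of that step, not a definitive implementation. The function name `age_to_bin` and the "Unknown" label for missing values are assumptions introduced here for illustration, and because the table leaves a gap between 41-55 and >56, the sketch assumes that an age of exactly 56 falls into the last bin.

```python
import math
from typing import Optional


def age_to_bin(age: Optional[float]) -> str:
    """Map a raw age to one of the nominal Age bins from the attribute table."""
    # Missing ages (None or NaN) get a separate label; "Unknown" is an
    # assumption, the table itself does not define a bin for missing values.
    if age is None or math.isnan(age):
        return "Unknown"
    if age < 25:
        return "<25"
    if age <= 40:
        return "25-40"
    if age <= 55:
        return "41-55"
    # Assumption: everything above 55 (including exactly 56) goes here.
    return ">56"


if __name__ == "__main__":
    for raw in (19, 25, 47, 63, None):
        print(raw, "->", age_to_bin(raw))
```

The same pattern (a small lookup function per attribute) could be used to collapse Language into English/Others or Affiliate_channel into Direct/Others.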
PLAN TO COLLECT THE DATASET
The dataset is available on Kaggle's website under the Competitions sub-section. We can download the data, which consists of a training set and a test set that are split by date. The test set covers all new users whose first activity is after 7/1/2014. We also have the sessions dataset, which dates back only to 1/1/2014, and the users dataset, which contains records going back as far as 2010. We have to perform the necessary cleaning steps to obtain a clean dataset.
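The date boundaries described above can be sketched as follows. This is an illustration under stated assumptions rather than the actual cleaning pipeline: the record layout (dictionaries with hypothetical `id` and `first_active` keys) and the function names are invented here, and the dates 7/1/2014 and 1/1/2014 are assumed to be in month/day/year format.

```python
from datetime import date

# Assumption: dates in the text are month/day/year.
TEST_CUTOFF = date(2014, 7, 1)     # users first active after this belong to the test set
SESSIONS_START = date(2014, 1, 1)  # session logs only go back this far


def split_by_first_activity(users):
    """Split user records into (train, test) by first-activity date."""
    train = [u for u in users if u["first_active"] <= TEST_CUTOFF]
    test = [u for u in users if u["first_active"] > TEST_CUTOFF]
    return train, test


def has_session_coverage(user):
    """True if the user's first activity falls inside the sessions window."""
    return user["first_active"] >= SESSIONS_START


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    users = [
        {"id": "a", "first_active": date(2010, 6, 28)},
        {"id": "b", "first_active": date(2014, 3, 15)},
        {"id": "c", "first_active": date(2014, 8, 2)},
    ]
    train, test = split_by_first_activity(users)
    print([u["id"] for u in train], [u["id"] for u in test])
    print([u["id"] for u in users if has_session_coverage(u)])
```

Users from before 2014 have no session records, so session-derived attributes such as Action_type and Device_type would be missing for them and need to be handled during cleaning.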
POTENTIAL IMPLICATIONS OF THE EXPECTED RESULTS
By accurately predicting where a new user will book their first travel experience, Airbnb can share more personalized content with its community, decrease the average time to first booking, and better forecast demand.
Every user will have a profile that can include recommendations and reviews by other users, along with a private text and rating system. These are based on the user's previous bookings and data history.