- In this assignment you will analyze the data in HotelClickStream.xls and interpret the results. The dataset contains clickstream data for online hotel-booking transactions in 2011. The Appendix describes each variable in detail.
- Please follow the instructions very carefully. Carry out the analyses below and answer the corresponding questions. Copy or summarize your key results for each question into a Word file, together with your answers, to produce the final report for submission.
1. Please first add the following two variables to your data:
1) REF_D (a dummy variable indicating whether the transaction was referred from another website or the final booking website was accessed directly. If no information is provided for the variable REF_DOMAIN_NAME, REF_D = 0; otherwise REF_D = 1)
2) LOG_PRICE (the log transformation of the variable PROD_TOTPRICE, computed with the LOG function in Excel)
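For readers working outside Excel, the two derived variables can be sketched in Python as follows. This is a minimal illustration, not the required method. It assumes each transaction is a dict keyed by the column names above and that a missing referrer is stored as `None` or a blank string (an assumption about how the file encodes missing values). It uses base 10 because Excel's LOG function defaults to base 10 when no base is supplied. The function name `add_derived_variables` and the sample rows are hypothetical.

```python
import math

def add_derived_variables(rows):
    """Return copies of the rows with REF_D and LOG_PRICE added."""
    out = []
    for row in rows:
        new_row = dict(row)
        # Assumption: a missing referrer is stored as None or a blank string.
        ref = row.get("REF_DOMAIN_NAME")
        new_row["REF_D"] = 0 if ref is None or str(ref).strip() == "" else 1
        # Excel's LOG(x) is base 10 unless a base is given (LN is the natural log).
        # math.log10 raises ValueError for prices that are zero or negative.
        new_row["LOG_PRICE"] = math.log10(float(row["PROD_TOTPRICE"]))
        out.append(new_row)
    return out

# Toy rows for illustration only; not taken from the dataset.
sample = [
    {"REF_DOMAIN_NAME": "google", "PROD_TOTPRICE": 100.0},
    {"REF_DOMAIN_NAME": "", "PROD_TOTPRICE": 1000.0},
]
result = add_derived_variables(sample)
```

If natural logs are preferred, `math.log` can replace `math.log10`; the choice only rescales the LOG_PRICE coefficient in the later regressions.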
a) Please provide a summary table showing the top 10 domain names (DOMAIN_NAME) that generated the highest volume of transactions. The report should look like the following table (hint: one way to do this is to use the COUNTIF function in Excel). Please briefly summarize your observations from the results.
Rank | Domain Names | # of Transactions
1 | marriott | 524
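As an alternative to COUNTIF, the ranking can be sketched in Python with `collections.Counter`. This is a minimal sketch that assumes the transactions are available as a list of dicts keyed by column name. The function name `top_domains` and the toy rows are hypothetical, and blank values are skipped on the assumption that they mean no domain was recorded.

```python
from collections import Counter

def top_domains(rows, column="DOMAIN_NAME", n=10):
    """Rank the n most frequent non-blank values found in `column`."""
    counts = Counter(
        str(row[column]).strip()
        for row in rows
        if row.get(column) is not None and str(row[column]).strip() != ""
    )
    # most_common sorts by count, descending; ties keep first-seen order.
    return [
        (rank, name, count)
        for rank, (name, count) in enumerate(counts.most_common(n), start=1)
    ]

# Toy rows for illustration only; not taken from the dataset.
sample = [
    {"DOMAIN_NAME": d}
    for d in ["marriott", "hilton", "marriott", "expedia", "hilton", "marriott"]
]
ranking = top_domains(sample, n=2)
```

Calling the same function with `column="REF_DOMAIN_NAME"` gives the ranking of reference domains, with directly accessed transactions (blank referrer) left out of the count.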
b) Please provide a summary table showing the top 10 reference domain names (REF_DOMAIN_NAME) that generated the highest volume of transactions. The report should look like the following table. Please briefly summarize your observations from the results.
Rank | Reference Domain Names | # of Transactions
1 | google | 620
c) Please provide summary statistics (N, Max, Min, Mean, and Std.) for variables: DIRECTP_D; REF_D; DURATION; PAGES_VIEWED; LOG_PRICE; and TRANS_FREQ. Please report your summary statistics table and provide short descriptions (a few bullet points) of your observations.
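The five requested statistics can be computed for any one variable with Python's `statistics` module, as in the sketch below. It assumes the column has already been read into a list of numbers. The function name `summarize` and the toy values are hypothetical. The standard deviation shown is the sample version (n - 1 denominator), which is what Excel's STDEV.S returns.

```python
import statistics

def summarize(values):
    """Return N, Max, Min, Mean and sample Std for one numeric column."""
    values = [float(v) for v in values]
    return {
        "N": len(values),
        "Max": max(values),
        "Min": min(values),
        "Mean": statistics.mean(values),
        # Sample standard deviation (n - 1 denominator); needs at least 2 values.
        "Std": statistics.stdev(values),
    }

# Toy values for illustration only; not taken from the dataset.
toy_duration = [2, 4, 4, 4, 5, 5, 7, 9]
stats = summarize(toy_duration)
```

For a 0/1 dummy such as REF_D, the mean is the share of transactions with the value 1, which is often the most useful number to comment on.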
2. Please use the Binary Outcome (Logistic/Logit) regression technique to answer the question "what are the factors that influence people's decision on whether to book directly on a hotel website or through a third-party website?" Please use DIRECT_D as your Dependent Variable (DV), and REF_D, LOG_PRICE, TRANS_FREQ, DURATION, HOUSEHOLD_SIZE, CHILDREN_D, and CONNECTIONSPEED_D as your Independent Variables (IVs). Please report and interpret your regression results, including an interpretation of the regression coefficients.
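One common way to interpret a logit coefficient is to convert it to an odds ratio: a one-unit increase in the IV multiplies the odds of the outcome by exp(coefficient), holding the other IVs fixed. The sketch below shows only that conversion. The helper names are hypothetical and the coefficient values are placeholders, not estimates from this dataset.

```python
import math

def odds_ratio(coef):
    """Multiplicative change in the odds for a one-unit increase in the IV."""
    return math.exp(coef)

def percent_change_in_odds(coef):
    """Same effect expressed as a percentage change in the odds."""
    return (math.exp(coef) - 1.0) * 100.0

# Placeholder coefficients for illustration only; NOT results from this data.
placeholder_coefs = {"some_dummy_iv": -0.5, "some_continuous_iv": 0.2}
for name, b in placeholder_coefs.items():
    print(f"{name}: odds ratio = {odds_ratio(b):.3f}, "
          f"change in odds = {percent_change_in_odds(b):+.1f}%")
```

A negative coefficient gives an odds ratio below 1 (lower odds of DV = 1) and a positive one gives an odds ratio above 1. For a log-transformed IV such as LOG_PRICE, "one unit" means a one-unit change in the log, not in the raw price.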
3. a) Please use the Count Data (Poisson) regression model to answer the question on "what are the factors that influence people's booking frequencies?" Please use TRANS_FREQ as your DV; and REF_D, LOG_PRICE, PAGES_VIEWED, HOUSEHOLD_SIZE, CHILDREN_D, and CONNECTIONSPEED_D as your IVs. Please report and interpret your regression results, which should include the interpretation of the regression coefficients.
b) Please repeat the analysis in question a) using the Negative Binomial Regression model. Please report and interpret your regression results and coefficients.
c) Please summarize your observations by comparing the results from a) and b).
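When comparing the two models, it helps to recall that the Poisson model assumes the variance of the count equals its mean, while the Negative Binomial model allows the variance to exceed the mean (overdispersion). A rough first look, before any regression, is the variance-to-mean ratio of the raw counts, sketched below. The function name `dispersion_ratio` and the toy counts are hypothetical. This unconditional ratio ignores the IVs, so it is only an informal indication and not a formal test.

```python
import statistics

def dispersion_ratio(counts):
    """Sample variance divided by the mean; roughly 1 under a Poisson model."""
    mean = statistics.mean(counts)
    if mean == 0:
        raise ValueError("mean of counts is zero; ratio is undefined")
    return statistics.variance(counts) / mean

# Toy counts for illustration only; not taken from the dataset.
toy_counts = [1, 1, 1, 2, 2, 3, 8, 14]
ratio = dispersion_ratio(toy_counts)
print(f"variance-to-mean ratio: {ratio:.2f}")
```

A ratio well above 1 suggests the Negative Binomial model may fit better. In both models the coefficients are on the log scale, so exp(coefficient) can be read as the multiplicative change in the expected count for a one-unit increase in the IV.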
4. a) Please use the linear regression technique to answer the question "what are the factors that influence how much time people spend on a website?" Please use DURATION as your DV; you may choose the IVs by conducting exercises similar to those in Assignment #1. Please report and interpret ONLY your final regression results.
b) Please use the linear regression technique to answer the question "what are the factors that influence how many pages people view when visiting a website?" Please use PAGES_VIEWED as your DV; you may choose the IVs by conducting exercises similar to those in Assignment #1. Please report and interpret ONLY your final regression results.
c) Alternatively, you can use a count data model (Poisson or Negative Binomial), since PAGES_VIEWED takes only non-negative integer values. Using a similar set of IVs, do you see significantly different results between linear regression and count data models?
d) Please summarize your observations by comparing the results from a), b), and c).
Attachment:- HotelClickStream.rar
Attachment:- Appendix.rar