COURSE PROJECT - Modeling the Starbucks Effect on Sales Prices of Condominiums in Downtown Toronto
The Starbucks brand now influences many products, including yogurt. In this project, you are asked to collect real estate data and model it using regression techniques to study whether there is a Starbucks effect. If there is an effect, properties near a Starbucks are expected to be more competitive in the market and to sell for a higher price.
1. Modeling problem
The purpose of this project is to examine the relationship between the mean list price of a condo and the following property features:
a) Floor square feet (numerical, square feet, take the midpoint if data is given as a range).
b) Number of bedrooms (discrete numerical).
c) Number of washrooms (numerical).
d) Close to Subway or not (categorical, coded as Y or N).
e) Parking included or not (categorical, coded as Y or N).
f) Distance to nearest Starbucks is within 0.5 km or not (categorical, coded as Y or N).
The objectives of the study are:
I. To determine whether all the features listed above are statistically significant in terms of contribution to a first order regression model. If not, which of the variables are important to determine the list price?
II. To see if there is a Starbucks effect on list price. This can be achieved by testing the significance of the variable in f) in the given model or by conducting an F-test for nested models.
III. To see if there are interactions between the variable in f) and the other variables. If yes, are they all statistically significant?
2. The data
You can collect the data from www.mls.ca or a similar real estate website. Randomly and uniformly select 60 properties from the real estate listings in downtown Toronto; 50% of them (30 properties) should be within 0.5 km of the nearest Starbucks store. All the variables listed above must be collected. To better organize the data, you may create two Excel files (save them as .csv files if you plan to use R for the statistical analysis) with the following columns:
Prices | FloorSize | NumBeds | NumWashroom | ParkingStatus | NearestStarBucksWithinHalfKm
One data file is used for building the models (this file should contain 55 data points). The other data file (containing the remaining 5 observations) is reserved as a test data set and will be used as input data when conducting predictions.
Note that when organizing data for SPSS, you need to code each categorical variable as a set of dummy variables. If you decide to use R, you do not need to do so, because R codes categorical (factor) variables automatically. Also, the size of the lot or floor is often given as a range; if this is the case, take the midpoint of that range.
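The data preparation steps described above (taking the midpoint of a size range, coding Y/N variables as dummy variables, and reserving 5 of the 60 observations for prediction) can be sketched as follows. This is only an illustration in Python, not part of the required SPSS or R workflow; the helper names `midpoint`, `dummy` and `split_rows`, the seed value, and the placeholder rows are all assumptions made for the example.

```python
import random

def midpoint(size_range):
    """Return the midpoint of a range such as "600-699" (or the value itself)."""
    parts = [float(p) for p in str(size_range).split("-")]
    return sum(parts) / len(parts)

def dummy(flag):
    """Code a Y/N categorical value as a 1/0 dummy variable."""
    return 1 if flag.strip().upper() == "Y" else 0

def split_rows(rows, n_test=5, seed=2024):
    """Shuffle the rows reproducibly and reserve n_test of them for prediction."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]

# Placeholder rows only (not real listings): 30 near a Starbucks, 30 not.
rows = [
    {"id": i,
     "FloorSize": "600-699",
     "NearestStarBucksWithinHalfKm": "Y" if i < 30 else "N"}
    for i in range(60)
]

train, test = split_rows(rows)
print(len(train), len(test))            # 55 5
print(midpoint("600-699"), dummy("Y"))  # 649.5 1
```

Fixing the seed makes the 55/5 split reproducible, which helps when the split has to be documented in the report.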
3. The hypothesized regression models (you may refer to Case Study 2 in the textbook for how to specify the models)
I. Model 1: first order model, no effect from Starbucks store.
II. Model 2: first order model, constant effect from Starbucks store.
III. Model 3: first order interaction model (i.e., interactions between the Starbucks variable and the other variables).
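One possible way to write the three models is sketched below. The notation is an assumption made for illustration and is not taken from the textbook: y is the list price, x1 is floor size, x2 is the number of bedrooms, x3 is the number of washrooms, and x4, x5 and x6 are dummy variables equal to 1 when the property is close to the subway, includes parking, or is within 0.5 km of a Starbucks, respectively (0 otherwise).

```latex
\begin{align*}
\text{Model 1:}\quad E(y) &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 \\
\text{Model 2:}\quad E(y) &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 \\
\text{Model 3:}\quad E(y) &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 \\
&\quad + \beta_7 x_1 x_6 + \beta_8 x_2 x_6 + \beta_9 x_3 x_6 + \beta_{10} x_4 x_6 + \beta_{11} x_5 x_6
\end{align*}
```

Written this way, Model 1 is nested in Model 2 (set \beta_6 = 0) and Model 2 is nested in Model 3 (set \beta_7 = \dots = \beta_{11} = 0), which is what makes the nested-model F-tests applicable.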
4. Your tasks
a) Include all regression models in the methods section and explain the models in detail.
b) For each model:
I. Implement regression analysis using either SPSS or R
II. Summarize the coefficient estimates, the standard errors of the estimates, the t statistics, the p-values and the confidence intervals of the estimates.
III. Comment on the significance of each variable in the model using the results in II.
IV. Comment on the overall model utility.
c) Compare Models 1 and 2 using the F-test for nested models to determine if the more complicated model (i.e., Model 2) is necessary.
d) Compare Models 2 and 3 using the F-test for nested models to determine if the more complicated model (i.e., Model 3) is necessary.
e) Make predictions with all three models using the data not used for building them (the 5 reserved observations), and comment on the prediction accuracy of each model.
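The calculations behind tasks c) to e) can be sketched as follows. The nested-model F statistic is computed from the error sums of squares (SSE) of the reduced and complete models, and root mean squared error (RMSE) is used here as one possible accuracy measure for the 5 reserved observations; the assignment does not prescribe a particular measure. The function names and all numbers are made up for illustration and are not real results.

```python
import math

def nested_f(sse_reduced, sse_complete, k_reduced, k_complete, n):
    """F statistic for H0: the extra terms in the complete model are all zero.

    k_reduced and k_complete count slope parameters (intercept excluded);
    n is the number of observations used to fit both models.
    """
    df_num = k_complete - k_reduced
    df_den = n - (k_complete + 1)
    f_stat = ((sse_reduced - sse_complete) / df_num) / (sse_complete / df_den)
    return f_stat, df_num, df_den

def rmse(actual, predicted):
    """Root mean squared prediction error over the held-out observations."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Made-up numbers purely to exercise the functions (not real results).
f_stat, df1, df2 = nested_f(sse_reduced=120.0, sse_complete=100.0,
                            k_reduced=5, k_complete=6, n=55)
print(round(f_stat, 2), df1, df2)  # 9.6 1 48

# Hypothetical actual and predicted list prices for two held-out condos.
print(rmse([500.0, 620.0], [510.0, 600.0]))
```

The F statistic is then compared with the critical value of the F distribution with (df1, df2) degrees of freedom. In R, calling `anova()` on the two fitted `lm` objects reports this test directly.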
5. Report
a) Write a report based on the analysis that you have done and the results you have obtained.
b) You must include all the data and the output of statistical analysis as an appendix.
The basic structure of your report should include the following sections:
- Abstract: give a summary of the work including the problem, modeling approach and main findings
- Introduction: describe the major modeling problem and potential application of the model. Discuss the approaches being used.
- Data Collection: summarize how you collected the data; discuss data limitations, such as randomness and completeness. Describe how the data were split for modeling and validation.
- Modeling Methods: discuss the regression techniques being used. Specify all details for the variables in the model, including model assumptions. Discuss the model validation problem.
- Results: discuss the method for estimating the unknown parameters and the software used. Present the results obtained and all the analysis you have done. Discuss model prediction accuracy and model selection.
- Conclusions: report on major findings.