https://www.kaggle.com/c/house-prices-advanced-regression-techniques
The goal of the competition is to predict house prices in Ames, IA. The data, which is described below, has been split roughly in half into train and test sets at the above website (with 1460 and 1459 observations, respectively). The test set contains all the predictor variables found in the train set, but is missing the outcome variable, SalePrice. You will use the model you develop on the train set to make predictions for the test set and then submit your predictions to Kaggle. (You may make as many submissions as you like.) Your score will be based on the out-of-sample performance of your model. The competition tests your ability to develop a generalizable model with low variance.
Goal: Present a 5-variable model in 1 page.
- The report should include error metrics, including estimated model performance out-of-sample (more on that later).
- Plan to submit to Kaggle. You need to include your Kaggle score/rank in the interim report. (For this one, just give me the document to submit to Kaggle.)
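The two requirements above (error metrics with an out-of-sample estimate, and a file to submit) could be sketched roughly as follows. This is only an illustration under stated assumptions: the five predictors in FEATURES are placeholders rather than a recommended choice, the model is a plain least-squares fit on log(SalePrice), the out-of-sample error is estimated with 5-fold cross-validation, and the file names train.csv, test.csv and submission.csv, as well as the Id column, are assumed from the Kaggle download. RMSE on log prices is used because, as far as I know, that is how Kaggle scores this competition.

```python
import numpy as np
import pandas as pd

# Illustrative placeholder choice of five predictors; substitute your own.
FEATURES = ["OverallQual", "GrLivArea", "GarageCars", "TotalBsmtSF", "YearBuilt"]

def design_matrix(df, features, fill):
    """Predictor columns with missing values filled, plus an intercept column."""
    X = df[features].fillna(fill).to_numpy(dtype=float)
    return np.column_stack([np.ones(len(X)), X])

def fit_ols(X, y):
    """Ordinary least-squares coefficients."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def cv_rmse(X, y, k=5, seed=0):
    """Average RMSE over k held-out folds: an estimate of out-of-sample error."""
    idx = np.random.default_rng(seed).permutation(len(y))
    scores = []
    for fold in np.array_split(idx, k):
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[fold] = False  # hold this fold out of the fit
        coef = fit_ols(X[train_mask], y[train_mask])
        scores.append(rmse(y[fold], X[fold] @ coef))
    return float(np.mean(scores))

def make_submission(train, test, features=FEATURES):
    """Fit on train, report error metrics, and predict SalePrice for test."""
    # For simplicity, fill values are medians of the full training set.
    fill = train[features].median()
    X_train = design_matrix(train, features, fill)
    y_train = np.log(train["SalePrice"].to_numpy(dtype=float))
    coef = fit_ols(X_train, y_train)
    report = {
        "in_sample_rmse": rmse(y_train, X_train @ coef),
        "cv_rmse": cv_rmse(X_train, y_train),
    }
    # Predictions are made on the log scale, so convert back to dollars.
    pred = np.exp(design_matrix(test, features, fill) @ coef)
    submission = pd.DataFrame({"Id": test["Id"], "SalePrice": pred})
    return submission, report

# Example usage (assumes the Kaggle files are in the working directory):
# train = pd.read_csv("train.csv")
# test = pd.read_csv("test.csv")
# submission, report = make_submission(train, test)
# print(report)
# submission.to_csv("submission.csv", index=False)
```

The cross-validated RMSE is the number to compare against your Kaggle score; if the two differ a lot, the model is probably overfitting the train set.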
How to pick variables?
Learn the data.
Logically, given what you know of housing prices, which variables should be most predictive? (Location, location, location.) Explore the data for the predictors that are highly correlated with the outcome.
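One possible way to explore which predictors are highly correlated with the outcome is sketched below. It is a rough starting point, not a prescribed method: the function name rank_by_correlation is made up for illustration, the file name train.csv and the Id column are assumed from the Kaggle download, and only numeric columns are considered, so categorical predictors such as neighborhood would need to be examined separately.

```python
import pandas as pd

def rank_by_correlation(train, target="SalePrice", top=10):
    """Numeric predictors with the largest absolute correlation with the target."""
    numeric = train.select_dtypes(include="number")
    corr = numeric.corr()[target].drop(target)
    # The row identifier is not a predictor, so leave it out if present.
    corr = corr.drop(labels=["Id"], errors="ignore")
    order = corr.abs().sort_values(ascending=False).index
    return corr.loc[order].head(top)

# Example usage (assumes the Kaggle file is in the working directory):
# train = pd.read_csv("train.csv")
# print(rank_by_correlation(train))
```

A high correlation is only a hint. Two predictors that are both strongly correlated with SalePrice may also be strongly correlated with each other, in which case including both adds little to a 5-variable model.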
Length: no more than 1 page, single spaced, including graphs and tables. (Submit source code in a separate document.)