This year our challenge will be based on the European Soccer Database (https://www.kaggle.com/hugomathien/soccer).
Judging criteria: A panel of judges will evaluate each solution's responses to the two Challenge questions against the following criteria.
• Originality and Innovation
• Methodology and Approach
• Objective addressed by the analysis
• Data discussion
• Insight(s) - Analytics and Quantitative results
• Summary of analysis
Challenge questions:
1 Predict the outcome of the game. The bookies use 3 classes (Home Win, Draw, Away Win) and get it right about 53% of the time, which is also what the creator of the database achieves. Though that may sound high for such an unpredictable sport, keep in mind that the home team wins about 46% of the time. So the base case (constantly predicting Home Win) already reaches about 46% accuracy.
2 Explore and visualize features. With access to player and team attributes, team formations, and in-game events, you should be able to produce some interesting insights :)
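To make the base case in question 1 concrete, here is a minimal sketch in Python of how a "constantly predict the most frequent class" baseline could be computed. It assumes each match is reduced to a pair of final scores (home goals, away goals). The function names and the toy scores are hypothetical and are not taken from the database, so the number printed here says nothing about the real 46% figure. The same idea carries over directly to R.

```python
from collections import Counter


def match_outcome(home_goals, away_goals):
    """Map a final score to one of the three classes used in the challenge."""
    if home_goals > away_goals:
        return "Home Win"
    if home_goals < away_goals:
        return "Away Win"
    return "Draw"


def majority_baseline_accuracy(outcomes):
    """Return the most frequent class and the accuracy of always predicting it."""
    counts = Counter(outcomes)
    majority_class, majority_count = counts.most_common(1)[0]
    return majority_class, majority_count / len(outcomes)


# Hypothetical toy scores as (home_goals, away_goals); NOT real data.
toy_scores = [
    (2, 1), (0, 0), (1, 3), (3, 0), (1, 1),
    (2, 0), (0, 1), (1, 0), (2, 2), (0, 2),
]

outcomes = [match_outcome(h, a) for h, a in toy_scores]
baseline_class, baseline_accuracy = majority_baseline_accuracy(outcomes)
print(baseline_class, baseline_accuracy)
```

Any model submitted for question 1 is only interesting if it beats this kind of baseline on matches it has not seen during training, so it is worth computing the baseline on the same held-out matches used to score the model.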
Environment:
The first choice is to use the R environment of Kaggle Kernels. The data is already loaded there, and you can build on examples that other people have published. Just be sure not to claim their work as your own!