This repo aims to educate participants of the Betfair Brownlow Datathon on creating an end-to-end model to use in the competition. Whilst some coding experience is required to follow the modelling walkthrough, there are no restrictions on the tools you can use to complete the task. Even Microsoft Excel should be powerful enough to create a competitive model!
This repo will outline how the Betfair Data Scientists went about modelling the AFL Brownlow for the 2018 season. The task is to provide a prediction for the number of Brownlow votes each player will receive in the 2019 AFL season. We have included a dataset named 'brownlow_datathon_data.csv', which contains player-level data from the 2010 to 2019 AFL seasons. You are not required to use this data. However, if you choose to use your own dataset, please ensure that the player names in your submission match the ones that we provided.
The metric used to determine the winner will be the Mean Squared Error over all predictions, comparing the predicted Brownlow votes with the actual Brownlow votes received. For example, the Betfair Data Scientists' Brownlow Model predicted Tom Mitchell to poll 35.484614 votes last season, while he actually polled 28. Thus, the Squared Error for this individual prediction was (35.484614 - 28)^2 ≈ 56.02. On the other hand, if our model had correctly predicted 28 votes, the Squared Error would have been 0. The winner of the datathon will be the participant with the lowest Mean Squared Error over all predictions.
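For reference, the metric described above can be written as a formula. Here $n$ is the number of players scored, $\hat{y}_i$ is the predicted vote count for player $i$, and $y_i$ is the actual vote count. The symbols are notation chosen for illustration and do not appear in the competition files.

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2$$

Because each error is squared, one large miss costs more than several small ones.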
For a detailed outline of the task, the prizes, and to sign up, click here.
To read how we went about modelling the 2018 Brownlow, read this.
Place | Prize |
---|---|
1 | $2000 |
2 | $1000 |
3 | $750 |
4 | $500 |
5 | $250 |
6-10 | $100 |
Total | $5100 |
To submit your model, email your final submission to datathon@betfair.com.au by 5.59pm AEST on 23 September 2019. Please rename the CSV to what you would like to call your model (this could be displayed on a leaderboard). Note that you don't need to email your code, just your predictions in the format that we have specified in the ‘Betfair Brownlow Datathon Submission.csv’. Any submissions that are not in the correct format will not be accepted.
Submissions should include a prediction for the number of votes each player will receive. We will assume a prediction of 0 votes for any missing values, and the Mean Squared Error will be calculated accordingly. Please ensure player names are in the specified format, otherwise we will be unable to find your prediction and, once again, we will assume a prediction of 0 votes. Here is an example of what a submission would have looked like for a sample of 25 players in the 2018 season:
PLAYER | PREDICTION |
---|---|
T Mitchell | 35.484614 |
M Gawn | 21.544278 |
D Martin | 20.444488 |
B Grundy | 19.543511 |
C Oliver | 19.009628 |
J Macrae | 18.931594 |
P Dangerfield | 18.621242 |
D Beams | 17.621222 |
E Yeo | 16.015638 |
L Neale | 15.495083 |
A Gaff | 15.165629 |
D Heppell | 15.083797 |
J Selwood | 14.989096 |
S Sidebottom | 14.863136 |
N Fyfe | 14.692243 |
J Kennedy | 14.404489 |
Z Merrett | 13.632131 |
M Crouch | 13.503858 |
R Laird | 13.274869 |
P Cripps | 13.240568 |
G Ablett | 13.01895 |
L Franklin | 12.792476 |
J Lloyd | 12.174224 |
J Kelly | 11.982157 |
C Ward | 11.892443 |
And here is how this sample would have been scored, where SE is the Squared Error of each prediction:
PLAYER | PREDICTION | ACTUAL | SE |
---|---|---|---|
T Mitchell | 35.484614 | 28 | 56.01944673 |
M Gawn | 21.544278 | 20 | 2.384794541 |
D Martin | 20.444488 | 19 | 2.086545582 |
B Grundy | 19.543511 | 17 | 6.469448207 |
C Oliver | 19.009628 | 13 | 36.1156287 |
J Macrae | 18.931594 | 14 | 24.32061938 |
P Dangerfield | 18.621242 | 17 | 2.628425623 |
D Beams | 17.621222 | 18 | 0.143472773 |
E Yeo | 16.015638 | 15 | 1.031520547 |
L Neale | 15.495083 | 11 | 20.20577118 |
A Gaff | 15.165629 | 16 | 0.696174966 |
D Heppell | 15.083797 | 13 | 4.342209937 |
J Selwood | 14.989096 | 14 | 0.978310897 |
S Sidebottom | 14.863136 | 24 | 83.48228375 |
N Fyfe | 14.692243 | 16 | 1.710228371 |
J Kennedy | 14.404489 | 4 | 108.2533914 |
Z Merrett | 13.632131 | 10 | 13.1923756 |
M Crouch | 13.503858 | 8 | 30.29245288 |
R Laird | 13.274869 | 19 | 32.77712497 |
P Cripps | 13.240568 | 20 | 45.68992096 |
G Ablett | 13.01895 | 14 | 0.962459103 |
L Franklin | 12.792476 | 16 | 10.28821021 |
J Lloyd | 12.174224 | 6 | 38.121042 |
J Kelly | 11.982157 | 10 | 3.928946373 |
C Ward | 11.892443 | 13 | 1.226682508 |
- | - | - | - |
MSE | - | - | 21.09389949 |
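If you want to check your own predictions against a past season before submitting, the scoring can be sketched in a few lines of Python. This is a minimal sketch rather than the official scoring code: the function name `mean_squared_error` and the dictionary layout are assumptions made for illustration. It applies the rule described above that a missing prediction is treated as 0 votes.

```python
def mean_squared_error(actual, predicted, default=0.0):
    """Average squared error over every player in `actual`.

    `actual` and `predicted` map player name -> votes. A player missing
    from `predicted` is scored as if `default` (0 votes) had been predicted.
    """
    squared_errors = [
        (predicted.get(player, default) - votes) ** 2
        for player, votes in actual.items()
    ]
    return sum(squared_errors) / len(squared_errors)


# Three players taken from the sample table; D Martin is left out of the
# predictions on purpose to show the effect of a missing value.
actual = {"T Mitchell": 28, "M Gawn": 20, "D Martin": 19}
predicted = {"T Mitchell": 35.484614, "M Gawn": 21.544278}

score = mean_squared_error(actual, predicted)
print(round(score, 4))
```

Leaving D Martin out contributes a Squared Error of 19^2 = 361 on its own, so a missing or misspelled name can easily outweigh the error from every other prediction.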
If you are happy to read through the Python tutorials on GitHub, but not run the code yourself, you can click here. If you are keen to run the code yourself and try different things out, you will need to install the following:

- Python
- Jupyter Notebook (installed through the Anaconda Distribution)

If you don't already have Python installed, we advise you to install it through Anaconda. This also installs Jupyter and is super convenient.
Note that whilst predictions are fun and rewarding to create, we can't promise that your betting strategy will be profitable. If implementing your own strategies, please gamble responsibly and note that you are responsible for any winnings or losses incurred.
- Some of the players are missing. For example, there is only one listing of A Brayshaw, when there should be two Brayshaws.
Player names are in the format {first initial} {surname} (for example, T Mitchell), as some small merging of datasets was required to create the dataset for this competition. This means that players who share a first initial and surname are not uniquely identified. Don't worry about providing a submission for these players; you can provide a prediction or no prediction at all and you will not be penalised, as we will have to exclude these players from scoring. However, do not make changes to the submission file to try to distinguish between these players.
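If you are building your own dataset from full player names, a small helper can convert them to the short format and flag names that collide. This is a rough sketch under assumptions: the function names `to_submission_name` and `ambiguous_names` are hypothetical, the full names are illustrative input, and the handling of hyphenated or multi-word surnames in the provided data has not been checked, so compare the output against the names in the supplied files.

```python
from collections import Counter


def to_submission_name(full_name):
    """Convert 'Tom Mitchell' to 'T Mitchell': first initial, space, rest of name."""
    first, _, rest = full_name.strip().partition(" ")
    return f"{first[0]} {rest}"


def ambiguous_names(full_names):
    """Return the short names shared by more than one distinct full name."""
    counts = Counter(to_submission_name(name) for name in set(full_names))
    return sorted(name for name, count in counts.items() if count > 1)


# Illustrative input: two different players collapse to the same short name.
players = ["Tom Mitchell", "Max Gawn", "Angus Brayshaw", "Andrew Brayshaw"]
print(ambiguous_names(players))
```

Any name this flags falls into the group that will be excluded from scoring, so there is no need to try to separate those players in your submission.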