WorldCupPrediction
Predicting results for the 2022 world cup.
Matches are predicted using a framework based on the team-level model in https://github.com/alan-turing-institute/AIrsenal, which in turn uses https://github.com/anguswilliams91/bpl-next. This model is trained on international mens football results obtained from https://github.com/martj42/international_results. The original model is a version of Dixon and Coles.
In the media
There is some background information and description of the results here.
The New Scientist also has an article, as does The Daily Mail, which prompted one reader to enthuse: "What an absolute load of rubbish, but typical of the anti-English establishment today!", and another to ponder: "I have NEVER KNOW a Prediction from a Scientist to be Right Ever, But on this Prediction I may be wrong.". High praise indeed!
Installation
The easiest way to use the code is via poetry. If you have poetry installed, from this directory, you can do
poetry shell
poetry install
to first open a shell in a virtual environment, and then install the dependencies and the wcpredictor
package.
Usage
Simulating a tournament multiple times
There are a couple of command-line applications that can be run when the wcpredictor
package is installed as described above.
In order to simulate the tournament N times, you can do
wcpred_run_simulations --num_simulations <N> --tournament_year <year> --training_data_start <YYYY-MM-DD> --training_data_end <YYYY-MM-DD> --output_csv <outputfilename> --use_ratings
and the results, in the form of a table of how many times each team got to each stage of the competition, will be saved in the specified csv file. At present, the allowed values for tournament_year
are "2014", "2018", and "2022" (the default).
Once you have a csv file saved from running that, you can plot the top ten most frequent winners by running:
wcpred_plot_winners --input_csv <inputfilename> --output_png <outputfilename>
and the results will be saved in the specified png.
You can also make a plot showing how far in the tournament a selection of teams got, by running e.g.:
wcpred_plot_progress --input_csv <inputfilename> --output_png <outputfilename> --team_list "England,Wales"
Note that both these commands can be run with --help
to see the options.
Running a single tournament
In a python session, you can do something like:
python
>>> from wcpredictor import Tournament, get_and_train_model
>>> t = Tournament("2022") # can also choose "2018" or "2014"
>>> model = get_and_train_model(start_date="2016-06-01", end_date="2022-11-20") # choose dates for training data
>>> t.play_group_stage(model)
>>> # at this stage, we can look at how each group is doing
>>> print(t.groups["A"])
Position | Team | Points | GS | GA
1st Netherlands 6 8 4
2nd Qatar 4 2 1
3rd Ecuador 4 1 5
4th Senegal 3 4 5
>>> # or, we can go ahead and play the knockout stages
>>> t.play_knockout_stages(model)
>>> print(t.winner)