This repository provides presentations and tutorials that demonstrate how to value on-the-ball actions in football.
The tutorials use the open-source socceraction Python library and the publicly available Wyscout match event dataset.
The dataset includes data for the 2017/2018 English Premier League, the 2017/2018 Spanish Primera División, the 2017/2018 German 1. Bundesliga, the 2017/2018 Italian Serie A, the 2017/2018 French Ligue 1, the 2018 FIFA World Cup and the UEFA Euro 2016. The dataset covers 1,941 matches, 3,251,294 events and 4,299 players.
The environment.yml file and requirements.txt file list the required Python dependencies. The notebooks are compatible with version 0.2.0 of the socceraction Python library. If a more recent version of the library is installed, the code may need to be adapted.
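Before running the notebooks, it can help to confirm which version of the library is installed. The snippet below is a minimal sketch that uses only the Python standard library; the helper names `installed_version` and `describe` are hypothetical and are not part of socceraction.

```python
from importlib.metadata import PackageNotFoundError, version

EXPECTED_VERSION = "0.2.0"  # version the notebooks state they are compatible with


def installed_version(package="socceraction"):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def describe(installed, expected=EXPECTED_VERSION):
    """Build a short status message (hypothetical helper, for illustration only)."""
    if installed is None:
        return "socceraction is not installed"
    if installed == expected:
        return f"socceraction {installed} matches the notebooks"
    return f"socceraction {installed} differs from {expected}; the code may need to be adapted"


print(describe(installed_version()))
```

If the message reports a different version, either install the version listed in environment.yml or requirements.txt, or expect to adapt the notebook code.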
Introduction in Friends of Tracking (video)
This introductory presentation, given in the Friends of Tracking session on Thursday 7 May 2020, motivates the use of data for player recruitment in football and shows the limitations of traditional statistics for assessing player performance. It then introduces several frameworks for valuing the actions of football players, gives an intuitive explanation of the VAEP framework, and outlines the content of this series of hands-on video tutorials.
This presentation expands on the introductory presentation by discussing the technical details of the VAEP framework and the content of the hands-on video tutorials in more depth.
Tutorial 1: Run pipeline (video, notebook, notebook on Google Colab)
This tutorial demonstrates the entire pipeline, from ingesting the raw Wyscout match event data to producing ratings for football players. It covers four topics: downloading and preprocessing the data, valuing game states, valuing actions, and rating players.
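To make the step from valuing game states to valuing actions concrete, here is a minimal sketch in plain Python. It follows the idea described in the VAEP paper listed in the references: an action is worth the change it causes in the probability of scoring minus the change in the probability of conceding. The function name `vaep_values` is hypothetical, and the sketch assumes that the same team performs every action and that both probabilities are 0 before the first action. It does not use the socceraction API.

```python
def vaep_values(p_scores, p_concedes):
    """Value each action from the scoring/conceding probabilities of game states.

    p_scores[i] and p_concedes[i] are the estimated probabilities of scoring
    and conceding in the game state reached after action i.

    Simplifying assumptions (not taken from the tutorials): one team performs
    all actions, so probabilities never swap on a change of possession, and
    both probabilities are 0 before the first action.
    """
    values = []
    prev_score, prev_concede = 0.0, 0.0
    for score, concede in zip(p_scores, p_concedes):
        offensive = score - prev_score          # gain in scoring probability
        defensive = -(concede - prev_concede)   # reduction in conceding probability
        values.append(offensive + defensive)
        prev_score, prev_concede = score, concede
    return values


# Three consecutive actions ending in a dangerous situation.
values = vaep_values([0.02, 0.05, 0.30], [0.01, 0.01, 0.005])
print([round(v, 3) for v in values])
```

In this toy sequence the third action receives most of the credit, because it raises the scoring probability the most while also lowering the conceding probability.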
Tutorial 2: Generate features (video, notebook, notebook on Google Colab)
This tutorial demonstrates the process of generating features and labels. It covers three topics: exploring the data in the SPADL representation, constructing features to represent actions, and constructing features to represent game states.
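The sketch below illustrates, in plain Python, what game states and labels can look like. It is an assumption-laden illustration, not the notebook code: actions are simple dictionaries with hypothetical keys `team`, `type` and `is_goal`, a game state is taken to be the current action plus the two before it, and an action is labeled positive when a goal follows within a fixed window of actions (the window starts at the action itself, and its default size of 10 is an assumption).

```python
def game_states(actions, nb_prev_actions=3):
    """Represent each game state by the current action and the ones before it.

    Early in the game the first action is repeated as padding (an assumption
    made for this sketch).
    """
    states = []
    for i in range(len(actions)):
        state = [actions[max(i - back, 0)] for back in range(nb_prev_actions)]
        states.append(state)
    return states


def label_actions(actions, lookahead=10):
    """Label each action with whether its team scores or concedes soon after.

    The window covers the action itself and the following actions, up to
    `lookahead` actions in total. Own goals are ignored for simplicity.
    """
    labels = []
    for i, action in enumerate(actions):
        window = actions[i:i + lookahead]
        scores = any(a["is_goal"] and a["team"] == action["team"] for a in window)
        concedes = any(a["is_goal"] and a["team"] != action["team"] for a in window)
        labels.append({"scores": scores, "concedes": concedes})
    return labels


# Toy sequence: team A loses the ball and team B scores three actions later.
actions = [
    {"team": "A", "type": "pass", "is_goal": False},
    {"team": "B", "type": "interception", "is_goal": False},
    {"team": "B", "type": "pass", "is_goal": False},
    {"team": "B", "type": "shot", "is_goal": True},
]
print(label_actions(actions))
print([a["type"] for a in game_states(actions)[2]])
```

With the default window, the pass by team A is labeled as leading to a conceded goal, while all three actions by team B are labeled as leading to a scored goal.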
Tutorial 3: Learn models (video, notebook, notebook on Google Colab)
This tutorial demonstrates how to split the dataset into a training set and a test set, learn baseline models using conservative hyperparameters for the learning algorithm, optimize those hyperparameters, and learn the final models.
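One way to split event data is by match rather than by individual action, so that actions from the same match never end up in both sets. The sketch below shows that idea with the standard library only. Whether the notebook splits in this way is not stated here, so treat the approach, the function name `split_games`, the 20% test fraction and the seed as assumptions.

```python
import random


def split_games(game_ids, test_fraction=0.2, seed=42):
    """Split match identifiers into disjoint training and test sets.

    Splitting on whole matches (an assumption of this sketch) keeps all
    actions of a match on the same side of the split.
    """
    unique_ids = sorted(set(game_ids))
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(unique_ids)
    n_test = max(1, round(len(unique_ids) * test_fraction))
    test_ids = set(unique_ids[:n_test])
    train_ids = set(unique_ids[n_test:])
    return train_ids, test_ids


train_ids, test_ids = split_games(range(10))
print(len(train_ids), len(test_ids))
```

The baseline models, the hyperparameter search and the final models would then all be fit on actions from `train_ids` only, with `test_ids` held back for the final evaluation.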
Tutorial 4: Analyze models and results (video, notebook, notebook on Google Colab)
This tutorial demonstrates how to analyze the importance of the features included in the trained machine learning models, the predictions for specific game states, and the resulting player ratings.
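When comparing player ratings, it is common to normalize the summed action values by playing time and to leave out players with few minutes. The following sketch shows one way to do that in plain Python. The function name `rate_players`, the per-90-minutes normalization and the 900-minute threshold are assumptions for illustration, not values taken from the notebook.

```python
def rate_players(action_values, minutes_played, min_minutes=900):
    """Rank players by total action value per 90 minutes played.

    action_values: iterable of (player, value) pairs, one per action.
    minutes_played: dict mapping player to minutes played.
    Players below `min_minutes` are dropped (threshold is an assumption).
    """
    totals = {}
    for player, value in action_values:
        totals[player] = totals.get(player, 0.0) + value

    ratings = {}
    for player, total in totals.items():
        minutes = minutes_played.get(player, 0)
        if minutes >= min_minutes:
            ratings[player] = 90.0 * total / minutes

    # Highest rating first.
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


values = [("x", 0.5), ("x", 0.5), ("y", 3.0), ("z", 9.0)]
minutes = {"x": 900, "y": 1800, "z": 90}
for player, rating in rate_players(values, minutes):
    print(player, round(rating, 3))
```

Player `z` has the largest total value but is excluded because 90 minutes is too small a sample, which is the kind of effect worth checking when interpreting a ranking.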
- Tom Decroos, Lotte Bransen, Jan Van Haaren, and Jesse Davis. Actions Speak Louder than Goals: Valuing Player Actions in Soccer. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1851-1861. 2019.
- Luca Pappalardo, Paolo Cintia, Alessio Rossi, Emanuele Massucco, Paolo Ferragina, Dino Pedreschi, and Fosca Giannotti. A Public Data Set of Spatio-Temporal Match Events in Soccer Competitions. Scientific Data 6, no. 1 (2019): 1-15.