
Collection of tools and scripts for analysis and visualisation of football data.

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

Football Data Analytics

This repository contains a collection of tools, scripts and projects that focus on analysis and visualisation of football data.

Table of Contents
  1. ➤ Project Description
  2. ➤ Folder Structure
  3. ➤ Workflow
  4. ➤ Projects

Project Description

This repository contains a collection of projects that aim to generate meaningful insight from football data. Python is used for extraction, processing, analysis and visualisation of event data, aggregated team data, market value data and more. The project is broken down into sub-projects, each of which aims to either perform a specific analysis, generate some specific insight, or introduce automation to football data analytics. Using the contents of this repository, a number of novel & informative visuals and text threads have been created and shared with the football data analytics community via Twitter (@JKDS).

Folder Structure

football-data-analytics
│
├── analysis_tools
│   ├── __init__.py
│   ├── get_football_data.py [not included in git repo]
│   ├── logos_and_badges.py
│   ├── pitch_zones.py
│   ├── statsbomb_custom_events.py
│   ├── statsbomb_data_engineering.py
│   ├── whoscored_custom_events.py
│   ├── whoscored_data_engineering.py
│   ├── wyscout_data_engineering.py   
│ 
├── data_directory
│   ├── misc_data
│   │   ├── images
│   │   │   ├── ___.png
│   │   ├── log_regression_xg_data.pbz2
│   │   ├── neural_net_xg_data.pbz2
│   │   ├── worldcup_2010_to_2018_distcovered.xlsx
│   ├── statsbomb_data [not included in git repo]
│   ├── transfermarkt_data
│   ├── whoscored_data [not included in git repo]
│   ├── wyscout_data
│
├── projects
│   ├── 01_worldcup_b2b_midfielders
│   │   ├── import_data_statsbomb.py
│   │   ├── worldcup_b2b_mids.py
│   ├── 02_transfermarkt_scrape_and_analyse
│   │   ├── championship_forward_value_analysis.py
│   │   ├── premierleague_forward_value_analysis.py
│   │   ├── scrape_data_transfermarkt.py
│   ├── 03_xg_model
│   │   ├── shot_xg_plot.py
│   │   ├── xg_log_regression_model.py
│   │   ├── xg_neural_network.py  
│   ├── 04_match_reports
│   │   ├── import_data_whoscored.py
│   │   ├── pass_report_ws.py
│   │   ├── shot_report_understat.py     
│   ├── 05_competition_reports
│   │   ├── import_data_whoscored.py
│   │   ├── top_defensive_actions.py
│   │   ├── top_penalty_takers.py
│   │   ├── top_progressive_passers.py
│ 
├── .gitignore 
│     
├── LICENSE 
│ 
├── README.md 

Workflow

As shown in the folder structure above, the project contains three key folders:

  • data_directory: Collection of raw football data used for projects.
  • analysis_tools: Python package containing modules that support football data import, processing, manipulation and visualisation.
  • projects: Series of sub-projects that cover various elements of football data analytics. Also contains any template scripts used to import raw data from various football data APIs, websites or data services.

In general, each project follows a number of logical steps:

  1. Create a folder within the projects area to store files associated with the project.
  2. Use the get_football_data module of the analysis_tools package (note: this module is not available in the git repo) to import raw data from a football data API, website or data service:
    • If the imported dataset is large, save it to the data_directory area in compressed BZ2 format and create a new script for the analysis.
    • If the imported dataset is small, data import and analysis can be completed in the same script (without saving the data).
  3. Within the data analysis script, import the required modules from the analysis_tools package.
  4. Pre-process and format the data using the provider-specific *_data_engineering modules (e.g. statsbomb_data_engineering) within the analysis_tools package.
  5. Synthesise additional information using the *_custom_events and pitch_zones modules within the analysis_tools package.
  6. Create visuals and generate insight for the end consumer using the visualisation and logos_and_badges modules within the analysis_tools package.
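Step 2 saves large datasets in compressed BZ2 format, and data_directory already holds .pbz2 files. The following is a minimal sketch of that save/reload round trip, assuming a .pbz2 file is a BZ2-compressed pickle (an assumption; get_football_data is not in the repo, so this is not its actual code). The helper names save_bz2 and load_bz2 are hypothetical.

```python
from __future__ import annotations

import bz2
import pickle
from pathlib import Path
from typing import Any


def save_bz2(obj: Any, path: str | Path) -> Path:
    """Pickle obj and write it BZ2-compressed to path (hypothetical helper)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create data folders if missing
    with bz2.BZ2File(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)
    return path


def load_bz2(path: str | Path) -> Any:
    """Decompress and unpickle an object written by save_bz2 (trusted files only)."""
    with bz2.BZ2File(path, "rb") as f:
        return pickle.load(f)
```

An import script might call save_bz2(events, "data_directory/statsbomb_data/events.pbz2") and the separate analysis script load_bz2 on the same path. Because unpickling can execute arbitrary code, only load files you created yourself.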

Projects

Project table of contents:
    01 - World Cup 2018 Box to Box Midfielder Analysis
    02 - Transfermarkt Web-Scrape and Analyse
    03 - Expected Goals Modelling
    04 - Automated Match Reporting
    05 - Automated Competition Reporting

01 - World Cup 2018 Box to Box Midfielder Analysis

Summary: Use StatsBomb data to identify the most effective box-to-box midfielders at the 2018 World Cup. Throughout the work, a number of custom metrics are used to score central midfielders on ball winning, ball retention & creativity, and mobility. A good box-to-box midfielder is defined as a central midfielder who excels in each of these areas. Of key interest in this work is the use of convex hulls as a proxy for player mobility / distance covered. The work also includes the development of a number of appealing visuals.
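The convex-hull mobility proxy can be sketched in plain Python: take the (x, y) locations of a player's on-ball events, build their convex hull with Andrew's monotone chain algorithm, and use the hull's area (shoelace formula) as the score. This is an illustrative sketch, not the project's code (which may well use a library such as SciPy instead); the function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def _cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain repeats the first point of the other chain.
    return lower[:-1] + upper[:-1]


def hull_area(points: List[Point]) -> float:
    """Area enclosed by the convex hull of points (shoelace formula)."""
    hull = convex_hull(points)
    if len(hull) < 3:
        return 0.0  # fewer than three distinct non-collinear points enclose nothing
    twice_area = 0.0
    for i, (x1, y1) in enumerate(hull):
        x2, y2 = hull[(i + 1) % len(hull)]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0
```

For example, events at the four corners of a 10 × 10 square plus one interior point give a hull of four vertices and an area of 100; interior events do not change the score, which is why the hull reflects the spread of a player's actions rather than their volume.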

02 - Transfermarkt Web-Scrape and Analyse

Summary: Scrape team and player market value information from transfermarkt.co.uk. This work includes the development of a "scouting tool" that highlights players from a given league with a favourable combination of age and goal contribution per £m of market value. The work also explores the use of statistical models to predict market value from player performance, and identifies teams that under- or over-performed in league position relative to squad value.
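A minimal sketch of the "goal contribution per £m" ranking behind the scouting tool, assuming goal contribution means goals plus assists and market values are held in £m. The Player fields, the default age cut-off and the sample players in the test are hypothetical; the project's actual scoring may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Player:
    name: str
    age: int
    goals: int
    assists: int
    market_value_m: float  # market value in £m (assumed unit)

    @property
    def contribution_per_m(self) -> float:
        """Goals plus assists per £1m of market value."""
        return (self.goals + self.assists) / self.market_value_m


def scouting_shortlist(players: List[Player], max_age: int = 25, top_n: int = 5) -> List[Player]:
    """Players aged max_age or under, ranked by goal contribution per £m (best first)."""
    eligible = [p for p in players if p.age <= max_age and p.market_value_m > 0]
    return sorted(eligible, key=lambda p: p.contribution_per_m, reverse=True)[:top_n]
```

Filtering on age before ranking keeps older, cheap, high-output players from crowding out younger prospects; plotting age against contribution_per_m instead would show the same trade-off visually.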

03 - Expected Goals Modelling

Summary: Implementation and testing of basic probabilistic expected goals (xG) models. This work includes the development and comparison of a logistic regression xG model and a neural network xG model, each trained on over 40,000 shots taken across Europe's 'big five' leagues during the 2017/2018 season. The models are used to calculate expected goals for specific players, clubs and leagues over a specified time period.
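A minimal sketch of a logistic regression xG model in plain Python, using shot distance and the angle subtended by the goal mouth as features and fitting by batch gradient descent on mean log-loss. None of this is taken from the repo: the StatsBomb-style 120 × 80 coordinates (goal centred at (120, 40), posts at y = 36 and y = 44), the two-feature design, the learning rate and the epoch count are all assumptions.

```python
import math
from typing import List, Sequence, Tuple

# Assumed StatsBomb-style pitch: 120 x 80, attacking towards x = 120.
GOAL_X, POST_LOW_Y, POST_HIGH_Y = 120.0, 36.0, 44.0


def shot_features(x: float, y: float) -> Tuple[float, float]:
    """Distance to the goal centre and angle (radians) subtended by the posts."""
    distance = math.hypot(GOAL_X - x, (POST_LOW_Y + POST_HIGH_Y) / 2 - y)
    angle = abs(math.atan2(POST_HIGH_Y - y, GOAL_X - x) - math.atan2(POST_LOW_Y - y, GOAL_X - x))
    return distance, angle


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.01, epochs: int = 2000) -> List[float]:
    """Batch gradient descent on mean log-loss; returns [bias, w1, ..., wk]."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def predict_xg(w: Sequence[float], x: float, y: float) -> float:
    """Probability that a shot from (x, y) is scored under weights w."""
    d, a = shot_features(x, y)
    return sigmoid(w[0] + w[1] * d + w[2] * a)
```

The small learning rate is there because distance is on a much larger scale than angle; standardising the features first would allow faster convergence. In practice a library implementation (e.g. scikit-learn's LogisticRegression) would replace fit_logistic, but the fitted model has the same form: xG = sigmoid(b0 + b1 · distance + b2 · angle).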

04 - Automated Match Reporting

Summary: Development of automated scripts to produce match reports immediately after a match has concluded. This work includes collection and processing of public-domain match event data, and the production of multiple visuals that together constitute informative and appealing match reports. Visuals currently include shot maps, inter-zone pass flows, pass plots and offensive action convex hulls.
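Inter-zone pass flows can be sketched by binning each pass's start and end location into a grid of pitch zones and counting (start zone, end zone) pairs. Assumptions: a 120 × 80 pitch split into a hypothetical 6 × 3 grid (the repo's pitch_zones module may define zones differently), passes given as (x_start, y_start, x_end, y_end) tuples, and passes that stay within one zone being ignored.

```python
from collections import Counter
from typing import Iterable, Tuple

PITCH_LENGTH, PITCH_WIDTH = 120.0, 80.0  # assumed StatsBomb-style dimensions
N_COLS, N_ROWS = 6, 3  # hypothetical grid: 6 zones along the length, 3 across

Zone = Tuple[int, int]


def zone_of(x: float, y: float) -> Zone:
    """(column, row) of the zone containing (x, y); points on the far edges clamp inward."""
    col = min(int(x / PITCH_LENGTH * N_COLS), N_COLS - 1)
    row = min(int(y / PITCH_WIDTH * N_ROWS), N_ROWS - 1)
    return max(col, 0), max(row, 0)


def pass_flows(passes: Iterable[Tuple[float, float, float, float]]) -> Counter:
    """Count passes between each (start zone, end zone) pair, skipping intra-zone passes."""
    flows: Counter = Counter()
    for x0, y0, x1, y1 in passes:
        start, end = zone_of(x0, y0), zone_of(x1, y1)
        if start != end:
            flows[(start, end)] += 1
    return flows
```

The resulting counts map directly onto a flow plot: one arrow per (start, end) pair, with width or colour scaled by the count.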

05 - Automated Competition Reporting

Summary: Development of automated scripts to produce competition reports and multi-match player evaluations at any point throughout a competition. This work includes collection and processing of public-domain match event data, and the production of multiple visuals that generate novel and meaningful insight at a team and player level. Visuals currently include an assessment of progressive passes, forward defensive actions and penalty placement.
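Definitions of a progressive pass vary between data providers and analysts. A minimal sketch of a progressive-passer ranking using one simple distance-based rule, assumed here rather than taken from the repo: a pass counts as progressive if it ends at least 25% closer to the centre of the opponent's goal than it started. Coordinates again assume a 120 × 80 pitch attacking towards x = 120; rank_progressive_passers is a hypothetical name, not the repo's top_progressive_passers script.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple

GOAL_CENTRE = (120.0, 40.0)  # assumed attacking goal on a 120 x 80 pitch


def distance_to_goal(x: float, y: float) -> float:
    """Straight-line distance from (x, y) to the centre of the attacking goal."""
    return math.hypot(GOAL_CENTRE[0] - x, GOAL_CENTRE[1] - y)


def is_progressive(x0: float, y0: float, x1: float, y1: float, ratio: float = 0.75) -> bool:
    """True if the pass ends within ratio * start distance (>= 25% closer by default)."""
    return distance_to_goal(x1, y1) <= ratio * distance_to_goal(x0, y0)


def rank_progressive_passers(passes: Iterable[Tuple[str, float, float, float, float]],
                             top_n: int = 10) -> List[Tuple[str, int]]:
    """Players ranked by progressive pass count; passes are (player, x0, y0, x1, y1)."""
    counts: Counter = Counter(
        player for player, *coords in passes if is_progressive(*coords)
    )
    return counts.most_common(top_n)
```

Because the rule is relative, a short pass near the box can qualify while a long pass from deep may not; provider definitions based on absolute distance gained would rank players differently, so the threshold is exposed as the ratio parameter.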