/football_analytics

📊⚽ A collection of football analytics projects, data, and analysis by Edd Webster (@eddwebster), including a curated list of publicly available resources published by the football analytics community.

Primary language: Jupyter Notebook

Edd Webster Football Analytics

A space for football analytics projects by Edd Webster, including a curated list of publicly available resources published by the football analytics community.


Edd Webster Analytics


-----------------------------------------------------

👋 About This Repository and Author

Edd Webster

The README of this repository is a concise resources guide of learning materials, data sources, libraries, papers, blogs, etc., created by all those who have contributed to the open-source football analytics community. This GitHub repository and resources list is always a work in progress, with new resources added semi-regularly. If you feel there are any resources that I've missed, please feel free to create a pull request or send me a message using the contact links below and I'll get back to you as quickly as I can!

If you like the repo, please feel free to give it a ⭐ (top right). Cheers!

For more information about this repository and the author, see the following:

CV, Personal Website, Email, LinkedIn, Twitter, Mastodon, Linktree, GitHub, Tableau, Ko-fi

-----------------------------------------------------

📖 Table of Contents

  1. About This Repository and Author
  2. Table of Contents
  3. Prerequisites
  4. Repository Structure
  5. Notebooks
  6. Data Visualisation and Tableau
  7. Resources
  8. Citations
  9. Contributing
  10. Star History
  11. Acknowledgements

-----------------------------------------------------

🍴 Prerequisites

Python Badge Jupyter Badge

The only prerequisites for using this GitHub repo are a computer, an internet connection, and the desire to learn more about football analytics.

The code in this GitHub repository is written in Python and uses the open-source libraries listed below. Python and R, as well as most of these libraries, can be obtained by downloading and installing Anaconda. Step-by-step guides to do this can be found for Windows here and Mac here, as well as in the Anaconda documentation itself here.

Back to Contents

-----------------------------------------------------

🌡 Repository Structure

The contents of this GitHub repository are organised as follows:

eddwebster/football_analytics
.
│
├── dashboards
│
├── data
│   ├── capology
│   ├── elo
│   ├── export
│   ├── fbref
│   ├── fifa
│   ├── guardian
│   ├── metrica-sports
│   ├── opta
│   ├── reference
│   ├── sb
│   ├── shots
│   ├── stats-perform
│   ├── stratabet
│   ├── tm
│   ├── touchline-analytics
│   ├── twenty-first-group
│   ├── understat
│   └── wyscout
│
├── docs
│   ├── centre-circle
│   ├── metrica-sports
│   ├── opta
│   ├── sb
│   ├── shots
│   ├── stratabet
│   └── wyscout
│
├── gif
│   └── fig
│
├── img
│   ├── club_badges
│   ├── eddwebster
│   ├── fig
│   ├── logos
│   ├── pitches
│   └── vizpiration
│
├── notebooks
│   │
│   ├── 1_data_scraping
│   │   ├── Capology Player Salary Web Scraping.ipynb
│   │   ├── FBref Player Stats Web Scraping.ipynb
│   │   └── TransferMarkt Player Bio and Status Web Scraping.ipynb
│   │
│   ├── 2_data_parsing
│   │   ├── ELO Team Ratings Data Parsing.ipynb
│   │   ├── StatsBomb Data Parsing.ipynb
│   │   └── Wyscout Data Parsing.ipynb
│   │
│   ├── 3_data_engineering
│   │   ├── Capology Player Salary Data Engineering.ipynb
│   │   ├── Centre Circle Opta CPL Data Engineering.ipynb
│   │   ├── FBref Player Stats Data Engineering.ipynb
│   │   ├── Opta #mcfcanalytics PL 2011-2012.ipynb
│   │   ├── StatsBomb Data Engineering.ipynb
│   │   ├── StrataBet Data Engineering.ipynb
│   │   ├── The Guardian Player Recorded Transfer Fees Data Engineering.ipynb
│   │   ├── TransferMarkt Historical Market Value Data Engineering.ipynb
│   │   ├── TransferMarkt Player Bio and Status Data Engineering.ipynb
│   │   ├── TransferMarkt Player Recorded Transfer Fees Data Engineering.ipynb
│   │   ├── Understat Data Engineering.ipynb
│   │   └── Wyscout Data Engineering.ipynb
│   │
│   ├── 4_data_unification
│   │   └── Unification of Aggregated Seasonal Football Datasets.ipynb
│   │
│   ├── 5_data_analysis_and_projects
│   │   │
│   │   ├── player_similarity_and_clustering
│   │   │   └── PCA and K-Means Clustering of 'Piqué-like' Defenders.ipynb
│   │   │
│   │   ├── tracking_data
│   │   │   ├── metrica_sports
│   │   │   │   └── Metrica Tracking Data EDA.ipynb
│   │   │   │
│   │   │   └── signality
│   │   │       ├── Signality Tracking Data Engineering.ipynb
│   │   │       └── Signality Tracking Data EDA.ipynb
│   │   │
│   │   └── xg_modeling
│   │       │
│   │       ├── shots_dataset
│   │       │   │
│   │       │   ├── chance_quality_modelling
│   │       │   │   ├── 1) Logistic Regression Expected Goals Model.ipynb
│   │       │   │   └── 2) XGBoost Expected Goals Model.ipynb
│   │       │   │
│   │       │   └── metrica-sports
│   │       │       └── Metrica Sports.ipynb
│   │       │
│   │       └── opta_dataset
│   │           └── Training of an Expected Goals Model Using Opta Event Data.ipynb
│   │
│   └── 6_data_visualisation
│
├── research
│   ├── papers
│   └── slides
│
├── scripts
│
├── spreadsheets
│
└── video

Back to Contents

-----------------------------------------------------

📔 Notebooks

The code in this repository is mostly written in Jupyter notebooks or Python scripts, organised in the following workflow:

  1. Web Scraping
  2. Data Parsing
  3. Data Engineering
  4. Data Unification
  5. Data Analysis - projects include working with Tracking data, constructing VAEP models (as introduced by SciSports), building xG models using Logistic Regression, Random Forests and Gradient Boosted Decision Tree algorithms such as XGBoost, and analysing player similarity using PCA and K-Means clustering.
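
To give a feel for the xG modelling step, here is a minimal logistic-regression sketch in pure Python. It is not the model from the notebooks: the features (distance and goal-mouth angle), the coordinate convention, and all coefficients are assumptions made up for illustration rather than fitted values.

```python
import math

GOAL_WIDTH_M = 7.32  # standard goal width in metres


def shot_features(x_m, y_m):
    """Distance (m) and goal-mouth angle (radians) for a shot.

    Assumed convention: x_m is the distance from the goal line and
    y_m is the lateral offset from the centre of the goal.
    """
    distance = math.hypot(x_m, y_m)
    # Angle subtended by the two posts as seen from the shot location.
    left = math.atan2(y_m + GOAL_WIDTH_M / 2, x_m)
    right = math.atan2(y_m - GOAL_WIDTH_M / 2, x_m)
    return distance, left - right


def xg(x_m, y_m, intercept=-1.0, b_distance=-0.12, b_angle=1.5):
    """Logistic model P(goal) = 1 / (1 + exp(-z)); coefficients are hypothetical."""
    distance, angle = shot_features(x_m, y_m)
    z = intercept + b_distance * distance + b_angle * angle
    return 1.0 / (1.0 + math.exp(-z))


penalty_spot = xg(11.0, 0.0)   # central shot, 11 m out
long_range = xg(30.0, 10.0)    # wide shot from distance
```

With these made-up coefficients the central shot scores higher than the long-range one, which is the behaviour a fitted model would be expected to learn from real shot data.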

Back to Contents

-----------------------------------------------------

📊 Data Visualisation and Tableau Dashboards

For Tableau dashboards produced using the data engineered in the notebooks in this repository, please see my Tableau Public profile: public.tableau.com/profile/edd.webster.

Example Tableau dashboards:

Back to Contents

-----------------------------------------------------

📑 Resources

🔖 Other Football Analytics Resources Guides

Credit to the following resources that were all used to plug gaps in this resources guide once it was published:

Back to Contents

-----------------------------------------------------

🏃 Getting Started with Football Analytics

Good resources for those new to the use of data in football:

Back to Contents

-----------------------------------------------------

💾 Data

ℹ️ Data Sources

Publicly available data sources and datasets relating to football, including Tracking data, Event data, aggregated player performance data, detailed match statistics, injury records, transfer values, and more.

Data sources that have been used in the code and analysis in this repository can be found in the data subfolder of this repository or in Google Drive (due to GitHub's 100 MB file limit) [link]. However, the code in this repository should enable you to scrape, parse, and engineer the datasets yourself, reproducing the outputs used for the analysis and visualisations featured.

To learn more about the different types of data available, such as Event and Tracking data, see the "Where can I get data?" section of Devin Pleuler's soccer_analytics_handbook [link].

For a quick primer of the free football data resources available, see the following Twitter thread by James Nalton [link].


Event data

Event Data is labelled data for each on-the-ball event that takes place during a game. The data is manually collected from television footage. To learn more about the data collection, see the following video [link].

Each match of event data has around 2-3 thousand individual events (rows), depending on the provider.

The main providers of this data are StatsBomb, Stats Perform (formerly Opta), and Wyscout.

Name | Comments | Source / method(s) to get the data
StatsBomb Open Data | | StatsBomb Open Data GitHub Repo
StrataData by StrataBet | Chance shooting data provided | No longer available (since 2018); however, it can be found in GitHub repos of old analysis (including this one) [link].
Soccer Video and Player Position Dataset | Dataset of elite soccer player movements and corresponding videos, made available by the University of Oslo. See the accompanying paper [link] | [Link] (appears to no longer be working)
Opta | Event data for 20+ leagues including the 'Big 5' European leagues, some of which go back to the 09/10 season. | Data available through scraping WhoScored? Match Centre through the following methods:
Opta (11/12 sample dataset) | Match-by-match aggregated player performance data for the 11/12 season and F24 Event data for an 11/12 match of Manchester City vs. Bolton Wanderers as part of the #mcfcanalytics initiative | No longer available (since 2012); however, it can be found in GitHub repos of old analysis (including this one).
Understat | Shooting and meta data including xG values for the 'Big 5' European leagues and Russian Premier League | This data can be accessed through the following:
Wyscout | Event data for the 17/18 season for the 'Big 5' European leagues, Euro 2016 Championship, and 2018 World Cup made available by Luca Pappalardo, Alessio Rossi, and Paolo Cintia. See their paper A public data set of spatio-temporal match events in soccer competitions. | Figshare

Tracking data

Tracking Data records the x and y coordinates of every player on the field, as well as the ball, a number of times per second (usually 10-25). For this reason, the dataset is quite large, much larger than event data at around 2-3 million rows per game.
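
As a rough back-of-the-envelope check of those row counts, the sketch below assumes a long-format table with one row per tracked object per frame, 22 players plus the ball, and a 90-minute match with no stoppage time. The function name and its defaults are illustrative only.

```python
def tracking_rows(frames_per_second, minutes=90, players=22, include_ball=True):
    """Rows in a long-format tracking table: one row per tracked object per frame."""
    tracked_objects = players + (1 if include_ball else 0)
    frames = frames_per_second * minutes * 60
    return frames * tracked_objects


rows_at_10hz = tracking_rows(10)  # 10 * 5,400 seconds * 23 objects = 1,242,000
rows_at_25hz = tracking_rows(25)  # 25 * 5,400 seconds * 23 objects = 3,105,000
```

Under these assumptions a 25 Hz feed gives roughly three million rows per match, in line with the figure quoted above, while a 10 Hz feed gives closer to one million.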

The data is collected by cameras installed in a stadium and is therefore not widely available, with teams usually only having access to the data in their own league.

The main providers of this data are Second Spectrum, Stats Perform, Metrica Sports, and Signality.

Name | Comments | Source / method(s) to get the data
Last Row Tracking-like data by Ricardo Tavares | Tracking-like data collected by Ricardo Tavares. See the Liverpool Analytics Challenge for which this data was used (winners discussed on Friends of Tracking [link]). | GitHub repo
Metrica Sports Sample Tracking and corresponding Event data | Three sample matches of synced event and tracking data. For code to work with this data including Pitch Control modelling, see the LaurieOnTracking GitHub repo by Laurie Shaw and the corresponding Friends of Tracking tutorials. | GitHub repo
Signality Tracking data | Three matches of tracking data from the Allsvenskan - Hammarby vs. IF Elfsborg (22/07/2019), Hammarby 5 vs. 1 Örebro (30/09/2019), and Hammarby vs. Malmö FF (20/10/2019). This data was made available as part of the 2020 Mathematical Modelling of Football course. | The password to download the data is not publicly available, but can be found in the Uppsala Mathematical Modelling of Football Slack group [link]. For access, contact Novosom Salvador via Twitter or rsalvadords@gmail.com, or feel free to contact me. Note that the 2nd half of the Hammarby-Örebro match is incomplete.

Broadcast Tracking data

Broadcast Tracking data is collected from broadcast footage using computer vision techniques. Unlike in-stadium tracking data, the dataset is incomplete, as players out of shot of the broadcast footage are missing. However, the great benefit is that the data is much cheaper to collect and covers many more leagues, which is extremely useful for tasks such as recruitment analysis.

The main providers of this data are SkillCorner and Sportlogiq.

Name | Comments | Source / method(s) to get the data
SkillCorner broadcast Tracking data | 9 matches of broadcast tracking data, including matches from 2019/2020 for the league champions and runners-up in English Premier League, French L1, Spanish LaLiga, Italian Serie A and German Bundesliga. To find out more about broadcast tracking data and its use cases, see the following Medium article [link]. | GitHub repo

Aggregated Player/Team Performance data
Name | Comments | Source / method(s) to get the data
DAVIES modelling data | Estimated player evaluation data by Sam Goldberg and Mike Imburgio for American Soccer Analysis. To learn more about DAVIES, see the following blog post [link]. | Shiny App
FBref | Season-on-season aggregated player performance data provided by StatsPerform. Aggregated player performance data for the following competitions:
  • Men's competitions
    • English Premier League
    • Spanish La Liga
    • German Bundesliga
    • French Ligue 1
    • Italian Serie A
    • Dutch Eredivisie
    • Portuguese Primeira Liga
    • Brazilian Serie A
    • Mexican Liga MX
    • MLS
    • English Championship
    • Champions League
    • Europa League
    • Conmebol Copa Libertadores
    • World Cup
    • Euros
    • Copa America
  • Women's competitions
    • American NWSL
    • English Super League
    • Australian A-League
    • French Division 1 Feminine
    • German Frauen-Bundesliga
    • Italian Serie A
    • Spanish Liga F
    • Women's Champions League
    • World Cup
    • Euros
Note: there was a change in the data provider used by FBref for their statistics in October 2022, from StatsBomb to StatsPerform. Therefore, the following scraping code is split into current working solutions and archived solutions:
Additional data sources:
  • Every FBref metric for every 2020-21 Big 5 European league player by Ronan, see [link], [link] and [Tweet]. A 'tidied' version has been made by goaltergeist, see [link]
  • 2,823 players in Europe's top 5 leagues on FBref, with their positions as listed on Transfermarkt by Rahul Iyer, see [link] and [Tweet]
Stats Perform and Centre Circle Canadian Premier League data | Aggregated player performance data | Google Drive

Team Rating data
Name | Comments | Source / method(s) to get the data
Elo club rankings | Elo ratings for club football based on past results to allow for estimation of each club's strength, allowing predictions for the future. | Data available through:
Euro Club Index | Ranking of the football teams in the highest division of all European countries, that shows their relative playing strengths at a given point in time, and the development of playing strengths in time. To see more about the methodology used to calculate these rankings, see the following page [link] | Link
FiveThirtyEight Club Ranking | Global Club Soccer Rankings. How 637 international club teams compare by Soccer Power Index | Data available through:
Opta Power Rankings | Opta Power Rankings | Data available through:
UEFA Club Coefficients | UEFA club coefficient rankings based on the results of all European clubs in UEFA club competition. | Data available through:
World Football / Soccer Clubs Ranking | Club ranking website | Link
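
For readers new to Elo ratings, the sketch below shows the textbook Elo expected-score and update formulas. It is a generic illustration, not the method of any source listed above: the K-factor of 20 is an assumption, and real club rating systems typically add adjustments (home advantage, goal difference) that are omitted here.

```python
def elo_expected_score(rating_a, rating_b, scale=400.0):
    """Expected score for team A (1 = win, 0.5 = draw, 0 = loss)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def elo_update(rating_a, rating_b, score_a, k=20.0):
    """Return both updated ratings after one match; k is an assumed K-factor."""
    expected_a = elo_expected_score(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    # Points gained by one side are lost by the other, so the total is conserved.
    return rating_a + delta, rating_b - delta


# Example: a 1600-rated side beats a 1500-rated side.
new_a, new_b = elo_update(1600.0, 1500.0, score_a=1.0)
```

Because the favourite was expected to win, its rating rises by less than half of the K-factor; an upset would move both ratings further.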

Physical data
Name | Comments | Source / method(s) to get the data
Bundesliga physical data | Bundesliga player stats, powered by AWS | Link (not scraped into a CSV)

Results and Match Sheet data
Name | Comments | Source / method(s) to get the data
2018 FIFA World Cup Rosters | Goals, caps, club, and date of birth for players on 2018 FIFA World Cup rosters. Source: data.world | Excel
engsoccerdata | English and European soccer results 1871-2017 | GitHub repo
FIFA World Cup Match Results | Matchups and results of FIFA World Cup matches from 1930 - 2014. Source: data.world | Excel
FotMob | Dataset including team and player stats including xG and post-shot xG. | This data can be scraped using:
Football Lineups | A database of team tactics and formations crowdsourced by the users. | Link
international_results | Repository of 44,353 results of international football matches, from the very first official match in 1872 up to 2022. | GitHub repo
smarterscout | Scouting and player rating information platform for evaluating the performance of football players around the world. The platform was developed by Dan Altman at North Yard Analytics to assess players' contributions to winning, their playing style, and their skill level. Note: this is a subscription service. | Link
SofaScore | Live scores, lineups, standings, heatmaps, and basic teams, coaches and player data | Link
Soccerway | Match sheet data | Link

Financial, Valuation, and Transfer data
Name | Comments | Source / method(s) to get the data
Capology | Player salaries | See the Capology Player Salary Web Scraping notebook for Python code to scrape Capology data or access saved CSV files in data subfolder
KPMG Football Benchmark | player valuation data |
The Price of Football Master Spreadsheet | data from the finance/business aspect of football by Kieran Maguire | Link
spotrac | Player contracts, salaries, and transfer information for the Premier League, MLS, and NWSL |
Transfermarkt | Player bio, contractual, and estimated value data | This data can be accessed through the following:
Guardian Player Transfer data | Collated by Tom Worville (see Tweet [link]) | GitHub

Odds, Betting, and Predictions data
Name | Comments | Source / method(s) to get the data
BetExplorer | odds data | Link
FiveThirtyEight Soccer Predictions database | football prediction data | Link
Football-Data.co.uk | free bets and football betting, historical football results and a betting odds archive, live scores, odds comparison, betting advice and betting articles | Link
International football results from 1872 to 2020 | an up-to-date dataset of over 40,000 international football results by Mart Jürisoo | Link

Plotting Tools

See Mark Wilkin's Twitter thread for more about how to plot your own event data [link]:


Reference data
Name | Comments | Source / method(s) to get the data
xT grid | League-wide Expected Threat (xT) values from the 2017-18 Premier League season (12x8 grid) determined by Karun Singh. For more information about xT, see Karun's blog post [link] | Link
EPV grid | Grid of Expected Possession Values determined by Laurie Shaw. See the following lecture for more information [link] | Link
Zones of a pitch | Breakdown of a pitch into zones, for use with visualisation. Created by Rob Carroll | Link
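
As a rough illustration of how a 12x8 grid such as the xT grid might be used, the sketch below maps a pitch location to a grid cell and looks up its value. Everything here is an assumption for illustration: a 105 x 68 m pitch with the origin in one corner, attacking left to right, a grid stored as 8 rows of 12 columns, and a placeholder grid that does NOT contain Karun Singh's values.

```python
def xt_cell(x, y, pitch_length=105.0, pitch_width=68.0, n_cols=12, n_rows=8):
    """Return (row, col) of the grid cell containing pitch location (x, y)."""
    col = int(x / pitch_length * n_cols)
    row = int(y / pitch_width * n_rows)
    # Clamp so that locations on (or just outside) the boundary stay in the grid.
    col = max(0, min(col, n_cols - 1))
    row = max(0, min(row, n_rows - 1))
    return row, col


def xt_value(grid, x, y):
    """Look up the grid value for a pitch location."""
    row, col = xt_cell(x, y)
    return grid[row][col]


# Placeholder grid (hypothetical values): threat simply grows towards the
# opposition goal. Replace with the real 8 x 12 grid from the source above.
placeholder_grid = [[col / 100 for col in range(12)] for _ in range(8)]

value_near_goal = xt_value(placeholder_grid, 100.0, 30.0)
```

The value of a pass or carry can then be estimated as the grid value at the end location minus the value at the start location.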

Miscellaneous Data
Name | Comments | Source / method(s) to get the data
awesome-football ⭐ by football.db (Gerald Bauer) | A collection of awesome football (national teams, clubs, match schedules, players, stadiums, etc.) datasets | GitHub repo
Data Hub | Football data | Link
European Soccer Database | 25k+ matches, players & teams attributes for European Professional Football | Link
FIFA 15-22 player rating data | Scraped from SoFIFA by Stefano Leone | Link
FIFA 18 Player Ratings | 17k+ players, 70+ attributes extracted from FIFA 18, provided by sofifa | Link
FootballData | "A hodgepodge of JSON and CSV Football data" | GitHub
footballcsv | Historical soccer results in CSV format | Link
football.db | A free and open public domain football database & schema for use in any (programming) language (e.g. uses plain datasets) | Link
Football xG | | Link
Guide to Football/Soccer data and APIs by Joe Kampschmid | | Link
My Football Facts | | Link
Physio Room | | Link
PlusMinusData | play by play data from espn.com | Link
Rec.Sport.Soccer Statistics Foundation | Historical league tables and football results | Link
RoboCup Soccer Simulator | RoboCup Soccer Simulator Data | Link
Squawka | | Link
Stat Bunker | | Link
Tableau | data resources including sports data | Link
Transfer League | | Link
Twelve Football | | Link
wosostats | Women's soccer data from around the world | Link

📄 Documentation

All documentation is saved locally in the docs subfolder, including:


Data Types and Companies

Data Providers
Tracking
Videos / Performances Analysis
Consultancy / Service Providers

Back to Contents

-----------------------------------------------------

🧑‍🎓 Tutorials

Python

R

Tableau

Check out the Tableau for Sports Discord server organised by Ninad Barbadikar, to interact with a community of Tableau developers

For a YouTube playlist of Tableau-football videos and tutorials that I have collated from various sources including the Tableau Football User Group, Rob Carroll, Tom Goodall, and Ninad Barbadikar, see the following [link].

PowerBI

For a YouTube playlist of Power BI-football videos and tutorials that I have collated from various sources including Futbol AnalysR and PowerBI for Sports, see the following [link].

SQL

Excel

PowerPoint

Back to Contents

-----------------------------------------------------

🏛️ Libraries

GitHub libraries considered 'Top Rated' are those with 50 or more stars (at the time of writing) and are indicated with a star emoji (⭐).

For a full list of Football Analytics GitHub repositories and libraries, see the following list on GitHub [link].

Python

R

Back to Contents

-----------------------------------------------------

GitHub Repositories

The following GitHub repositories are either repos that I have found and recommend, or publicly available analytics work on the subject of football with at least 5 stars on GitHub (at the time of writing).

GitHub repositories that are considered to be 'Top Rated' are those with 50 or more stars (again, at the time of writing) and have been indicated with a star emoji (⭐).

For a full list of Football Analytics GitHub repositories and libraries, see the following list on GitHub [link].

Python

R

Other Languages

No Language Specified

Back to Contents

-----------------------------------------------------

Apps

Back to Contents

-----------------------------------------------------

📊 Data Visualisation Resources and Tools

Resources to aid data visualisation:

Vizpiration

Check out the vizpiration subfolder in the img folder, for examples of visualisations created by analysts in the community.

Tutorials

Repos and libraries

Resources

Tweets

Back to Contents

-----------------------------------------------------

✒️ Written Pieces

Blogs

Many of these blog posts are recommended in Sam Gregory's Best Football Analytics Pieces and Tom Worville's “What’s the best Football Analytics piece you’ve ever read?”, both of which are now a few years old. This section is very subjective, so apologies if I've missed anything obvious.

Blogs and Data Analytics Websites

The following list contains those blogs that are still maintained, as well as the original blogs from the OGs of football analytics.

For a Twitter thread of football analytics blogs from 2009 and earlier, see the following from Tiotal Football [link].

📃 Papers

See the following subfolder of this GitHub repo for PDF copies of the papers listed below [link].

Many of the papers in this list were included after reading Jan Van Haaren's Soccer Analytics Reviews (2020, 2021, 2022). All credit to him for reading a paper a week and making his reviews publicly available. Give them a read if you haven't already done so!

The following Shiny App from Lars Maurath is a great tool for looking up publications [link].

See the following webpages of conference papers per year (where available):

2022
2021
2020
2019
2018
2017
2016
2015
2014
2011
2002
1997
1971

Newsletters

News Articles

📚 Books

The books listed below are not only about football but about sports analytics in general.

See the following reading lists for book recommendations from other sports data scientists:

The following use Amazon UK links where available and are not affiliate links.

Magazines

Back to Contents

-----------------------------------------------------

📼 Video

YouTube Playlists

Custom Playlists Curated by Myself

The following is a series of playlists that I originally collated for my own personal viewing, but they may be useful to you:

Public Playlists

Playlists created by others

YouTube Channels

Video Analysis

Webinars and Lectures

Ted Talks

Documentaries

Match Highlights

Other

Back to Contents

-----------------------------------------------------

🔊 Podcasts

Below I've tried to include both Sports/Football Analytics podcasts and notable episodes of other podcasts that have analytical content/interviews. Spotify and YouTube links are used where available. All episodes mentioned below that are available on Spotify can be found in the following playlist (updated periodically): [link].

Football Analytics Podcasts

Notable Episodes (including non-football-data-specific podcasts)

Back to Contents

-----------------------------------------------------

👨‍💻 Notable Figures and Twitter Accounts

Back to Contents

-----------------------------------------------------

🗓️ Events and Conferences

Back to Contents

-----------------------------------------------------

Competitions

The following includes non-football competitions.

Back to Contents

-----------------------------------------------------

Courses

Back to Contents

-----------------------------------------------------

💼 Jobs

For live job postings tracked by the community, check the Jobs channel of the Football in Numbers Discord server.

Clubs

The list of clubs is quite UK-centric. I would like to add more clubs but it takes a bit of time.

Premier League
Championship
League One

League Two

Scottish Premier League

Analytics Companies and Consultancies

Associations and Organisations

Betting Companies

Media

Job Boards

Other Website Lists

Back to Contents

-----------------------------------------------------

Discord/Slack groups

Back to Contents

-----------------------------------------------------

🔑 Key Concepts

A focus on some of the key topics in football analytics. Most of the following resources feature above but are reorganised here by topic. This section is still very much a work in progress and may be missing resources mentioned above.

History of Football Analytics

Expected Goals (xG) Modeling

Videos

For a playlist of Expected Goals related videos available on YouTube, see the following playlist I have created [link].

Webinars and Lectures
Tutorials
Notable Models
Written Pieces

For a collated list of Expected Goals literature collated by Keith Lyons, see the following [link]

Libraries
GitHub Repositories
Podcasts
Tweets

Web Scraping Football Data

Written Pieces
Videos
Libraries

Tracking Data

Pitch Control Modeling

Tutorials

Pitch Control modelling and Valuing Actions tutorials by Laurie Shaw as part of his Metrica Sports Tracking data series for Friends of Tracking. See the following for code [link]

GitHub Repositories
Written Pieces
Video
Podcasts

Passing Networks

Written Pieces
Blogs
Papers
Tutorials
Videos
Tweets

Possession Value (PV) Frameworks

General
Expected Threat (xT)
Valuing Actions by Estimating Probabilities (VAEP)
Goals Added (g+)
On-Ball Value (OBV)

Dixon Coles Modeling

Player Similarity and Style Analysis

Written Pieces
Videos
Tutorials
GitHub Repositories

Reinforcement Learning for Football Simulation

Player Rating Modelling

Written Pieces
Podcasts
GitHub Repos
Companies
  • Traits Insights

Team Playing Style Analysis

Written Pieces
Papers
Blogs
Videos
GitHub Repositories

Set Pieces

Section created after seeing the following tweets and threads by Ashwin Raman ([link]) and Stuart Reid ([link])

Radars

Recruitment Analysis

Quantifying Relative Club and League Strength

Models
Financial
Historical Match Results
Historical Statistical Player Performance
Articles
Papers
Videos
Data
Miscellaneous
  • Tweets by AI Abucus [link] and [link]. They use a simple Dixon-Coles method focusing on historic results going back 15 years to build a hierarchy amongst teams in leagues that might never have played each other.

Tactics

Counter Attacking
Articles
Papers
Videos
Podcasts
Pressing
Articles
Videos
Counter Pressing
Articles
Papers
Videos

Player Valuation Modeling

Example Models
Example Methodologies
Written Pieces Regarding the Topic of Player Valuation
Articles
Blogs
Papers
Code/Notebooks
Slides
Tweets
Financial Data
Player Values
Recorded Transfers
Other
Relevant Packages/Repos
Miscellaneous

Game Win Probability Modeling

Goalkeeper Analysis

Back to Contents

-----------------------------------------------------

Citations

Thanks to all those who have kindly written about or promoted this GitHub repository. See:

Back to Contents

-----------------------------------------------------

Contributing

This GitHub repository and resources list is always a work in progress, with new resources added semi-regularly. If you feel there are any resources that I've missed, I'm always open to contributions! Please feel free to create a pull request or send me a message @ edd.j.webster@gmail.com or @eddwebster and I'll get back to you as quickly as I can!

If you're new to creating a pull request, please follow these steps (based on this)

  1. Create an account on GitHub if you do not already have one.

  2. Fork the project repository: click on the ‘Fork’ button near the top of the page. This creates a copy of the code under your GitHub user account. For more details on how to fork a repository see this guide.

  3. Clone your fork of the football_analytics repo from your GitHub account to your local disk:

    git clone https://github.com/<github username>/football_analytics.git
    cd football_analytics
  4. Create an environment with:
    $ python3 -m venv my_env
    or
    $ python -m venv my_env
    or with conda:
    $ conda create -n my_env python=3

  5. Activate the environment:
    $ source my_env/bin/activate
    or with conda:
    $ conda activate my_env

  6. Add the upstream remote. This saves a reference to the main football_analytics repository, which you can use to keep your repository synchronised with the latest changes:

    $ git remote add upstream https://github.com/eddwebster/football_analytics.git

    You should now have a copy of the football analytics repository, and your git repository properly configured. The next steps now describe the process of modifying code and submitting a pull request:

  7. Synchronize your master branch with the upstream master branch:

    git checkout master
    git pull upstream master
  8. Create a feature branch to hold your development changes:

    $ git checkout -b my_change

    and start making changes. Always use a feature branch. It’s good practice to never work on the master branch!

  9. Then, once you commit ensure that git hooks are activated (Pycharm for example has the option to omit them). This can be done using pre-commit, as follows:

    pre-commit install
  10. Develop the feature on your feature branch on your computer, using Git to do the version control. When you’re done editing, add changed files using git add and then git commit:

    git add modified_files
    git commit -m "my first football_analytics commit"
  11. Record your changes in Git, then push the changes to your GitHub account with:

    git push -u origin my_change

Back to Contents

-----------------------------------------------------

Star History

Star history for the football_analytics repository.

Football Analytics GitHub Stars History

Back to Contents

-----------------------------------------------------

Acknowledgements

Back to the Top