football-data-extraction
- Extracts football data from understat
- The raw JSON data is pulled via the understat module created by Amos Bastian
- Extracts various kinds of football data from the top 5 leagues, starting from the 2014/15 season.
- The top 5 leagues are:
['EPL', 'Bundesliga', 'La Liga', 'Serie A', 'Ligue 1']
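As a rough illustration of the coverage described above, the sketch below checks whether a league/season request falls inside it and formats season labels. The names `LEAGUES`, `FIRST_SEASON`, `season_label` and `is_supported` are hypothetical and not part of this repository. Identifying a season by its starting year (2014 for 2014/15) is also an assumption.

```python
# Hypothetical helpers; none of these names come from the repository.
LEAGUES = ['EPL', 'Bundesliga', 'La Liga', 'Serie A', 'Ligue 1']
FIRST_SEASON = 2014  # assumed: a season is identified by its starting year


def season_label(start_year):
    """Format a starting year as a season label, e.g. 2014 -> '2014/15'."""
    return f"{start_year}/{(start_year + 1) % 100:02d}"


def is_supported(league, start_year):
    """Return True if the league is one of the top 5 and the season is 2014/15 or later."""
    return league in LEAGUES and start_year >= FIRST_SEASON
```

For example, `is_supported("EPL", 2013)` is false because that season predates 2014/15.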
Usage
- You'll have to first install Python 3.6 (or higher) via any of these links:
- Make sure your environment variables are set up correctly. Search for another resource on how to install Python, if necessary.
- Open a command-line/terminal in the
football-data-extraction
directory, and run
pip install -r requirements.txt
to install all dependencies. If you're unfamiliar with command-line for Windows, check this out.
- You might have to regenerate IDs of players/teams every season, by running any one of the following commands inside the
understat_wrangler
directory:
python3 regenerate_ids.py
python regenerate_ids.py
py regenerate_ids.py
- Open the
user_inputs.csv
file in the
understat_wrangler
directory, and fill in your inputs to specify which data you'd like to extract.
- You can then pull wrangled stats from understat by running any one of the following commands inside the
understat_wrangler
directory:
python3 run.py
python run.py
py run.py
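Which of the three launcher names works depends on how Python was installed. The sketch below, which is not part of this repository, picks the first one found on the PATH and uses it to run a script. The function names `find_launcher` and `run_script` are hypothetical; `shutil.which` and `subprocess.run` are standard-library calls.

```python
import shutil
import subprocess

# Launcher names listed in the steps above, in order of preference.
CANDIDATES = ("python3", "python", "py")


def find_launcher(candidates=CANDIDATES, which=shutil.which):
    """Return the first candidate found on PATH, or None if none is available."""
    for name in candidates:
        if which(name):
            return name
    return None


def run_script(script, cwd="understat_wrangler"):
    """Run a script with whichever launcher is available; return its exit code."""
    launcher = find_launcher()
    if launcher is None:
        raise RuntimeError("No Python launcher found on PATH")
    return subprocess.run([launcher, script], cwd=cwd).returncode
```

Calling `run_script("run.py")` from the repository root would then be equivalent to running one of the commands above inside the understat_wrangler directory.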
Code structure
- The source code is present in the
understat_wrangler
directory.
- The
extract.py
file is used to extract raw JSON data from the understat module. You can check out the understat documentation as well.
- The
transform.py
file is used to transform/wrangle the raw JSON data into human-readable Excel/CSV files.
- The
pipeline.py
file ties together the code in the codebase and stores the resulting Excel/CSV files, as desired.
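The extract, transform and store split described above can be sketched with the standard library alone. This is a minimal illustration of the three-stage layout, not the repository's actual code: the function names, the column choice and the sample records are all made up, and the real extract step pulls data through the understat module rather than from a string.

```python
import csv
import io
import json


def extract(raw_text):
    """Stand-in for the extract step: parse raw JSON text into Python objects."""
    return json.loads(raw_text)


def transform(records, columns):
    """Keep only the chosen columns from each record, in a fixed order."""
    return [{col: rec.get(col, "") for col in columns} for rec in records]


def store(rows, columns, handle):
    """Write the rows as CSV to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


def pipeline(raw_text, columns, handle):
    """Chain the three steps: extract -> transform -> store."""
    store(transform(extract(raw_text), columns), columns, handle)


# Made-up sample data, for illustration only.
SAMPLE = (
    '[{"name": "Player A", "goals": "3", "team": "X"},'
    ' {"name": "Player B", "goals": "1", "team": "Y"}]'
)
```

Passing an `io.StringIO()` buffer (or an open file) as `handle` along with `SAMPLE` and `["name", "goals"]` writes a two-column CSV with one row per record.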
Which data can be extracted?
- On this page by Amos Bastian, you can see which data is being extracted.
- I've displayed the information that I reckon is necessary from said page below