BieLeMetrics is a Python-based project aimed at processing handball data from Kinexon and Sportradar sources. The project includes downloading, synchronizing, and extracting features from event data to train machine learning models, such as an expected goal model, using the MLJAR platform.
This repository contains the official code implementation for the paper "Expected Goals Prediction in Professional Handball using Synchronized Event and Positional Data" by the original authors.
The goal of BieLeMetrics is to provide a seamless and automated pipeline to:
- Download data from Sportradar and Kinexon.
- Process the data by synchronizing event information between sources.
- Extract features to be used for machine learning tasks, such as training an expected goal model using MLJAR.
- Download game data from Sportradar and Kinexon sources.
- Synchronized data processing to align events across different sources.
- Feature extraction and CSV output for MLJAR-based model training.
- Parallel processing capabilities for efficient data handling.
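To illustrate what aligning events across sources can look like, here is a minimal sketch. It assumes both sources expose timestamps in milliseconds on a shared clock, and that matching each event to the nearest positional sample within a tolerance is good enough. The function name `match_nearest`, the millisecond unit, and the 500 ms tolerance are hypothetical and not taken from the repository, which may synchronize differently.

```python
from bisect import bisect_left


def match_nearest(event_times_ms, position_times_ms, tolerance_ms=500):
    """Match each event timestamp to the nearest positional timestamp.

    Returns one entry per event: the index into position_times_ms, or None
    when no sample lies within tolerance_ms.
    position_times_ms must be sorted in ascending order.
    """
    matches = []
    for t in event_times_ms:
        # Index of the first positional timestamp that is >= t.
        i = bisect_left(position_times_ms, t)
        # The nearest sample is either just before or at the insertion point.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(position_times_ms)]
        best = min(
            candidates,
            key=lambda j: abs(position_times_ms[j] - t),
            default=None,
        )
        if best is not None and abs(position_times_ms[best] - t) <= tolerance_ms:
            matches.append(best)
        else:
            matches.append(None)
    return matches
```

Events that return None have no positional sample close enough and would need separate handling, for example being dropped before feature extraction.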
- Python 3.7+
- Conda or virtualenv
- Required Python packages: see requirements.txt
- Git submodules (to initialize external libraries)
- Clone the repository:
    git clone https://github.com/yourusername/BieLeMetrics.git
    cd BieLeMetrics
- Initialize submodules:
    git submodule init
    git submodule update
- Create and activate the Python environment:
    conda create -n bielemetrics python=3.12
    conda activate bielemetrics
- Install the dependencies:
    pip install -r requirements.txt
- Configure environment variables:
  Create a .env file in the root directory and set the environment variables for your Kinexon, Sportradar, and Nextcloud access. The login procedure is outside the maintainers' control. The variables are:
    # Kinexon Session Endpoint
    ENDPOINT_KINEXON_SESSION=""
    # Kinexon Main Endpoint
    ENDPOINT_KINEXON_MAIN=""
    # Kinexon API Endpoint
    ENDPOINT_KINEXON_API=""
    # Kinexon Session Username
    USERNAME_KINEXON_SESSION=""
    # Kinexon Main Username
    USERNAME_KINEXON_MAIN=""
    # Kinexon Session Password
    PASSWORD_KINEXON_SESSION=""
    # Kinexon Main Password
    PASSWORD_KINEXON_MAIN=""
    # Kinexon API Key
    API_KEY_KINEXON=""
    # Sportradar API Key
    API_KEY_SPORTRADAR=""
    # Nextcloud Storage Endpoint
    ENDPOINT_STORAGE_NEXTCLOUD=""
    # Nextcloud Storage Username (optional)
    USERNAME_STORAGE_NEXTCLOUD=""
    # Nextcloud Storage Password (optional)
    PASSWORD_STORAGE_NEXTCLOUD=""
    # Path inside Nextcloud for storage (optional)
    PATH_STORAGE_IN_NEXTCLOUD=""
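Before starting a download, it can help to check that the .env file is complete. The following is a minimal sketch using only the standard library. The helper names `load_env_file` and `missing_variables` are hypothetical, the `REQUIRED` tuple is an assumed subset of the variables, and the repository itself may read the file differently (for example with a package such as python-dotenv).

```python
from pathlib import Path

# Assumed subset of variables that must be non-empty; adjust as needed.
REQUIRED = ("API_KEY_KINEXON", "API_KEY_SPORTRADAR")


def load_env_file(path=".env"):
    """Parse simple KEY="value" lines, skipping blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_variables(values, required=REQUIRED):
    """Return the required names that are absent or set to an empty string."""
    return [name for name in required if not values.get(name)]
```

Calling `missing_variables(load_env_file())` from the repository root returns an empty list when every assumed-required variable has a value.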
The project is structured into several folders:
├── assets
├── data
│   ├── events                 # Event data from Sportradar and Kinexon
│   ├── ml_stuff               # Machine learning-related files and outputs
│   ├── processed              # Processed data ready for feature extraction
│   └── raw                    # Raw data from sources
└── src
    ├── helper_download        # Scripts for downloading data
    ├── helper_ml              # Machine learning helper functions
    ├── helper_preprocessing   # Preprocessing scripts for feature extraction
    ├── utils                  # Utility scripts
    └── libs_external          # External libraries used
You can download game data for specific game IDs using:
python src/download_game_by_id.py <game_id>
To download games for an entire game day in parallel:
python src/download_gamedays.py
After downloading, process the data by synchronizing and extracting features using:
python src/process_game.py <sportradar_path> <kinexon_path>
Or to process multiple game days in parallel:
python src/process_gamedays.py
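If you only want to process a hand-picked set of games in parallel, one option is to call the single-game script once per file pair from a small driver. The sketch below is an illustration under assumptions, not the implementation of src/process_gamedays.py: the helpers `build_command` and `run_all` are hypothetical, and only the command line `python src/process_game.py <sportradar_path> <kinexon_path>` comes from this README. Threads are sufficient here because each worker just waits for its own subprocess.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def build_command(sportradar_path, kinexon_path):
    """Build the documented process_game.py command line for one game."""
    return [sys.executable, "src/process_game.py", sportradar_path, kinexon_path]


def run_all(pairs, max_workers=4, runner=None):
    """Run one process_game.py invocation per (sportradar, kinexon) pair.

    Returns the runner results (exit codes by default) in input order.
    """
    if runner is None:
        # Default runner: launch the script and report its exit code.
        def runner(cmd):
            return subprocess.run(cmd, check=False).returncode

    commands = [build_command(sr, kx) for sr, kx in pairs]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of the submitted commands.
        return list(pool.map(runner, commands))
```

A non-zero exit code in the returned list marks a game that failed and can be retried on its own with the single-game command.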
Processed game data will be saved in the data/processed/ directory, where features are extracted into CSV files for training in MLJAR.
Once the feature extraction is completed, the resulting CSV files can be fed into MLJAR to train an expected goals model.
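As a starting point, the training step might look like the sketch below. It assumes the feature CSV files sit directly in data/processed/, that they share the same columns, and that the label column is called `goal`. All three are assumptions, as are the helper names `find_feature_files` and `train_expected_goals`. The `AutoML` class comes from the mljar-supervised package, which is imported lazily so the file-discovery helper works without it.

```python
from pathlib import Path

TARGET = "goal"  # hypothetical label column: 1 if the shot scored, else 0


def find_feature_files(directory="data/processed"):
    """Return all CSV files directly inside the directory, sorted by name."""
    return sorted(Path(directory).glob("*.csv"))


def train_expected_goals(csv_paths, target=TARGET):
    """Concatenate the feature files and fit an MLJAR AutoML model."""
    # Imported here so that find_feature_files has no heavy dependencies.
    import pandas as pd
    from supervised.automl import AutoML

    frame = pd.concat((pd.read_csv(p) for p in csv_paths), ignore_index=True)
    features = frame.drop(columns=[target])
    labels = frame[target]

    automl = AutoML(mode="Explain")
    automl.fit(features, labels)
    return automl
```

Calling `train_expected_goals(find_feature_files())` from the repository root then fits a model on every feature file found.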
If you'd like to contribute to BieLeMetrics:
- Fork the repository.
- Create a feature branch (git checkout -b feature/my-feature).
- Commit your changes (git commit -am 'Add my feature').
- Push to the branch (git push origin feature/my-feature).
- Create a new Pull Request.
This project is licensed under the MIT License.