This project uses FBref player data to measure player roles (or types) and identify similar players within each position, using clustering and nearest neighbors algorithms.
This project is managed in a virtual environment using pipenv. All packages and their dependencies are listed in Pipfile and Pipfile.lock. To create a pipenv environment and install all the packages needed to run the code in this repository, run the following in a terminal:
```bash
# install pipenv
pip install pipenv
# navigate to the repository directory
cd ~/path/to/player-similarity-clusters
# install virtual environment and dependencies
pipenv install
```
The packages required are:
- pandas
- ipykernel
- matplotlib
- yellowbrick
- scikit-learn
The project code lives in two notebooks, which must be run in order: the clustering models (in the aptly named clustering notebook) have to be computed first, before the nearest neighbors algorithm (in the similarities notebook) is run to compute player similarities.
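The two-step workflow can be illustrated with a minimal sketch. It assumes per-90 statistics for players in a single position, standardised with StandardScaler, clustered with k-means, and then queried with scikit-learn's NearestNeighbors. The column names, the toy values, and `n_clusters=2` are hypothetical choices for illustration; the notebooks may select features, scale data, and choose the number of clusters differently (for example with a yellowbrick elbow plot).

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# Hypothetical per-90 stats for one position (not FBref's actual schema).
players = pd.DataFrame(
    {
        "player": ["A", "B", "C", "D", "E", "F"],
        "shots_p90": [3.1, 2.9, 0.8, 0.7, 1.5, 3.4],
        "key_passes_p90": [0.9, 1.0, 2.8, 3.0, 1.9, 0.7],
        "tackles_p90": [0.3, 0.4, 1.2, 1.1, 2.5, 0.2],
    }
).set_index("player")

# Step 1 (clustering): standardise the features so each stat has equal weight,
# then group players into roles with k-means.
X = StandardScaler().fit_transform(players)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
players["cluster"] = kmeans.fit_predict(X)

# Step 2 (similarities): fit nearest neighbors on the same scaled features
# and look up the players closest to player "A".
nn = NearestNeighbors(n_neighbors=3).fit(X)
distances, indices = nn.kneighbors(X[[0]])
similar = players.index[indices[0]].tolist()

print(players["cluster"].to_dict())
print(similar)  # the first entry is "A" itself, at distance 0
```

Note that the query player is returned as its own nearest neighbor, which is the behaviour the to-do list below aims to remove.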
This project is still in development. Planned work includes:
- Consider lasso and weighted k-means for feature selection
- Look at clustering for defenders & goalkeepers
- Think about features needed for goalkeepers
- Stop the output from including the target player when identifying similar players
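One possible approach to excluding the target player is sketched below. It is a hypothetical helper, `similar_players`, not code from the notebooks: it assumes a DataFrame indexed by unique player names with numeric feature columns, requests one extra neighbor (since the target is its own nearest point), and then drops the target by name rather than by position so that other players at distance 0 are not mistaken for it.

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler


def similar_players(features: pd.DataFrame, target: str, k: int = 5) -> pd.Series:
    """Return distances to the k players closest to `target`, excluding `target`.

    Hypothetical helper: assumes `features` is indexed by unique player names
    and contains only numeric columns.
    """
    X = StandardScaler().fit_transform(features)
    # Request one extra neighbor, because the target matches itself at distance 0.
    nn = NearestNeighbors(n_neighbors=min(k + 1, len(features))).fit(X)
    row = features.index.get_loc(target)
    distances, indices = nn.kneighbors(X[[row]])
    result = pd.Series(distances[0], index=features.index[indices[0]], name="distance")
    # Drop the target by name, then keep at most k players.
    return result.drop(target, errors="ignore").head(k)


# Example usage with hypothetical per-90 stats.
features = pd.DataFrame(
    {"shots_p90": [3.1, 2.9, 0.8, 3.4], "key_passes_p90": [0.9, 1.0, 2.8, 0.7]},
    index=["A", "B", "C", "F"],
)
result = similar_players(features, "A", k=2)
print(result)
```

Dropping by name keeps the helper correct even when another player has identical statistics to the target, which a "skip the first row" approach would not guarantee.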
The data for this project is provided by FBref, and the code used to train the clustering and nearest neighbors algorithms is licensed under the MIT license.
If you have any questions or comments, feel free to contact me by email, on Twitter, or in the repository discussions.