A toy project to see how predictable I am in my so-called GitHub contributions ;)
- Gather contributions data
- Train a machine learning model
- Use the model to predict future contributions (published here)
- Repeat step 3 every day using GitHub Actions
- anaconda
- pytorch (with or without GPU)
- any additional pip requirements are listed in `requirements.txt`
To allow anyone to build their own model, the project is organized into multiple ordered Python/Jupyter files designed to be run sequentially.
Download and save users' contributions and other stats provided by the GitHub public API.
The user list is collected by randomly walking the users' followers/following graph.
Produces a big `contribs.json` file containing the raw user data.
This script can be run again to gather even more data.
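A minimal sketch of this gathering step, assuming a personal access token in a `GITHUB_TOKEN` environment variable; the GraphQL query, endpoint choices, walk length and `contribs.json` layout are illustrative assumptions, not the project's actual script.

```python
import json
import os
import random
import requests

TOKEN = os.environ["GITHUB_TOKEN"]  # assumed: a GitHub personal access token
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

CONTRIB_QUERY = """
query($login: String!) {
  user(login: $login) {
    contributionsCollection {
      contributionCalendar {
        weeks { contributionDays { date contributionCount } }
      }
    }
  }
}
"""

def fetch_contributions(login):
    # GraphQL contributions calendar: one count per day for the past year.
    resp = requests.post(
        "https://api.github.com/graphql",
        headers=HEADERS,
        json={"query": CONTRIB_QUERY, "variables": {"login": login}},
    )
    resp.raise_for_status()
    return resp.json()["data"]["user"]

def neighbours(login):
    # Followers and following form the edges of the random walk.
    users = []
    for rel in ("followers", "following"):
        r = requests.get(f"https://api.github.com/users/{login}/{rel}", headers=HEADERS)
        r.raise_for_status()
        users += [u["login"] for u in r.json()]
    return users

def random_walk(start, steps=100):
    # Randomly walk the graph, saving contributions for each visited user.
    data, current = {}, start
    for _ in range(steps):
        if current not in data:
            data[current] = fetch_contributions(current)
        nxt = neighbours(current)
        current = random.choice(nxt) if nxt else start
    return data

if __name__ == "__main__":
    data = random_walk("octocat", steps=20)
    with open("contribs.json", "w") as f:
        json.dump(data, f)
```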
Parse and pack the gathered data into numpy ndarrays.
Produces a compressed `userdata.npz` numpy file.
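A short sketch of the packing step, assuming the `contribs.json` layout from the previous example; the 365-day window and array names are assumptions rather than the project's exact schema.

```python
import json
import numpy as np

# Assumed raw layout: contribs.json maps each login to its contributions calendar.
with open("contribs.json") as f:
    raw = json.load(f)

logins, series = [], []
for login, user in raw.items():
    weeks = user["contributionsCollection"]["contributionCalendar"]["weeks"]
    counts = [d["contributionCount"] for w in weeks for d in w["contributionDays"]]
    # Keep only full-length histories so every row has the same width.
    if len(counts) >= 365:
        logins.append(login)
        series.append(counts[-365:])

contribs = np.asarray(series, dtype=np.float32)   # shape: (n_users, 365)
np.savez_compressed("userdata.npz", logins=np.array(logins), contribs=contribs)
```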
Pre-process users' contributions using the following scheme:
- data augmentation using `mean`, `std`, `skewness` and `fft`
- outlier removal, mainly using quantile filters
- feature normalization using scikit-learn preprocessing tools

Produces a compressed `ml.npz` numpy file and a `scalers.pkl.z` file containing the pickled scalers.
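A minimal sketch of this pre-processing scheme under the assumptions of the previous examples; the number of FFT components, the 1%–99% quantile cut and the `StandardScaler` choice are illustrative, not necessarily what the project uses.

```python
import joblib
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import StandardScaler

data = np.load("userdata.npz")
contribs = data["contribs"]                      # (n_users, 365) daily counts

# Feature augmentation: summary statistics plus low-frequency FFT magnitudes.
mean = contribs.mean(axis=1, keepdims=True)
std = contribs.std(axis=1, keepdims=True)
skewness = skew(contribs, axis=1).reshape(-1, 1)
fft_mag = np.abs(np.fft.rfft(contribs, axis=1))[:, :8]
features = np.hstack([contribs, mean, std, skewness, fft_mag])

# Outlier removal: drop users whose total activity falls outside the 1%-99% quantiles.
totals = contribs.sum(axis=1)
lo, hi = np.quantile(totals, [0.01, 0.99])
keep = (totals >= lo) & (totals <= hi)
features = features[keep]

# Normalization with a scikit-learn scaler, kept for later inference.
scaler = StandardScaler()
features = scaler.fit_transform(features)

np.savez_compressed("ml.npz",
                    features=features.astype(np.float32),
                    logins=data["logins"][keep])
joblib.dump({"features": scaler}, "scalers.pkl.z")
```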
Jupyter notebook (designed to be run on Kaggle) for training a PyTorch model.
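The sketch below shows a plain PyTorch training loop on the `ml.npz` features; the MLP architecture and the framing of the last 7 feature columns as the 7-day target are assumptions made for illustration, not the notebook's actual model.

```python
import numpy as np
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

data = np.load("ml.npz")["features"].astype(np.float32)

# Assumed framing: all but the last 7 columns are inputs, the last 7 are targets.
X = torch.tensor(data[:, :-7])
y = torch.tensor(data[:, -7:])
loader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(
    nn.Linear(X.shape[1], 256), nn.ReLU(),
    nn.Linear(256, 64), nn.ReLU(),
    nn.Linear(64, 7),                      # one output per predicted day
).to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(20):
    for xb, yb in loader:
        xb, yb = xb.to(device), yb.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()

torch.save(model.state_dict(), "model.pt")
```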
Use the previous PyTorch model, download the latest users' data, and predict their contribution counts for the next 7 days.
Produces `csv` files containing the predictions.
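A sketch of the prediction and CSV export step, assuming the `ml.npz` arrays, `model.pt` weights and architecture from the previous examples; in the real pipeline the freshly downloaded data and the saved scalers would be used to map outputs back to raw counts.

```python
import csv
import numpy as np
import torch
from torch import nn

# Assumed artifacts from the earlier steps: ml.npz and model.pt.
ml = np.load("ml.npz")
features = ml["features"].astype(np.float32)
logins = ml["logins"]
X = torch.tensor(features[:, :-7])

# Rebuild the same (assumed) architecture as in training and load its weights.
model = nn.Sequential(
    nn.Linear(X.shape[1], 256), nn.ReLU(),
    nn.Linear(256, 64), nn.ReLU(),
    nn.Linear(64, 7),
)
model.load_state_dict(torch.load("model.pt", map_location="cpu"))
model.eval()

with torch.no_grad():
    preds = model(X).numpy()

# One row per user: login followed by the 7 predicted (still normalized) values.
with open("predictions.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["login"] + [f"day_{i + 1}" for i in range(7)])
    for login, row in zip(logins, preds):
        writer.writerow([login] + [f"{v:.2f}" for v in row])
```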