A toy project to see how predictable I am with my so-called GitHub contributions ;)
One of the main goals of this repo is to predict the current/next-day contributions of multiple users, automatically every day, using GitHub Actions.
To do so, this project features a PyTorch model trained on contribution data from GitHub users.
The history of those predictions is available in the pred_history_no_scaling branch.
Please keep in mind that this project is just for fun, not well tested, and intended for harmless use only.
To add a user for the next predictions, do the following:
- fork this repository
- append your GitHub nickname to the users.txt file
- commit
- open a pull request
The remaining parts of this readme are more technical.
Here's an overview of the process to predict contributions from scratch:
1. Gather contributions data
2. Train a machine learning model
3. Use the model to predict future contributions (published here)
4. Repeat step 3 every day using GitHub Actions
Requirements:
- Anaconda
- PyTorch (with or without GPU)
- any additional pip requirements are listed in requirements.txt
To let you build your own model, the project is organized into several ordered Python/Jupyter files designed to be run sequentially.
Download and save users' contributions and other stats provided by the GitHub public API.
The user list is collected by randomly walking the users' following/followers graph.
Produce a large contribs.json file containing the raw user data.
This script can be run again to gather even more data.
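As an illustration of the random walk idea, here is a minimal sketch that is not the project's actual script. The `random_walk_users` function, the `neighbors` callable and the `TOY_GRAPH` data are hypothetical; a real run would fetch following/followers lists from the GitHub public API instead of an in-memory dictionary.

```python
import random


def random_walk_users(start, neighbors, max_users, seed=None):
    """Collect up to max_users distinct logins by randomly walking a follow graph.

    `neighbors` is a hypothetical callable returning the logins a user
    follows or is followed by.
    """
    rng = random.Random(seed)
    seen = {start}
    order = [start]
    current = start
    steps = 0
    max_steps = 100 * max_users  # safety bound so the walk always terminates
    while len(seen) < max_users and steps < max_steps:
        steps += 1
        candidates = neighbors(current)
        if not candidates:
            # Dead end: restart from a user we already know.
            current = rng.choice(order)
            continue
        current = rng.choice(candidates)
        if current not in seen:
            seen.add(current)
            order.append(current)
    return order


# Toy in-memory graph standing in for following/followers data.
TOY_GRAPH = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice"],
    "dave": ["bob", "erin"],
    "erin": [],
}

users = random_walk_users("alice", lambda u: TOY_GRAPH.get(u, []), max_users=4, seed=0)
print(users)
```

Because the walk only records users it has not seen before, running it again from a different start or seed tends to add new users rather than duplicates.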
Parse and pack the gathered data into NumPy ndarrays.
Produce a compressed userdata.npz NumPy file.
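The packing step could look roughly like the sketch below. The layout of the raw data (a mapping from login to a list of daily counts), the `pack_contributions` helper and the array names `logins` and `counts` are assumptions made for illustration; the real contribs.json holds more fields. An in-memory buffer stands in for the output file.

```python
import io

import numpy as np


def pack_contributions(raw, n_days):
    """Pack {login: [daily counts]} into a (n_users, n_days) integer array.

    Histories are truncated to the most recent n_days and left-padded with zeros.
    """
    logins = sorted(raw)
    counts = np.zeros((len(logins), n_days), dtype=np.int32)
    for row, login in enumerate(logins):
        recent = raw[login][-n_days:]
        if recent:
            counts[row, n_days - len(recent):] = recent
    return np.array(logins), counts


raw = {"alice": [0, 2, 5, 1], "bob": [3, 0]}
logins, counts = pack_contributions(raw, n_days=3)

buffer = io.BytesIO()  # stands in for a file such as userdata.npz
np.savez_compressed(buffer, logins=logins, counts=counts)

buffer.seek(0)
loaded = np.load(buffer)
print(loaded["logins"], loaded["counts"])
```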
Pre-process users' contributions using the following scheme:
- data augmentation using mean, std, skewness and fft
- outlier removal, mainly using quantile filters
- feature normalization using scikit-learn preprocessing tools
Produce a compressed ml.npz NumPy file and a scalers.pkl.z file containing the pickled scalers.
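The three pre-processing steps above can be sketched as follows. This is one possible reading, not the project's code: the window length of 28 days, the 0.99 quantile threshold, the choice of `StandardScaler` and the helper names `summary_features` and `drop_outliers` are all assumptions, and the input is synthetic.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler


def summary_features(windows):
    """Return mean, std, skewness and FFT magnitudes for each row of `windows`."""
    mean = windows.mean(axis=1, keepdims=True)
    std = windows.std(axis=1, keepdims=True)
    centered = windows - mean
    safe_std = np.where(std > 0, std, 1.0)  # avoid dividing by zero on flat rows
    skew = (centered ** 3).mean(axis=1, keepdims=True) / safe_std ** 3
    spectrum = np.abs(np.fft.rfft(windows, axis=1))
    return np.hstack([mean, std, skew, spectrum])


def drop_outliers(features, windows, upper_q=0.99):
    """Keep rows whose total contributions are at or below the upper quantile."""
    totals = windows.sum(axis=1)
    keep = totals <= np.quantile(totals, upper_q)
    return features[keep], keep


# Synthetic contribution windows: 200 users, 28 days each.
rng = np.random.default_rng(0)
windows = rng.poisson(3.0, size=(200, 28)).astype(float)
windows[0] = 500.0  # an obvious outlier

features = summary_features(windows)
features, keep = drop_outliers(features, windows)

scaler = StandardScaler()
scaled = scaler.fit_transform(features)
print(scaled.shape)
```

The fitted scaler has to be saved alongside the data, since the same transformation must be applied to fresh user data at prediction time.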
Jupyter notebook (designed to be run on Kaggle) for training a PyTorch model.
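For readers unfamiliar with PyTorch, a bare-bones training loop is sketched below. Everything specific in it is an assumption: the notebook's real architecture, loss, optimizer and hyperparameters are not described here, and the data is synthetic (18 input features, 7 outputs).

```python
import torch
from torch import nn

torch.manual_seed(0)

# Synthetic stand-in data: the targets are a simple function of the inputs
# so that the model has something learnable.
inputs = torch.randn(256, 18)
targets = 2.0 * inputs[:, :7]

# Hypothetical small fully connected network.
model = nn.Sequential(nn.Linear(18, 32), nn.ReLU(), nn.Linear(32, 7))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

losses = []
for epoch in range(50):
    optimizer.zero_grad()                    # reset gradients from the previous step
    loss = loss_fn(model(inputs), targets)   # forward pass on the full batch
    loss.backward()                          # back-propagate
    optimizer.step()                         # update the weights
    losses.append(loss.item())

print(losses[0], losses[-1])
```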
Use the previously trained PyTorch model, download the latest user data and predict each user's number of contributions for the next 7 days.
Produce CSV files containing the predictions.
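Writing the predictions out could be done along these lines. The column names, the one-row-per-user-and-day layout and the `write_predictions` helper are hypothetical, and the predicted values are made up; the project's actual CSV format may differ.

```python
import csv
import io
from datetime import date, timedelta


def write_predictions(stream, start_day, predictions):
    """Write one row per user and day: login, ISO date, predicted count."""
    writer = csv.writer(stream)
    writer.writerow(["login", "date", "predicted_contributions"])
    for login, values in predictions.items():
        for offset, value in enumerate(values):
            day = start_day + timedelta(days=offset)
            writer.writerow([login, day.isoformat(), round(value)])


# Made-up model output: 7 predicted values for one user.
predictions = {"alice": [1.2, 0.4, 3.6, 2.0, 0.0, 5.4, 1.0]}

out = io.StringIO()  # stands in for an open CSV file
write_predictions(out, date(2024, 1, 1), predictions)
rows = out.getvalue().splitlines()
print("\n".join(rows))
```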
There's an additional branch, pred_history_with_scaling, containing predictions from a model trained to expect more contributions from users.