We have an annotated bibliography in progress (Link to LaTeX document on Overleaf) as part of our literature review efforts.
Several specialty libraries are used. A list of all packages and their versions can be found in the configs/envs directory.
Docstrings, type hints, and comments brought to you in Google style.
Directory structure mimics conventional PyTorch projects. A full dated summary can be found in directory_tree.txt (use tree /F > foo.txt in PowerShell to create your own!).
You will need conda and Python version 3.6 or above.
Assuming you're in the base directory of this project and are using a Linux-based system, first create a new conda (or virtualenv) environment with Python 3.7:
conda create -n env_name python=3.7 anaconda
conda activate env_name
Next, clone this repository:
git clone https://github.com/flawnson/Generic_GNN.git
OR
pip install git+https://github.com/flawnson/Generic_GNN.git
Then you can run setup.py:
python setup.py install
Install the dependencies listed in the requirements file:
pip install -r configs/envs/requirements_cpu.txt
Then you'll need to create an empty directory for model outputs (including saved models).
cd Generic_GNN && mkdir outputs
Finally, you can run a demo version of the pipeline (default configs are in the configs directory):
python main.py -c path/to/config/files/file.json -s path/to/schema/files/file.json
You can see the logged results using TensorBoard (to be set up soon).
tensorboard --logdir=logs/GAT_tuning/tune_model
Docker containers for running the project are on the roadmap!
The steps executed by each pipeline run are outlined below:
- Load the config dictionary from a JSON file and set up logging, device management, random seeds, etc.
- Download and preprocess data
- Setup standard model (MLP, CNN, GCN, Transformer, etc.)
- Setup augmented model implemented as a standard model wrapper (Quine, HyperNetwork, etc.)
- Load model-dependent datasets (if required)
- Split datasets using selected strategy
- Select the run-type pipeline (Demo, Tuning, or Parallel)
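For illustration, a minimal sketch of what such a driver might look like (the function name and config keys here are hypothetical, not the repository's actual API):

```python
import json
import logging
import random

import numpy as np
import torch


def run_pipeline(config_path: str) -> None:
    # Load the config dictionary from a JSON file
    with open(config_path) as f:
        config = json.load(f)

    # Set up logging, device management, and random seeds
    logging.basicConfig(level=logging.INFO)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    seed = config.get("seed", 0)
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

    # Data download/preprocessing, model setup, dataset splitting,
    # and run-type selection would follow here, driven by the config.
```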
Run:
python main.py -c "path_to_config_file.json"
Configs are validated against a JSON schema to ensure only properly defined config files are run. A few configurations are passed directly (unpacked) into function arguments and must therefore follow the function's signature. For example, the split_kwargs config must match the signature of the corresponding scikit-learn splitter.
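Validation of this kind is typically a few lines with the jsonschema package; a minimal sketch (the file paths are placeholders):

```python
import json

from jsonschema import ValidationError, validate

with open("path_to_config_file.json") as f:
    config = json.load(f)
with open("path_to_schema_file.json") as f:
    schema = json.load(f)

try:
    # Raises ValidationError if the config violates the schema
    validate(instance=config, schema=schema)
except ValidationError as err:
    raise SystemExit(f"Invalid config: {err.message}")
```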
Splitting corresponds to scikit-learn's model selection classes and functions as follows:
"binary" == train_test_split()
"holdout" == LeavePOut()
"shuffle" == ShuffleSplit()
"kfold" == StratifiedKFold()
Naturally, splits that require shuffling the data do not apply to time-series data or our sequential models. The pipeline currently supports train and test splits only (you cannot specify a validation set).
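As a sketch of how this mapping translates into scikit-learn calls (the dispatch function below is illustrative, not the repository's exact code):

```python
from sklearn.model_selection import (
    LeavePOut,
    ShuffleSplit,
    StratifiedKFold,
    train_test_split,
)


def get_splits(strategy, X, y, **split_kwargs):
    """Dispatch to the scikit-learn splitter named by `strategy`."""
    if strategy == "binary":
        # train_test_split returns the split data directly
        return train_test_split(X, y, **split_kwargs)
    splitters = {
        "holdout": LeavePOut,
        "shuffle": ShuffleSplit,
        "kfold": StratifiedKFold,
    }
    # The remaining strategies yield (train_idx, test_idx) pairs
    splitter = splitters[strategy](**split_kwargs)
    return list(splitter.split(X, y))
```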
There are 2 standard models implemented (Transformer and CNN in development):
- MLP
- GCN
There are 2 augmented models implemented:
- Quine (With help from: https://github.com/AustinT/nn-quine)
- HyperNetwork (With help from: https://github.com/g1910/HyperNetworks)
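A minimal sketch of the wrapper pattern (class names are hypothetical, not the repository's actual implementations):

```python
import torch
import torch.nn as nn


class MLP(nn.Module):
    """A standard model: a small multilayer perceptron."""

    def __init__(self, in_dim: int, hidden_dim: int, out_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class AugmentedModel(nn.Module):
    """An augmented model wraps a standard model and adds behaviour
    around it (a Quine or HyperNetwork would live here)."""

    def __init__(self, standard_model: nn.Module):
        super().__init__()
        self.standard_model = standard_model

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Delegate to the wrapped model; a real augmentation would
        # modify inputs, parameters, or outputs here.
        return self.standard_model(x)
```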
There are two datasets with loading and transformations implemented:
- MNIST - Quine and Classical model types
- CIFAR - HyperNetwork
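Loading these with torchvision looks roughly like this (CIFAR-10 and the transforms are illustrative assumptions, not the project's exact preprocessing):

```python
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.ToTensor(),
])

# MNIST: used with Quine and classical model types
mnist = datasets.MNIST(root="data", train=True, download=True,
                       transform=transform)

# CIFAR-10: used with the HyperNetwork
cifar = datasets.CIFAR10(root="data", train=True, download=True,
                         transform=transform)
```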
Demo is a simple training run. It takes a configuration file and runs the Trainer once.
Tuning is a pair of consecutive runs: the first executes the Tuner (a wrapper around the Trainer pipeline that searches for and returns the best hyperparameters) once, and the second executes the Trainer once.
Parallelizing allows you to execute several Demo and/or Tuning pipelines in tandem. It uses multiprocessing to find and use as many cores as you define in the configuration file (yet to be implemented).
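Since this feature is not implemented yet, the following is only a sketch of the planned behaviour using Python's standard multiprocessing module (all names are hypothetical):

```python
from multiprocessing import Pool


def run_one(config_path: str) -> None:
    """Placeholder for a single Demo or Tuning pipeline run."""
    ...


config_paths = ["configs/run_a.json", "configs/run_b.json"]

if __name__ == "__main__":
    # The number of worker processes would come from the config file
    with Pool(processes=2) as pool:
        pool.map(run_one, config_paths)
```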
Logging is controlled by the config files.
- Console logs - Runs are logged to the console with logzero (mostly info and exception logs) and saved as a .txt file in the saves/logs directory.
- Config logs - A copy of the config is saved as a .json file for each run in the saves/logs directory.
- Tensorboard logs - Saved in the runs directory, used to visualize training.
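Setting up logzero this way takes only a few lines; a minimal sketch (the log file name is a placeholder):

```python
import logzero
from logzero import logger

# Mirror console logs to a .txt file in saves/logs
logzero.logfile("saves/logs/run.txt")

logger.info("Starting run")
try:
    raise RuntimeError("something went wrong")
except RuntimeError:
    # logger.exception records the traceback along with the message
    logger.exception("Caught an exception during the run")
```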
Model checkpointing is performed with PyTorch's save function.
- Model checkpoints are saved at each interval specified in the run_config (saved in the saves/checkpoints directory).
- The model file itself is copied into the checkpoint directory, where it can be used with the saved .json config (saved in the saves/checkpoints directory).
Currently we have a couple branches for making changes and contributions. New branches can be created if the process of achieving the desired outcome risks breaking the existing copy of the codebase on main.
The core contributors are reachable by email, Twitter, and most other means.
Thank you to Kevin for being a reliable partner, and a close friend.