This repository introduces the different stages one needs to follow to classify data of varying complexity. More information can be found in the interactive Google Colab notebooks.

In deep learning we often need to distinguish one kind of data from another. The task that tackles this challenge is classification. In this repository we present how to classify the following datasets with deep learning models:
Datasets:
- Gaussian-blobs
- fashion items (fashionMNIST)
- relations among authors (ogb-collab)
Models:
- MultiLayer-Perceptron (MLP)
- Convolutional Neural Networks (CNNs)
- Graph Convolutional Networks (GCNs)
We can combine the aforementioned models into different architectures that solve the classification task for each dataset.
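As a rough intuition for the simplest of these models, an MLP is just stacked affine layers with nonlinearities in between. Below is a minimal pure-Python forward pass with arbitrary random weights — an illustrative sketch only, not the repository's PyTorch implementation:

```python
import random

random.seed(0)

def mlp_forward(x, weights, biases):
    """Forward pass through a tiny MLP: ReLU hidden layers, raw logits out."""
    h = x
    for layer, (w, b) in enumerate(zip(weights, biases)):
        # Affine step: h_out[j] = sum_i h[i] * w[i][j] + b[j]
        h = [sum(hi * w[i][j] for i, hi in enumerate(h)) + b[j]
             for j in range(len(b))]
        if layer < len(weights) - 1:  # ReLU on hidden layers only
            h = [max(0.0, v) for v in h]
    return h

# A 2 -> 3 -> 2 network with random weights (illustrative only)
w1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
b1 = [0.0] * 3
w2 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
b2 = [0.0] * 2

logits = mlp_forward([0.5, -0.2], [w1, w2], [b1, b2])
predicted_class = max(range(len(logits)), key=lambda c: logits[c])
```

The predicted class is simply the index of the largest logit; in practice the frameworks listed below handle the layers, gradients, and training loop.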
The implementations of the different stages are based on the following frameworks:
- PyTorch
- PyTorch-Lightning
- Lightning-Bolts
- DGL (Deep Graph Library)
- OGB (Open Graph Benchmark)
In order to set up the necessary virtual environment:

- review `requirements.txt` and comment / uncomment entries according to the GPU availability and CUDA version of your machine, then create a virtual environment `.venv`: `python3 -m venv .venv`
- activate the new environment with: `source .venv/bin/activate` (Mac & Linux) or `.venv\Scripts\activate` (Windows)
- update the `pip` package: `python -m pip install --upgrade pip`
- install the dependencies from `requirements.txt`: `pip install -r requirements.txt`
  - the PyTorch installation depends on GPU availability
  - the DGL installation depends on GPU availability
- install the `zero_to_hero` package: `pip install -e .`
NOTE: The virtual environment will have `zero_to_hero` installed in editable mode. Some changes, e.g. in `setup.cfg`, might require you to run `pip install -e .` again.
Optional and needed only once after `git clone https://github.com/Deligiorgis/zero_to_hero.git`:

- install several pre-commit git hooks with `pre-commit install` (you might also want to run `pre-commit autoupdate`) and check out the configuration under `.pre-commit-config.yaml`. The `-n, --no-verify` flag of `git commit` can be used to deactivate the pre-commit hooks temporarily.
├── AUTHORS.md <- List of developers and maintainers.
├── CHANGELOG.md <- Changelog to keep track of new features and fixes.
├── LICENSE.txt <- License as chosen on the command-line.
├── README.md <- The top-level README for developers.
├── configs <- Directory for configurations of model & application.
├── data <- The contents of this directory are git-ignored.
│ ├── FashionMNIST <- FashionMNIST data will be downloaded by default here.
│ ├── ogbl_collab <- OGBL-Collab data will be downloaded by default here.
├── docs <- Directory for Sphinx documentation in rst or md.
├── pyproject.toml <- Build system configuration. Do not change!
├── scripts <- Analysis and production scripts which import the
│ actual Python package, e.g. train_model.py.
├── setup.cfg <- Declarative configuration of your project.
├── setup.py <- Use `pip install -e .` to install for development or
│ create a distribution with `tox -e build`.
├── src
│ └── zero_to_hero <- Actual Python package where the main functionality goes.
├── tests <- Unit tests which can be run with `py.test`.
├── .coveragerc <- Configuration for coverage reports of unit tests.
├── .isort.cfg <- Configuration for git hook that sorts imports.
└── .pre-commit-config.yaml <- Configuration of pre-commit git hooks.
The following commands assume that you are using the virtual environment set up above.
To run the script that classifies the Gaussian blobs, first choose how many dimensions you want for the blobs and which architecture to use (further information and examples can be found in `configs/blobs.yml`). Then run:

`python scripts/main_classify_blobs.py`
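For intuition, Gaussian blobs are simply isotropic Gaussian clusters of points, one cluster per class. A stdlib-only sketch of how such data could be generated — the function name and parameters here are hypothetical, not taken from the repository or its configs:

```python
import random

random.seed(42)

def make_blobs(centers, n_per_center, std=0.5):
    """Sample points around each center from an isotropic Gaussian.

    Returns (points, labels); each label is the index of its center.
    Illustrative stand-in for the repository's blob generation.
    """
    points, labels = [], []
    for label, center in enumerate(centers):
        for _ in range(n_per_center):
            points.append([random.gauss(mu, std) for mu in center])
            labels.append(label)
    return points, labels

# Two well-separated 2-D blobs, 100 samples each
X, y = make_blobs(centers=[(0.0, 0.0), (3.0, 3.0)], n_per_center=100)
```

The classifier's job is then to recover the cluster label from the coordinates; the dimensionality chosen in `configs/blobs.yml` corresponds to the length of each center tuple.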
To run the script that classifies the FashionMNIST dataset, first choose which architecture to use (further information and examples can be found in `configs/fashion_mnist.yml`). Then run:

`python scripts/main_classify_fashion_mnist.py`
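For intuition on what a CNN does with such images, it builds feature maps by sliding small kernels over the input. A minimal pure-Python sketch of a valid 2-D cross-correlation (no padding, stride 1) — illustrative only, not the repository's PyTorch implementation:

```python
def conv2d(image, kernel):
    """Valid 2-D cross-correlation: no padding, stride 1."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            # Sum of elementwise products between the kernel and the patch
            row.append(sum(image[r + i][c + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out

# A toy 3x4 image with a sharp left/right contrast
image = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
kernel = [[1, -1]]  # 1x2 horizontal-difference kernel (edge detector)
feature_map = conv2d(image, kernel)  # responds only at the vertical edge
```

Stacking many such learned kernels, plus pooling and a final MLP head, is the essence of the CNN used for FashionMNIST.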
To run the script that predicts the links (co-authorships), first choose which architecture to use (further information and examples can be found in `configs/collab.yml`). Then run:

`python scripts/main_link_prediction_collab.py`
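For intuition, a common way to score a candidate link between two nodes is the sigmoid of the dot product of their embeddings. A stdlib-only sketch with made-up embeddings — not the repository's GCN-based model, whose embeddings come from message passing over the graph:

```python
import math

def dot_score(u_emb, v_emb):
    """Probability that edge (u, v) exists: sigmoid of the dot product."""
    dot = sum(a * b for a, b in zip(u_emb, v_emb))
    return 1.0 / (1.0 + math.exp(-dot))

# Hypothetical 4-d embeddings for three authors
emb = {
    "a": [0.9, 0.1, 0.0, 0.2],
    "b": [0.8, 0.2, 0.1, 0.3],
    "c": [-0.7, 0.9, -0.5, 0.0],
}

p_ab = dot_score(emb["a"], emb["b"])  # similar embeddings -> higher score
p_ac = dot_score(emb["a"], emb["c"])  # dissimilar embeddings -> lower score
```

Training then pushes scores of observed co-authorships toward 1 and scores of sampled negative pairs toward 0.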
To monitor the progress of the models' training you can use TensorBoard by running the following command:

`tensorboard --logdir=tensorboard_logs`
Papers:
- Revisiting Graph Neural Networks for Link Prediction
- Link Prediction Based on Graph Neural Networks
- An End-to-End Deep Learning Architecture for Graph Classification
GitHub:
- https://github.com/facebookresearch/SEAL_OGB
- https://github.com/dmlc/dgl/tree/master/examples/pytorch/seal
- https://github.com/muhanzhang/DGCNN
- https://github.com/muhanzhang/pytorch_DGCNN
This project has been set up using PyScaffold 4.0.2 and the dsproject extension 0.6.1.