zero_to_hero


Zero To Hero Tutorial on a Deep Learning Classification Task

This repository introduces the stages involved in classifying datasets of increasing complexity with deep learning. More information can be found in the interactive Google Colab notebooks.

Description

A recurring problem in deep learning is distinguishing data points from one another; the task that tackles this challenge is classification. In this repository we present how to classify the following datasets with deep learning models:

Datasets:

  1. Gaussian blobs
  2. Fashion items (FashionMNIST)
  3. Relations among authors (OGBL-Collab)

Models:

  1. MultiLayer-Perceptron (MLP)
  2. Convolutional Neural Networks (CNNs)
  3. Graph Convolutional Networks (GCNs)

We can combine the aforementioned models into different architectures to solve the classification task for each dataset.
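As a rough sketch of the simplest of these models, an MLP classifier's forward pass can be expressed in plain NumPy. The repository's actual models are built with PyTorch; the layer sizes and names below are purely illustrative:

```python
import numpy as np

def relu(x):
    # Element-wise ReLU non-linearity
    return np.maximum(0.0, x)

def softmax(logits):
    # Numerically stable softmax over the last axis
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mlp_forward(x, w1, b1, w2, b2):
    # Two-layer MLP: hidden ReLU layer followed by a softmax output
    h = relu(x @ w1 + b1)
    return softmax(h @ w2 + b2)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 2))           # batch of 4 two-dimensional points
w1 = rng.normal(size=(2, 16)); b1 = np.zeros(16)
w2 = rng.normal(size=(16, 3)); b2 = np.zeros(3)
probs = mlp_forward(x, w1, b1, w2, b2)
print(probs.shape)                    # (4, 3): class probabilities per point
```

Each row of the output sums to 1, so it can be read as a probability distribution over the three classes.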

The implementations of the different stages are based on the PyTorch and DGL frameworks.

Installation

In order to set up the necessary virtual environment:

  1. review requirements.txt and comment / uncomment entries according to your machine's GPU availability and CUDA version, then create a virtual environment .venv:
    python3 -m venv .venv
  2. activate the new environment with:
    source .venv/bin/activate  # Mac & Linux users
    .venv\Scripts\activate  # Windows users
  3. update pip package:
    python -m pip install --upgrade pip
  4. install requirements.txt:
    pip install -r requirements.txt
  5. install PyTorch according to your GPU availability
  6. install DGL according to your GPU availability
  7. install zero_to_hero package:
    pip install -e .

NOTE: The virtual environment will have zero_to_hero installed in editable mode. Some changes, e.g. in setup.cfg, might require you to run pip install -e . again.

Optional and needed only once after git clone https://github.com/Deligiorgis/zero_to_hero.git:

  1. install several pre-commit git hooks with:
    pre-commit install
    # You might also want to run `pre-commit autoupdate`
  2. check the configuration under .pre-commit-config.yaml. The -n, --no-verify flag of git commit can be used to deactivate pre-commit hooks temporarily.

Project Organization

├── AUTHORS.md              <- List of developers and maintainers.
├── CHANGELOG.md            <- Changelog to keep track of new features and fixes.
├── LICENSE.txt             <- License as chosen on the command-line.
├── README.md               <- The top-level README for developers.
├── configs                 <- Directory for configurations of model & application.
├── data                    <- Data directory; its contents are ignored by git.
│   ├── FashionMNIST        <- FashionMNIST data will be downloaded by default here.
│   ├── ogbl_collab         <- OGBL-Collab data will be downloaded by default here.
├── docs                    <- Directory for Sphinx documentation in rst or md.
├── pyproject.toml          <- Build system configuration. Do not change!
├── scripts                 <- Analysis and production scripts which import the
│                              actual Python package, e.g. train_model.py.
├── setup.cfg               <- Declarative configuration of your project.
├── setup.py                <- Use `pip install -e .` to install for development or
│                              create a distribution with `tox -e build`.
├── src
│   └── zero_to_hero        <- Actual Python package where the main functionality goes.
├── tests                   <- Unit tests which can be run with `py.test`.
├── .coveragerc             <- Configuration for coverage reports of unit tests.
├── .isort.cfg              <- Configuration for git hook that sorts imports.
└── .pre-commit-config.yaml <- Configuration of pre-commit git hooks.

How to run the classification task for each dataset

The following commands assume that the virtual environment created above is activated.

Gaussian Blobs

To classify the Gaussian blobs, first choose how many dimensions you want for the blobs and which architecture to use (further information and examples can be found in configs/blobs.yml), then run:

python scripts/main_classify_blobs.py
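The configuration options in configs/blobs.yml are not reproduced here, but conceptually the dataset is a set of isotropic Gaussian clusters. A minimal NumPy sketch of generating such data (the function name and default sizes are illustrative, not the repository's API):

```python
import numpy as np

def make_gaussian_blobs(n_per_class=100, n_dims=2, n_classes=3, spread=0.5, seed=0):
    # Sample each class as an isotropic Gaussian around a random centre
    rng = np.random.default_rng(seed)
    centres = rng.uniform(-5.0, 5.0, size=(n_classes, n_dims))
    features = np.concatenate(
        [rng.normal(c, spread, size=(n_per_class, n_dims)) for c in centres]
    )
    labels = np.repeat(np.arange(n_classes), n_per_class)
    return features, labels

X, y = make_gaussian_blobs()
print(X.shape, y.shape)  # (300, 2) (300,)
```

Because the blobs are well separated when the spread is small, even a shallow MLP can classify them almost perfectly, which is why they serve as the first stage of the tutorial.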

Fashion-MNIST

To classify the FashionMNIST dataset, first choose which architecture to use (further information and examples can be found in configs/fashion_mnist.yml), then run:

python scripts/main_classify_fashion_mnist.py
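FashionMNIST images are 28×28 grayscale, which is why a CNN suits this stage. As a rough illustration of what a single convolutional layer does to such an input (the repository's actual architecture is configured in configs/fashion_mnist.yml and implemented in PyTorch; this naive NumPy version is for intuition only):

```python
import numpy as np

def conv2d_valid(image, kernel):
    # Naive single-channel 2D cross-correlation with 'valid' padding
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.default_rng(0).normal(size=(28, 28))  # FashionMNIST-sized input
edge_kernel = np.array([[1., 0., -1.],
                        [1., 0., -1.],
                        [1., 0., -1.]])                  # simple vertical-edge filter
feature_map = conv2d_valid(image, edge_kernel)
print(feature_map.shape)  # (26, 26): 28 - 3 + 1 in each dimension
```

Stacking such filters, with learned kernels instead of a fixed edge detector, is what gives the CNN its advantage over an MLP on image data.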

OGBL-Collab

To predict links (co-authorships), first choose which architecture to use (further information and examples can be found in configs/collab.yml), then run:

python scripts/main_link_prediction_collab.py
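Link prediction on OGBL-Collab is commonly framed as scoring pairs of node embeddings produced by a graph encoder such as a GCN. A hedged NumPy sketch of dot-product scoring (the embedding size, node count, and function name are illustrative assumptions, not the repository's actual API):

```python
import numpy as np

rng = np.random.default_rng(0)
num_nodes, dim = 10, 8
embeddings = rng.normal(size=(num_nodes, dim))  # stand-in for a GCN encoder's output

def link_score(emb, u, v):
    # Score an edge (u, v) as the dot product of its endpoint embeddings;
    # a sigmoid maps the raw score to a link probability in (0, 1)
    s = emb[u] @ emb[v]
    return 1.0 / (1.0 + np.exp(-s))

prob = link_score(embeddings, 0, 1)
print(0.0 < prob < 1.0)  # True: the sigmoid keeps the score in (0, 1)
```

Training then amounts to pushing scores of observed co-authorship edges towards 1 and scores of sampled non-edges towards 0.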

Tensorboard

To monitor the model's training progress, launch TensorBoard with the following command:

tensorboard --logdir=tensorboard_logs

References & Acknowledgments

Papers:

GitHub:

Note

This project has been set up using PyScaffold 4.0.2 and the dsproject extension 0.6.1.