Traning and Transfer Learning ImageNet model in Pytorch

This project implements:

Training of popular model architectures, such as ResNet, AlexNet, and VGG on the ImageNet dataset;
Transfer learning from the most popular model architectures of above, fine tuning only the last fully connected layer.

Note:

ImageNet training will be documeted in the next release.
Transfer-learning was fully tested on alexnet, densenet121, inception_v3, resnet18 and vgg19. The other models will be tested in the next release.

Usage

usage: main.py [-h] [--data DIR] [--outf OUTF] [--evalf EVALF] [--arch ARCH]
               [-j N] [--epochs N] [--start-epoch N] [-b N] [--lr LR]
               [--momentum M] [--weight-decay W] [--print-freq N]
               [--resume PATH] [-e] [--train] [--test] [-t] [--pretrained]
               [--world-size WORLD_SIZE] [--dist-url DIST_URL]
               [--dist-backend DIST_BACKEND]

PyTorch ImageNet Training

optional arguments:
  -h, --help            show this help message and exit
  --data DIR            path to dataset
  --outf OUTF           folder to output model checkpoints
  --evalf EVALF         path to evaluate sample
  --arch ARCH, -a ARCH  model architecture: alexnet | densenet121 |
                        densenet161 | densenet169 | densenet201 | inception_v3
                        | resnet101 | resnet152 | resnet18 | resnet34 |
                        resnet50 | squeezenet1_0 | squeezenet1_1 | vgg11 |
                        vgg11_bn | vgg13 | vgg13_bn | vgg16 | vgg16_bn | vgg19
                        | vgg19_bn (default: resnet18)
  -j N, --workers N     number of data loading workers (default: 4)
  --epochs N            number of total epochs to run
  --start-epoch N       manual epoch number (useful on restarts)
  -b N, --batch-size N  mini-batch size (default: 256)
  --lr LR, --learning-rate LR
                        initial learning rate
  --momentum M          momentum
  --weight-decay W, --wd W
                        weight decay (default: 1e-4)
  --print-freq N, -p N  print frequency (default: 10)
  --resume PATH         path to latest checkpoint (default: none)
  -e, --evaluate        evaluate model on validation set
  --train               train the model
  --test                test a [pre]trained model on new images
  -t, --fine-tuning
                        transfer learning enabled + fine tuning - train only the last FC
                        layer.
  --pretrained          use pre-trained model
  --world-size WORLD_SIZE
                        number of distributed processes
  --dist-url DIST_URL   url used to set up distributed training
  --dist-backend DIST_BACKEND
                        distributed backend

ImageNet models Architecture

ImageNet training in PyTorch

Credit: karpathy.github.io

This project implements the ImageNet classification task on ImageNet dataset with different famous Convolutional Neural Network(CNN or ConvNet) models. This is a porting of pytorch/examples/imagenet making it usables on FloydHub.

Requirement

Download the ImageNet dataset and move validation images to labeled subfolders. Unfortunately at the moment the imagenet is not fully supported as torchvision.dataset, so we need to use the ImageFolder API which expects to load the dataset from a structure of this type:

ls /dataset

train
val
test

# Train
ls /dataset/train
cat
dog
tiger
plane
...

ls /dataset/train/cat
cat01.png
cat02.png
...

ls /dataset/train/dog
dog01.jpg
dog02.jpg
...
...[others classification folders]

# Val
ls /dataset/val
cat
dog
tiger
plane
...

ls /dataset/val/cat
cat01.png
cat02.png
...

ls /dataset/val/dog
dog01.jpg
dog02.jpg
...

# Test
ls /dataset/test
images

ls /dataset/test/images
test01.png
test02.png
...

Once you have build the dataset following the steps above, upload it as FloydHub dataset following this guide: create and upload FloydHub dataset.

Run on FloydHub

Here's the commands to train and evaluate your [pretrained] model on FloydHub(these section will be improved with the next release):

Project Setup

Before you start, log in on FloydHub with the floyd login command, then fork and init the project:

$ git clone https://github.com/floydhub/imagenet.git
$ cd imagenet
$ floyd init imagenet

Training

To train a model, run main.py with the desired model architecture and the path to the ImageNet dataset:

floyd run --gpu --data <your_user_name>/datasets/imagenet/<version>:input "python main.py -a resnet18 [other params]"

The default learning rate schedule starts at 0.1 and decays by a factor of 10 every 30 epochs. This is appropriate for ResNet and models with batch normalization, but too high for AlexNet and VGG. Use 0.01 as the initial learning rate for AlexNet or VGG:

floyd run --gpu --data <your_user_name>/datasets/imagenet/<version>:input "python main.py -a <arch> --lr 0.01 [other params]"

Note:

A full training on Imagenet can takes weeks according to the selected model.

Evaluating

It's time to evaluate our model with some images(put the images you want to classify in the test/images folder):

floyd run --gpu --env pytorch-0.2 --data <your_user_name>/datasets/imagenet/<version>:input --data <REPLACE_WITH_JOB_OUTPUT_NAME>:model "python main.py -a <arch> --test  --evalf test/ --resume /model/model_best.pth.tar"

#### Try Pytorch Pretrained model

Pytorch provided to you pretrained model for different models, if you want to evaluate your dataset with one of this model run:

```bash
floyd run --gpu --data <your_user_name>/datasets/<test_image>/<version>:input "python main.py -a [arch] --pretrained --data /input/test [other params]"

Serve model through REST API

FloydHub supports seving mode for demo and testing purpose. Before serving your model through REST API, you need to create a floyd_requirements.txt and declare the flask requirement in it. If you run a job with --mode serve flag, FloydHub will run the app.py file in your project and attach it to a dynamic service endpoint:

floyd run --gpu --mode serve --env pytorch-0.2  --data <your_user_name>/datasets/<test_image>/<version>:input --data <REPLACE_WITH_JOB_OUTPUT_NAME>:model

Note: The script retrieve the number of classes from the dataset --data <your_user_name>/datasets/<test_image>/<version>. This behavior will be fixed in the next release.

The above command will print out a service endpoint for this job in your terminal console.

The service endpoint will take a couple minutes to become ready. Once it's up, you can interact with the model by sending an image file with a POST request that the model will classify(according to ImageNet labels):

# Template
# curl -X POST -F "file=@<IMAGE>" <SERVICE_ENDPOINT>

# e.g. of a POST req
curl -X POST -F "file=@./test/images/test01.png"  https://www.floydlabs.com/expose/BhZCFAKom6Z8RptVKskHZW

Any job running in serving mode will stay up until it reaches maximum runtime. So once you are done testing, remember to shutdown the job!

Note that this feature is in preview mode and is not production ready yet

Transfer Learning

This project implements the a Transfer Learning classification task on the Bees Vs Ants toy dataset(train: 124 images of ants and 121 images of bees, val: 70 images of ants and 83 images of bees) with different Convolutional Neural Network(CNN or ConvNet) models. This is a porting of the transfer learning tutorial from the official PyTorch Docs making it usables on FloydHub.

Credit: Sasank Chilamkurthy who has written the amazing tutorial on transfer learning in the PyTorch docs.

Run on FloydHub

Here's the commands to train and evaluate your [pretrained] model on FloydHub(these section will be improved with the next release):

Project Setup

Before you start, log in on FloydHub with the floyd login command, then fork and init the project:

$ git clone https://github.com/floydhub/imagenet.git
$ cd imagenet
$ floyd init imagenet

Training

I have already uploaded it as FloydHub dataset so that you can try and familiarize with --data parameter which mounts the specified volume(datasets/model) inside the container of your FloydHub instance. Now it's time to run our training on FloydHub. In this example we will train the model for 10 epochs with a gpu instance. Note: If you want to mount/create a dataset look at the docs.

floyd run --gpu --env pytorch-0.2 --data redeipirati/datasets/pytorch-hymenoptera/1:input "python main.py -a resnet18 --train --fine-tuning --pretrained --epochs 10 -b 4"

Note:

--gpu run your job on a FloydHub GPU instance
--env pytorch-0.2 prepares a pytorch environment for python 3.
--data redeipirati/datasets/pytorch-hymenoptera/1 mounts the pytorch hymenoptera dataset(bees vs ants) in the /input folder inside the container for our job so that we do not need to dowload it at training time.

Evaluating

It's time to evaluate our model with some images:

floyd run --gpu --env pytorch-0.2 --data redeipirati/datasets/pytorch-hymenoptera/1:input --data <REPLACE_WITH_JOB_OUTPUT_NAME>:model "python main.py -a resnet18 --test --fine-tuning  --evalf test/ --resume /model/model_best.pth.tar"

Notes:

I've prepared for you some images in the test folder that you can use to evaluate your model. Feel free to add on it a bunch of bee/ant images downloaded from the web.
Remember to evaluate images which are taken from a similar distribution, otherwise you will have bad performance due to distribution mismatch.

Try our pre-trained model

We have provided to you a pre-trained model trained for 30 epochs with an accuracy of about 95%.

floyd run --gpu --env pytorch-0.2 --data redeipirati/datasets/pytorch-hymenoptera/1:input --data redeipirati/datasets/pytorch-hymenoptera-30-epochs-resnet18-model/1:model "python main.py -a resnet18 --test --fine-tuning  --evalf test/ --resume /model/model_best.pth.tar"

Serve model through REST API

floyd run --gpu --mode serve --env pytorch-0.2  --data redeipirati/datasets/pytorch-hymenoptera/1:input --data <REPLACE_WITH_JOB_OUTPUT_NAME>:model

Note: The script retrieve the number of classes from the dataset --data redeipirati/datasets/pytorch-hymenoptera/1. This behavior will be fixed in the next release.

The above command will print out a service endpoint for this job in your terminal console.

The service endpoint will take a couple minutes to become ready. Once it's up, you can interact with the model by sending an images(of ant or bee) file with a POST request that the model will classify:

# Template
# curl -X POST -F "file=@<ANT_OR_BEE>" <SERVICE_ENDPOINT>

# e.g. of a POST req
curl -X POST -F "file=@./test/images/test01.png"  https://www.floydlabs.com/expose/BhZCFAKom6Z8RptVKskHZW

Any job running in serving mode will stay up until it reaches maximum runtime. So once you are done testing, remember to shutdown the job!

Note that this feature is in preview mode and is not production ready yet

More resources

Some useful resources on ImageNet and the famous ConvNet models:

Contributing

For any questions, bug(even typos) and/or features requests do not hesitate to contact me or open an issue!

yang-jin-hai/imagenet

Traning and Transfer Learning ImageNet model in Pytorch

Usage

ImageNet models Architecture

Alexnet

Densenet

Inception_v3

Resnet34

Squeezenet1_0

Vgg19net

ImageNet training in PyTorch

Requirement

Run on FloydHub

Project Setup

Training

Evaluating

Serve model through REST API

Transfer Learning

Run on FloydHub

Project Setup

Training

Evaluating

Try our pre-trained model

Serve model through REST API

More resources

Contributing