This repository contains code I use to train Keras ImageNet (ILSVRC2012) image classification models from scratch.
Highlight #1: I use TFRecords and the tf.data.TFRecordDataset API to speed up data ingestion in the training pipeline. This way I can parallelize data pre-processing (including online data augmentation) across multiple worker threads and keep the GPUs maximally utilized.
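Below is a minimal sketch of this kind of input pipeline. It is not the repo's actual dataset code: the feature keys follow the inception-style TFRecords, while the augmentation and label handling are simplified for illustration.

```python
import tensorflow as tf

def _parse_fn(example_proto):
    """Decode one TFRecord example (feature keys follow the inception-style records)."""
    features = {
        'image/encoded': tf.FixedLenFeature([], tf.string),
        'image/class/label': tf.FixedLenFeature([], tf.int64),
    }
    parsed = tf.parse_single_example(example_proto, features)
    image = tf.image.decode_jpeg(parsed['image/encoded'], channels=3)
    image = tf.image.resize_images(image, [224, 224])
    image = tf.image.random_flip_left_right(image)   # one example of online augmentation
    return image, parsed['image/class/label']        # the real label mapping is more involved

def get_dataset(tfrecord_pattern, batch_size):
    """Parse and augment examples in parallel so the GPU stays busy."""
    files = tf.data.Dataset.list_files(tfrecord_pattern)
    dataset = tf.data.TFRecordDataset(files, num_parallel_reads=4)
    dataset = dataset.map(_parse_fn, num_parallel_calls=8)   # multi-threaded pre-processing
    dataset = dataset.shuffle(1024).repeat().batch(batch_size)
    return dataset.prefetch(2)                                # overlap CPU work with GPU compute
```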
Highlight #2: In addition to data augmentation (random color distortion, rotation, flipping and cropping, etc.), I also apply various tricks in an attempt to achieve the best accuracy for the trained image classification models. More specifically, I implement the "LookAhead" optimizer (reference), "iter_size" and "L2 regularization" for the Keras models, and have also tried "AdamW" (Adam optimizer with decoupled weight decay).
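For reference, the core of the LookAhead idea is: keep a copy of "slow" weights and, every k optimizer steps, move them a fraction alpha towards the current "fast" weights, then reset the fast weights to the slow ones. The callback below is only a rough illustration of that update rule, not the implementation used in this repo.

```python
import tensorflow as tf

class LookaheadSketch(tf.keras.callbacks.Callback):
    """Illustrative LookAhead update: slow <- slow + alpha * (fast - slow) every k steps."""

    def __init__(self, k=5, alpha=0.5):
        super(LookaheadSketch, self).__init__()
        self.k = k
        self.alpha = alpha
        self.step = 0
        self.slow = None

    def on_batch_end(self, batch, logs=None):
        if self.slow is None:
            # initialize the slow weights from the model's current weights
            self.slow = [w.copy() for w in self.model.get_weights()]
        self.step += 1
        if self.step % self.k == 0:
            fast = self.model.get_weights()
            self.slow = [s + self.alpha * (f - s) for s, f in zip(self.slow, fast)]
            self.model.set_weights(self.slow)  # fast weights restart from the slow weights
```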
Highlight #3: I also provide code and documentation on how to optimize the trained tf.keras models with TensorRT. Refer to README_tensorrt.md and Applying TensorRT on My tf.keras ImageNet Models for details.
I took most of the dataset preparation code from tensorflow models/research/inception. That code is released under the Apache license, as specified here.
Otherwise, please refer to the following blog posts for some more implementation details about the code:
- Training Keras Models with TFRecords and The tf.data API
- Displaying Images in TensorBoard
- Applying TensorRT on My tf.keras ImageNet Models
The dataset and CNN models in this repository are built and trained using the tf.keras (`tensorflow.keras`) API. I have tested the code with tensorflow 1.11.0 and 1.12.2. My implementation of the "LookAhead" optimizer and "iter_size" does not work with `tensorflow.python.keras.optimizer_v2.OptimizerV2` (tensorflow 1.13.0+), so I recommend tensorflow 1.12.x if you'd like to use those two features.
In addition, the python code in this repository is for python3. Make sure you have tensorflow and its dependencies working for python3.
- Download the "Training images (Task 1 & 2)" and "Validation images (all tasks)" from the ImageNet Large Scale Visual Recognition Challenge 2012 (ILSVRC2012) download page.
```shell
$ ls -l ${HOME}/Downloads/
-rwxr-xr-x 1 jkjung jkjung 147897477120 Nov 7 2018 ILSVRC2012_img_train.tar
-rwxr-xr-x 1 jkjung jkjung   6744924160 Nov 7 2018 ILSVRC2012_img_val.tar
```
- Untar the "train" and "val" files. For example, I put the untarred files at `${HOME}/data/ILSVRC2012/`.
```shell
$ mkdir -p ${HOME}/data/ILSVRC2012
$ cd ${HOME}/data/ILSVRC2012
$ mkdir train
$ cd train
$ tar xvf ${HOME}/Downloads/ILSVRC2012_img_train.tar
$ find . -name "*.tar" | while read NAME ; do \
      mkdir -p "${NAME%.tar}"; \
      tar -xvf "${NAME}" -C "${NAME%.tar}"; \
      rm -f "${NAME}"; \
  done
$ cd ..
$ mkdir validation
$ cd validation
$ tar xvf ${HOME}/Downloads/ILSVRC2012_img_val.tar
```
- Clone this repository.
```shell
$ cd ${HOME}/project
$ git clone https://github.com/jkjung-avt/keras_imagenet.git
$ cd keras_imagenet
```
- Pre-process the validation image files. (The script would move the JPEG files into corresponding subfolders.)
```shell
$ cd data
$ python3 ./preprocess_imagenet_validation_data.py \
      ${HOME}/data/ILSVRC2012/validation \
      imagenet_2012_validation_synset_labels.txt
```
- Build TFRecord files for "train" and "validation". (This step could take a couple of hours, since there are 1,281,167 training images and 50,000 validation images in total.)
```shell
$ mkdir ${HOME}/data/ILSVRC2012/tfrecords
$ python3 build_imagenet_data.py \
      --output_directory ${HOME}/data/ILSVRC2012/tfrecords \
      --train_directory ${HOME}/data/ILSVRC2012/train \
      --validation_directory ${HOME}/data/ILSVRC2012/validation
```
- As an example, train a "GoogLeNet_BN" (GoogLeNet with Batch Norms) model.

You could take a peek at `train_new.sh` and `models/googlenet.py` before executing the training. For example, you might adjust the learning rate schedule, weight decay and total number of training epochs in the script to see if it produces a model with better accuracy.
```shell
$ ./train_new.sh googlenet_bn
```
On my desktop PC with an NVIDIA GTX-1080 Ti GPU, it takes 7~8 days to train this model for 60 epochs. The top-1 accuracy of the trained googlenet_bn model is roughly 0.7091.
NOTE: I do random rotation of training images, which actually slows down data ingestion quite a bit. If you don't need random rotation as one of the data augmentation schemes, you could comment out the code to further speed up training.
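The rotation augmentation referred to above looks roughly like the following sketch (the angle range and the exact call site differ from the repo's actual pre-processing code):

```python
import math
import tensorflow as tf

def random_rotate(image):
    """Rotate an image tensor by a small random angle (TF 1.x, uses tf.contrib)."""
    angle = tf.random_uniform([], minval=-15.0, maxval=15.0) * math.pi / 180.0
    # This per-image rotation is relatively expensive; comment it out to speed up ingestion.
    return tf.contrib.image.rotate(image, angle, interpolation='BILINEAR')
```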
For reference, here is a list of options for the `train.py` script, which gets called inside `train_new.sh` (an example invocation follows the list):

- `--dataset_dir`: specify an alternative directory location for the TFRecords dataset.
- `--dropout_rate`: add a Dropout layer before the last Dense layer, with the specified dropout rate. Default is no dropout.
- `--weight_decay`: L2 regularization of weights in conv/dense layers.
- `--optimizer`: "sgd", "adam" or "rmsprop". Default is "adam".
- `--use_lookahead`: use the "LookAhead" optimizer. Default is False.
- `--batch_size`: batch size for both training and validation.
- `--iter_size`: aggregate gradients over this many batches before doing one weight update, i.e. `effective_batch_size = batch_size * iter_size`.
- `--lr_sched`: "linear" or "exp" (exponential) decay of the learning rate per epoch. Default is "linear".
- `--initial_lr`: learning rate for the 1st epoch.
- `--final_lr`: learning rate for the last epoch.
- `--epochs`: total number of training epochs.
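Putting the options together, a training run might look like the following. The hyperparameter values here are made up, and how the model name is passed to `train.py` is an assumption; check `train_new.sh` for the actual invocation.

```shell
# Hypothetical hyperparameter values; the positional model-name argument is an
# assumption -- see train_new.sh for how train.py is actually called.
$ python3 train.py --dataset_dir ${HOME}/data/ILSVRC2012/tfrecords \
                   --dropout_rate 0.2 \
                   --weight_decay 2e-4 \
                   --optimizer adam \
                   --batch_size 64 --iter_size 1 \
                   --lr_sched exp --initial_lr 1e-2 --final_lr 1e-5 \
                   --epochs 60 \
                   googlenet_bn
```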
- Evaluate accuracy of the trained `googlenet_bn` model.
```shell
$ python3 evaluate.py --dataset_dir ${HOME}/data/ILSVRC2012/tfrecords \
                      saves/googlenet_bn-model-final.h5
```
- For training other CNN models, check out `train_new.sh`, `train.py` and `models/models.py`. This repository already supports `mobilenet_v2`, `resnet50`, `googlenet_bn`, `inception_v2`, `efficientnet_b0`, `efficientnet_b1`, `efficientnet_b4` and `osnet`. You could implement your own Keras CNN models by extending the code in `models/models.py` (a minimal sketch follows below).
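As a rough illustration of what such a model-building function could look like, here is a made-up tiny CNN; the function name, architecture and how it would be registered in `models/models.py` are all assumptions for this example.

```python
import tensorflow as tf

def get_my_simple_cnn(nb_classes=1000):
    """A made-up tiny CNN, just to show the shape of a model-building function."""
    inputs = tf.keras.layers.Input(shape=(224, 224, 3))
    x = tf.keras.layers.Conv2D(32, 3, strides=2, padding='same', activation='relu')(inputs)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Conv2D(64, 3, strides=2, padding='same', activation='relu')(x)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(nb_classes, activation='softmax')(x)
    return tf.keras.models.Model(inputs, outputs, name='my_simple_cnn')
```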
| Model | Input size | Model size | Parameters | Top-1 accuracy |
|-----------------|---------|---------|------------|--------|
| googlenet_bn    | 224x224 | 80.9MB  | 7,020,392  | 0.7091 |
| inception_v2    | 224x224 | 129.0MB | 11,214,888 | 0.7234 |
| mobilenet_v2    | 224x224 | 40.8MB  | 3,538,984  | 0.7054 |
| resnet50        | 224x224 | --      | 25,636,712 | --     |
| efficientnet_b0 | 224x224 | 41.1MB  | 5,330,564  | 0.7318 |
| osnet           | 224x224 | 29.5MB  | 2,440,952  | 0.6474 |
For some reason, Keras has trouble loading a trained/saved MobileNetV2 model. The `load_model()` call would fail with this error message:

```
TypeError: '<' not supported between instances of 'dict' and 'float'
```

To work around this problem, I followed this post and added the following lines at line 309 (right after the `super()` call in `ReLU`) of `/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/layers/advanced_activations.py`:
```python
if type(max_value) is dict:
    max_value = max_value['value']
if type(negative_slope) is dict:
    negative_slope = negative_slope['value']
if type(threshold) is dict:
    threshold = threshold['value']
```