This repository contains code I use to train Keras ImageNet (ILSVRC2012) image classification models from scratch.
Highlight #1: I use TFRecords and the tf.data.TFRecordDataset API to speed up data ingestion in the training pipeline. This way I can parallelize data pre-processing (including online data augmentation) across multiple worker threads and keep the GPUs maximally utilized.
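Below is a minimal sketch of this kind of input pipeline. It is not the repo's actual dataset code: the feature keys follow the inception-style TFRecords, while the augmentation and label handling are simplified for illustration.

```python
import tensorflow as tf

def _parse_fn(example_proto):
    """Decode one TFRecord example (feature keys follow the inception-style records)."""
    features = {
        'image/encoded': tf.FixedLenFeature([], tf.string),
        'image/class/label': tf.FixedLenFeature([], tf.int64),
    }
    parsed = tf.parse_single_example(example_proto, features)
    image = tf.image.decode_jpeg(parsed['image/encoded'], channels=3)
    image = tf.image.resize_images(image, [224, 224])
    image = tf.image.random_flip_left_right(image)   # one example of online augmentation
    return image, parsed['image/class/label']        # the real label mapping is more involved

def get_dataset(tfrecord_pattern, batch_size):
    """Parse and augment examples in parallel so the GPU stays busy."""
    files = tf.data.Dataset.list_files(tfrecord_pattern)
    dataset = tf.data.TFRecordDataset(files, num_parallel_reads=4)
    dataset = dataset.map(_parse_fn, num_parallel_calls=8)   # multi-threaded pre-processing
    dataset = dataset.shuffle(1024).repeat().batch(batch_size)
    return dataset.prefetch(2)                                # overlap CPU work with GPU compute
```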
Highlight #2: In addition to data augmentation (random color distortion, rotation, flipping and cropping, etc.), I also apply various tricks in an attempt to achieve the best accuracy for the trained image classification models. More specifically, I implement the "LookAhead" optimizer (reference), "iter_size" and "L2 regularization" for the Keras models, and have also tried "AdamW" (Adam optimizer with decoupled weight decay).
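For reference, the core of the LookAhead idea is: keep a copy of "slow" weights and, every k optimizer steps, move them a fraction alpha towards the current "fast" weights, then reset the fast weights to the slow ones. The callback below is only a rough illustration of that update rule, not the implementation used in this repo.

```python
import tensorflow as tf

class LookaheadSketch(tf.keras.callbacks.Callback):
    """Illustrative LookAhead update: slow <- slow + alpha * (fast - slow) every k steps."""

    def __init__(self, k=5, alpha=0.5):
        super(LookaheadSketch, self).__init__()
        self.k = k
        self.alpha = alpha
        self.step = 0
        self.slow = None

    def on_batch_end(self, batch, logs=None):
        if self.slow is None:
            # initialize the slow weights from the model's current weights
            self.slow = [w.copy() for w in self.model.get_weights()]
        self.step += 1
        if self.step % self.k == 0:
            fast = self.model.get_weights()
            self.slow = [s + self.alpha * (f - s) for s, f in zip(self.slow, fast)]
            self.model.set_weights(self.slow)  # fast weights restart from the slow weights
```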
Highlight #3: I also provide code and documentation on how to optimize the trained tf.keras models with TensorRT. Refer to README_tensorrt.md and Applying TensorRT on My tf.keras ImageNet Models for details.
I took most of the dataset preparation code from tensorflow models/research/inception. That code is released under the Apache license, as specified here.
Otherwise, please refer to the following blog posts for some more implementation details about the code:
- Training Keras Models with TFRecords and The tf.data API
- Displaying Images in TensorBoard
- Applying TensorRT on My tf.keras ImageNet Models
The dataset and CNN models in this repository are built and trained using the tf.keras (`tensorflow.keras`) API. I have tested the code with tensorflow 1.11.0 and 1.12.2. My implementation of the "LookAhead" optimizer and "iter_size" does not work with `tensorflow.python.keras.optimizer_v2.OptimizerV2` (tensorflow 1.13.0+), so I recommend tensorflow 1.12.x if you'd like to use those two features.
In addition, the python code in this repository is for python3. Make sure you have tensorflow and its dependencies working for python3.
- Download the "Training images (Task 1 & 2)" and "Validation images (all tasks)" from the ImageNet Large Scale Visual Recognition Challenge 2012 (ILSVRC2012) download page.
```shell
$ ls -l ${HOME}/Downloads/
-rwxr-xr-x 1 jkjung jkjung 147897477120 Nov 7 2018 ILSVRC2012_img_train.tar
-rwxr-xr-x 1 jkjung jkjung   6744924160 Nov 7 2018 ILSVRC2012_img_val.tar
```
- Untar the "train" and "val" files. For example, I put the untarred files at `${HOME}/data/ILSVRC2012/`.
```shell
$ mkdir -p ${HOME}/data/ILSVRC2012
$ cd ${HOME}/data/ILSVRC2012
$ mkdir train
$ cd train
$ tar xvf ${HOME}/Downloads/ILSVRC2012_img_train.tar
$ find . -name "*.tar" | while read NAME ; do \
      mkdir -p "${NAME%.tar}"; \
      tar -xvf "${NAME}" -C "${NAME%.tar}"; \
      rm -f "${NAME}"; \
  done
$ cd ..
$ mkdir validation
$ cd validation
$ tar xvf ${HOME}/Downloads/ILSVRC2012_img_val.tar
```
- Clone this repository.
```shell
$ cd ${HOME}/project
$ git clone https://github.com/jkjung-avt/keras_imagenet.git
$ cd keras_imagenet
```
- Pre-process the validation image files. (The script would move the JPEG files into corresponding subfolders.)
```shell
$ cd data
$ python3 ./preprocess_imagenet_validation_data.py \
      ${HOME}/data/ILSVRC2012/validation \
      imagenet_2012_validation_synset_labels.txt
```
- Build TFRecord files for "train" and "validation". (This step could take a couple of hours, since there are 1,281,167 training images and 50,000 validation images in total.)
```shell
$ mkdir ${HOME}/data/ILSVRC2012/tfrecords
$ python3 build_imagenet_data.py \
      --output_directory ${HOME}/data/ILSVRC2012/tfrecords \
      --train_directory ${HOME}/data/ILSVRC2012/train \
      --validation_directory ${HOME}/data/ILSVRC2012/validation
```
- As an example, train a "GoogLeNet_BN" (GoogLeNet with Batch Norms) model.

You could take a peek at `train_new.sh` and `models/googlenet.py` before executing the training. For example, you might adjust the learning rate schedule, weight decay and total number of training epochs in the script to see if it produces a model with better accuracy.
```shell
$ ./train_new.sh googlenet_bn
```
On my desktop PC with an NVIDIA GTX-1080 Ti GPU, it takes 7~8 days to train this model for 60 epochs. The top-1 accuracy of the trained googlenet_bn model is roughly 0.7091.
NOTE: I do random rotation of training images, which actually slows down data ingestion quite a bit. If you don't need random rotation as one of the data augmentation schemes, you could comment out the code to further speed up training.
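The rotation augmentation referred to above looks roughly like the following sketch (the angle range and the exact call site differ from the repo's actual pre-processing code):

```python
import math
import tensorflow as tf

def random_rotate(image):
    """Rotate an image tensor by a small random angle (TF 1.x, uses tf.contrib)."""
    angle = tf.random_uniform([], minval=-15.0, maxval=15.0) * math.pi / 180.0
    # This per-image rotation is relatively expensive; comment it out to speed up ingestion.
    return tf.contrib.image.rotate(image, angle, interpolation='BILINEAR')
```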
For reference, here is a list of options for the `train.py` script, which gets called inside `train_new.sh` (an example invocation follows the list):

- `--dataset_dir`: specify an alternative directory location for the TFRecords dataset.
- `--dropout_rate`: add a Dropout layer before the last Dense layer, with the specified dropout rate. Default is no dropout.
- `--weight_decay`: L2 regularization of weights in conv/dense layers.
- `--optimizer`: "sgd", "adam" or "rmsprop". Default is "adam".
- `--use_lookahead`: use the "LookAhead" optimizer. Default is False.
- `--batch_size`: batch size for both training and validation.
- `--iter_size`: aggregate gradients over this many batches before doing one weight update, i.e. `effective_batch_size = batch_size * iter_size`.
- `--lr_sched`: "linear" or "exp" (exponential) decay of the learning rate per epoch. Default is "linear".
- `--initial_lr`: learning rate for the 1st epoch.
- `--final_lr`: learning rate for the last epoch.
- `--epochs`: total number of training epochs.
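Putting the options together, a training run might look like the following. The hyperparameter values here are made up, and how the model name is passed to `train.py` is an assumption; check `train_new.sh` for the actual invocation.

```shell
# Hypothetical hyperparameter values; the positional model-name argument is an
# assumption -- see train_new.sh for how train.py is actually called.
$ python3 train.py --dataset_dir ${HOME}/data/ILSVRC2012/tfrecords \
                   --dropout_rate 0.2 \
                   --weight_decay 2e-4 \
                   --optimizer adam \
                   --batch_size 64 --iter_size 1 \
                   --lr_sched exp --initial_lr 1e-2 --final_lr 1e-5 \
                   --epochs 60 \
                   googlenet_bn
```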
- Evaluate accuracy of the trained `googlenet_bn` model.
```shell
$ python3 evaluate.py --dataset_dir ${HOME}/data/ILSVRC2012/tfrecords \
                      saves/googlenet_bn-model-final.h5
```
- For training other CNN models, check out `train_new.sh`, `train.py` and `models/models.py`. This repository already supports `mobilenet_v2`, `resnet50`, `googlenet_bn`, `inception_v2`, `efficientnet_b0`, `efficientnet_b1`, `efficientnet_b4` and `osnet`. You could implement your own Keras CNN models by extending the code in `models/models.py` (a minimal sketch follows below).
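As a rough illustration of what such a model-building function could look like, here is a made-up tiny CNN; the function name, architecture and how it would be registered in `models/models.py` are all assumptions for this example.

```python
import tensorflow as tf

def get_my_simple_cnn(nb_classes=1000):
    """A made-up tiny CNN, just to show the shape of a model-building function."""
    inputs = tf.keras.layers.Input(shape=(224, 224, 3))
    x = tf.keras.layers.Conv2D(32, 3, strides=2, padding='same', activation='relu')(inputs)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Conv2D(64, 3, strides=2, padding='same', activation='relu')(x)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(nb_classes, activation='softmax')(x)
    return tf.keras.models.Model(inputs, outputs, name='my_simple_cnn')
```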
| Model | Input size | Model size | Parameters | Top-1 accuracy |
|-----------------|---------|---------|------------|--------|
| googlenet_bn    | 224x224 | 80.9MB  | 7,020,392  | 0.7091 |
| inception_v2    | 224x224 | 129.0MB | 11,214,888 | 0.7234 |
| mobilenet_v2    | 224x224 | 40.8MB  | 3,538,984  | 0.7054 |
| resnet50        | 224x224 | --      | 25,636,712 | --     |
| efficientnet_b0 | 224x224 | 41.1MB  | 5,330,564  | 0.7318 |
| osnet           | 224x224 | 29.5MB  | 2,440,952  | 0.6474 |
For some reason, Keras has trouble loading a trained/saved MobileNetV2 model. The `load_model()` call would fail with this error message:

```
TypeError: '<' not supported between instances of 'dict' and 'float'
```

To work around this problem, I followed this post and added the following lines at line 309 (right after the `super()` call in `ReLU`) of `/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/layers/advanced_activations.py`:
```python
if type(max_value) is dict:
    max_value = max_value['value']
if type(negative_slope) is dict:
    negative_slope = negative_slope['value']
if type(threshold) is dict:
    threshold = threshold['value']
```