
Insightface Keras implementation

Keras insightface

  • Keras Insightface implementation.
  • This is still a work in progress; many things are still being tested, so there may be lots of errors at the moment.
  • Any advice is welcome!
  • NOTE: Adding TripletLoss training seems to improve evaluation accuracy on agedb_30 / lfw, but it gets worse in real-world tests.
  • Environment
    # $ ipython
    Python 3.7.6 | packaged by conda-forge | (default, Mar 23 2020, 23:03:20)
    In [1]: tf.__version__
    Out[1]: '2.2.0'
    Or tf-nightly
    In [1]: tf.__version__
    Out[1]: '2.3.0-dev20200523'
  • Default import
    import os
    import sys
    import pandas as pd
    import numpy as np
    import tensorflow as tf
    from tensorflow import keras
    gpus = tf.config.experimental.list_physical_devices("GPU")
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)


Current accuracy

  • Rerunning all with label smoothing.
Model backbone   | lfw      | cfp_fp   | agedb_30 | Epochs
Mobilefacenet    | 0.994167 | 0.944143 | 0.942500 | 50
ResNet101V2      | 0.997333 | 0.976714 | 0.971000 | 110
EfficientNetB4   | 0.997167 | 0.967000 | 0.962500 | 54
se_mobilefacenet | 0.996333 | 0.964714 | 0.958833 | 100
ResNeSt101       | 0.997667 | 0.981000 | 0.973333 | 100


Beforehand Data Prepare

  • Training data in this project is MS1M-ArcFace, downloaded from the Insightface Dataset Zoo.
  • Evaluation data is the LFW / CFP-FP / AgeDB-30 bin files included in the MS1M-ArcFace dataset.
  • Any other data can also be used, as long as it is in the right format.
  • The prepare_data.py script extracts data from the mxnet record format into folders.
    # Convert `/datasets/faces_emore` to `/datasets/faces_emore_112x112_folders`
    CUDA_VISIBLE_DEVICES='-1' ./prepare_data.py -D /datasets/faces_emore
    # Convert evaluating bin files
    CUDA_VISIBLE_DEVICES='-1' ./prepare_data.py -D /datasets/faces_emore -T lfw.bin cfp_fp.bin agedb_30.bin
    Executing again will skip dataset conversion.
  • Training dataset: the required layout is a folder of person folders, each person folder containing multiple face images. The format looks like
    .               # dataset folder
    ├── 0           # person folder
    │   ├── 100.jpg # face image
    │   ├── 101.jpg # face image
    │   └── 102.jpg # face image
    ├── 1           # person folder
    │   ├── 111.jpg
    │   ├── 112.jpg
    │   └── 113.jpg
    ├── 10
    │   ├── 707.jpg
    │   ├── 708.jpg
    │   └── 709.jpg
  • Evaluation bin files contain pairs of jpeg images and a label indicating whether each pair is the same person, so there are twice as many images as labels
    #    bins   | issame_list
    img_1 img_2 | 1
    img_3 img_4 | 1
    img_5 img_6 | 0
    img_7 img_8 | 0
    Image data in bin files like CFP-FP and AgeDB-30 is not compatible with tf.image.decode_jpeg, so it needs to be reformatted first.
    ''' Throws this error if not reformatted yet '''
    ValueError: Can't convert non-rectangular Python sequence to Tensor.
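
As a rough illustration of the pair layout described above, the following sketch reads such a file and groups the images into labelled pairs. It assumes the bin file is a pickled `(bins, issame_list)` tuple, which is how InsightFace's own verification code reads these files. `load_eval_pairs` is a hypothetical helper name, not something from this repo, and the byte strings are placeholders rather than real JPEG data.

```python
import io
import pickle


def load_eval_pairs(fileobj):
    """Load (image_a, image_b, is_same) triples from an evaluation bin file object."""
    # Assumption: the file holds a pickled tuple (bins, issame_list).
    bins, issame_list = pickle.load(fileobj, encoding="bytes")
    if len(bins) != 2 * len(issame_list):
        raise ValueError("expected exactly two images per label")
    pairs = []
    for idx, is_same in enumerate(issame_list):
        # Images 2 * idx and 2 * idx + 1 form the pair described by label idx.
        pairs.append((bytes(bins[2 * idx]), bytes(bins[2 * idx + 1]), bool(is_same)))
    return pairs


# Tiny in-memory stand-in for a real bin file.
fake_bins = [b"img_1", b"img_2", b"img_3", b"img_4"]
fake_issame = [True, False]
buffer = io.BytesIO(pickle.dumps((fake_bins, fake_issame)))
pairs = load_eval_pairs(buffer)
print(len(pairs), [is_same for _, _, is_same in pairs])
```

With a real file, `fileobj` would be something like `open('/datasets/faces_emore/lfw.bin', 'rb')`.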

Training scripts

  • Scripts
    • data.py loads image data as a tf.data.Dataset for training. The triplet dataset is different from the others.
    • data_gen.py is NOT working, accuracy won't increase. It uses ImageDataGenerator and AutoAugment to load images.
    • evals.py contains the evaluation callback using bin files.
    • losses.py contains softmax / arcface / centerloss / triplet loss functions.
    • backbones contains basic model implementations of mobilefacenet / mobilenetv3 / resnest. Other backbones like ResNet101V2 are loaded from keras.applications in train.buildin_models.
    • myCallbacks.py contains my other callbacks, like saving the model / adjusting the learning rate / saving history.
    • plot.py contains a history plot function.
    • train.py contains a Train class. It uses a scheduler to connect different loss / optimizer / epochs. The basic flow is simply load data --> model --> compile --> fit.
  • Model contains two parts
    • Basic model is the layers from input to embedding.
    • Model is Basic model + bottleneck layer, like a softmax / arcface layer. For triplet training, Model == Basic model.
  • Training example
    from tensorflow import keras
    from backbones import mobile_facenet
    import losses
    import train
    # basic_model = train.buildin_models("MobileNet", dropout=0.4, emb_shape=256)
    # basic_model = train.buildin_models("ResNet101V2", dropout=0.4, emb_shape=512)
    # basic_model = train.buildin_models("ResNest101", dropout=0.4, emb_shape=512)
    # basic_model = train.buildin_models('EfficientNetB0', dropout=0.4, emb_shape=256)
    # basic_model = train.buildin_models('EfficientNetB4', dropout=0.4, emb_shape=256)
    # basic_model = mobile_facenet.mobile_facenet(256, dropout=0.4, name="se_mobile_facenet_256", use_se=True)
    basic_model = mobile_facenet.mobile_facenet(256, dropout=0.4, name="mobile_facenet_256")
    data_path = '/datasets/faces_emore_112x112_folders'
    eval_paths = ['/datasets/faces_emore/lfw.bin', '/datasets/faces_emore/cfp_fp.bin', '/datasets/faces_emore/agedb_30.bin']
    tt = train.Train(data_path, save_path='keras_mobile_facenet_emore.h5', eval_paths=eval_paths, basic_model=basic_model, lr_base=0.001, batch_size=640, random_status=3)
    # tt = train.Train(data_path, save_path='keras_mobile_facenet_emore.h5', eval_paths=eval_paths, basic_model=basic_model, lr_base=0.001, lr_decay=105, lr_min=1e-7, batch_size=640, random_status=3)
    sch = [
      {"loss": keras.losses.CategoricalCrossentropy(label_smoothing=0.1), "optimizer": "nadam", "epoch": 25},
      # {"loss": losses.margin_softmax, "epoch": 10},
      {"loss": losses.ArcfaceLoss(), "bottleneckOnly": True, "centerloss": True, "epoch": 4},
      {"loss": losses.ArcfaceLoss(), "centerloss": True, "epoch": 35},
      {"loss": losses.batch_hard_triplet_loss, "optimizer": "nadam", "epoch": 30},
    ]
    tt.train(sch, 0)
    train.Train mostly works as a scheduler; the basic strategy is simple:
    from tensorflow import keras
    import losses, data, evals, myCallbacks
    from backbones import mobile_facenet
    # Dataset
    data_path = '/datasets/faces_emore_112x112_folders'
    train_ds = data.prepare_dataset(data_path, batch_size=512, random_status=3, random_crop=(100, 100, 3))
    classes = train_ds.element_spec[-1].shape[-1]
    # Model
    basic_model = mobile_facenet.mobile_facenet(256, dropout=0.4, name="mobile_facenet_256")
    model_output = keras.layers.Dense(classes, activation="softmax")(basic_model.outputs[0])
    model = keras.models.Model(basic_model.inputs[0], model_output)
    # Evals and basic callbacks
    eval_paths = ['/datasets/faces_emore/lfw.bin', '/datasets/faces_emore/cfp_fp.bin', '/datasets/faces_emore/agedb_30.bin']
    my_evals = [evals.eval_callback(basic_model, ii, batch_size=512, eval_freq=1) for ii in eval_paths]
    my_evals[-1].save_model = 'keras_mobilefacenet'
    basic_callbacks = myCallbacks.basic_callbacks(checkpoint='keras_mobilefacenet.h5', evals=my_evals, lr=0.001)
    callbacks = my_evals + basic_callbacks
    # Compile and fit
    model.compile(optimizer='nadam', loss=losses.arcface_loss, metrics=["accuracy"])
    model.fit(train_ds, epochs=15, callbacks=callbacks, verbose=1)
  • train.Train basic_model and model parameters. Combine these two parameters to initialize a model from different sources. custom_objects may sometimes be needed to load a model.
    basic_model                              | model           | Used for
    model structure                          | None            | Scratch train
    model layer index for basic model output | model .h5 file  | Continue training from last saved model
    basic model .h5 file                     | None            | Continue training from a saved basic model
    model layer index for basic model output | model structure | Continue training from a modified model
  • Scheduler is a list of dicts, each containing one training plan
    • loss indicates the loss function.
    • optimizer is the optimizer used in this plan; None means keep using the previous one.
    • epoch indicates how many epochs this plan trains.
    • bottleneckOnly True / False. True sets basic_model.trainable = False and trains the bottleneck layer only.
    • centerloss True / False. If True, loss is either an instance of CenterLoss, or a logits loss that gets added to center_loss.
    • type softmax / arcface / triplet, but in most cases this can be guessed from loss.
    # Scheduler examples
    sch = [{"loss": keras.losses.categorical_crossentropy, "optimizer": "adam", "epoch": 1}]
    sch = [{"loss": keras.losses.CategoricalCrossentropy(label_smoothing=0.1), "epoch": 2}]
    sch = [{"loss": losses.ArcfaceLoss(scale=32.0, label_smoothing=0.1), "optimizer": keras.optimizers.SGD(0.001, momentum=0.9), "epoch": 1}]
    sch = [{"loss": losses.ArcfaceLoss(), "optimizer": None, "bottleneckOnly": True, "epoch": 1}]
    sch = [{"loss": losses.arcface_loss, "optimizer": "adam", "centerloss": True, "epoch": 1}]
    sch = [{"loss": losses.ArcfaceLoss(), "bottleneckOnly": True, "centerloss": True, "epoch": 1}]
    sch = [{"loss": losses.CenterLoss(num_classes=85742), "centerloss": True, "epoch": 1}]
    sch = [{"loss": losses.batch_hard_triplet_loss, "optimizer": "adam", "epoch": 1}]
    sch = [{"loss": losses.BatchHardTripletLoss(0.3), "epoch": 1}]
    Some more complicated combinations are also supported, but they may lead nowhere...
    # `softmax` / `arcface` + `triplet`
    sch = [{"loss": losses.BatchHardTripletLoss(0.3, logits_loss=keras.losses.CategoricalCrossentropy(label_smoothing=0.1)), "epoch": 5}]
    # `triplet` + `centerloss`
    sch = [{"loss": losses.BatchHardTripletLoss(0.35), "centerloss": True, "epoch": 5}]
    # `softmax` / `arcface` + `triplet` + `centerloss`
    sch = [{"loss": losses.BatchHardTripletLoss(0.3, logits_loss=losses.ArcfaceLoss()), "centerloss": True, "epoch": 5}]
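
To make the scheduler fields above more concrete, here is a small dependency-free sketch of how such a list could be expanded into stages with absolute epoch ranges. It is not the logic of `train.Train`: `resolve_schedule` and its `default_optimizer` argument are hypothetical, and loss functions are replaced by plain strings. It only illustrates the "None means keep the previous optimizer" rule and how the `epoch` counts add up.

```python
def resolve_schedule(sch, initial_epoch=0, default_optimizer="adam"):
    """Expand a scheduler list into per-stage settings with absolute epoch ranges."""
    stages = []
    optimizer = default_optimizer  # hypothetical fallback if the first plan sets none
    start = initial_epoch
    for plan in sch:
        # A missing or None `optimizer` keeps the one from the previous plan.
        if plan.get("optimizer") is not None:
            optimizer = plan["optimizer"]
        end = start + plan["epoch"]
        stages.append({
            "loss": plan["loss"],
            "optimizer": optimizer,
            "bottleneck_only": plan.get("bottleneckOnly", False),
            "centerloss": plan.get("centerloss", False),
            "epochs": (start, end),
        })
        start = end
    return stages


# Loss functions are replaced by strings to keep the sketch free of TensorFlow.
sch = [
    {"loss": "softmax", "optimizer": "nadam", "epoch": 25},
    {"loss": "arcface", "bottleneckOnly": True, "centerloss": True, "epoch": 4},
    {"loss": "arcface", "centerloss": True, "epoch": 35},
]
stages = resolve_schedule(sch)
for stage in stages:
    print(stage["epochs"], stage["loss"], stage["optimizer"], stage["bottleneck_only"])
```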
  • Saving strategy
    • Model: the latest one is saved at the end of every epoch to the local path ./checkpoints; the file name is specified by train.Train save_path.
    • basic_model is saved by monitoring the last item in eval_paths, keeping the best one only.
    ''' Continue training from last saved file '''
    from tensorflow import keras
    import losses
    import train
    data_path = '/datasets/faces_emore_112x112_folders'
    eval_paths = ['/datasets/faces_emore/lfw.bin', '/datasets/faces_emore/cfp_fp.bin', '/datasets/faces_emore/agedb_30.bin']
    tt = train.Train(data_path, 'keras_mobilefacenet_256_II.h5', eval_paths, model='./checkpoints/keras_mobilefacenet_256.h5', compile=True, lr_base=0.001, batch_size=768, random_status=3)
    sch = [
      # {"loss": keras.losses.categorical_crossentropy, "optimizer": "nadam", "epoch": 15},
      {"loss": losses.margin_softmax, "epoch": 6},
      {"loss": losses.ArcfaceLoss(), "bottleneckOnly": True, "epoch": 4},
      {"loss": losses.ArcfaceLoss(), "epoch": 35},
      {"loss": losses.batch_hard_triplet_loss, "optimizer": "nadam", "epoch": 30},
    ]
    tt.train(sch, 19) # 19 is the initial_epoch
  • Gently stop is a callback to stop training gently. Typing n and <Enter> at any time during training will stop training at the end of the current epoch.
  • My history
    • This is a callback collecting training loss, accuracy and evaluation accuracy.
    • At the end of every epoch, it is backed up to the save_path defined in train.Train, with the suffix _hist.json.
    • It is reloaded on initialization if the backup <save_path>_hist.json file exists.
  • Learning rate
    • Exponential decay is the default, set by lr_base and lr_decay in train.Train. Defaults are lr_base=0.001, lr_decay=0.05.
    • Cosine decay with restart
      • Setting lr_decay to a value > 1 enables cosine lr decay; in this case lr_decay means the total decay steps.
      • Setting lr_on_batch to a value > 1 applies the decay every NUM batches; the default lr_on_batch=0 means decay on every epoch.
      • Other default values restarts=3, t_mul=2.0, m_mul=0.5 are set in myCallbacks.py. See keras.experimental.CosineDecayRestarts for details.
    import matplotlib.pyplot as plt
    import myCallbacks
    epochs = np.arange(100)
    plt.plot(epochs, [myCallbacks.scheduler(ii, 0.001, 0.1) for ii in epochs], label="lr=0.001, decay=0.1")
    plt.plot(epochs, [myCallbacks.scheduler(ii, 0.001, 0.05) for ii in epochs], label="lr=0.001, decay=0.05")
    plt.plot(epochs, [myCallbacks.scheduler(ii, 0.001, 0.02) for ii in epochs], label="lr=0.001, decay=0.02")
    aa = myCallbacks.CosineLrScheduler(0.001, 100, 1e-6, 0, restarts=1)
    plt.plot(epochs, [aa.on_epoch_begin(ii) for ii in epochs], label="Cosine, lr=0.001, decay_steps=100, min=1e-6")
    aa = myCallbacks.CosineLrScheduler(0.001, 105, 1e-7, 0, restarts=3)
    plt.plot(epochs, [aa.on_epoch_begin(ii) for ii in epochs], label="Cosine restart, lr=0.001, decay_steps=105, min=1e-7, restarts=3")
    bb = myCallbacks.CosineLrScheduler(0.001, 105 * 1000, lr_min=1e-7, warmup_iters=5 * 1000, lr_on_batch=1000, restarts=4)
    plt.plot([bb.on_train_batch_begin(ii * 1000) for ii in range(100)], label="Cosine restart, lr=0.001, decay_steps=105000, on batch, min=1e-7, warmup=5000, restarts=4")
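
For readers without the repo at hand, the two schedules can be sketched in plain Python. This is an approximation under stated assumptions: `exponential_lr` is a plain `lr_base * exp(-decay_rate * epoch)` and may differ from `myCallbacks.scheduler` (for example in warmup handling), and `cosine_restart_lr` follows the published formula of `tf.keras.experimental.CosineDecayRestarts` rather than the code in `myCallbacks.py`. Splitting the total decay steps as `105 / (1 + 2 + 4) = 15` for `restarts=3, t_mul=2.0` is also an assumption about how the total is divided between cycles.

```python
import math


def exponential_lr(epoch, lr_base=0.001, decay_rate=0.05):
    """Plain exponential decay: lr_base * exp(-decay_rate * epoch)."""
    return lr_base * math.exp(-decay_rate * epoch)


def cosine_restart_lr(step, lr_base, first_decay_steps, t_mul=2.0, m_mul=0.5, alpha=0.0):
    """Cosine decay with warm restarts, following the CosineDecayRestarts formula."""
    completed = step / first_decay_steps
    if t_mul == 1.0:
        i_restart = math.floor(completed)
        completed -= i_restart
    else:
        # Index of the current cycle when each cycle is t_mul times longer than the last.
        i_restart = math.floor(math.log(1.0 - completed * (1.0 - t_mul)) / math.log(t_mul))
        cycles_before = (1.0 - t_mul ** i_restart) / (1.0 - t_mul)
        completed = (completed - cycles_before) / t_mul ** i_restart
    peak = m_mul ** i_restart  # each restart begins at m_mul times the previous peak
    cosine = 0.5 * peak * (1.0 + math.cos(math.pi * completed))
    return lr_base * ((1.0 - alpha) * cosine + alpha)


# Assumed split of the total decay steps into 3 cycles of length 15, 30 and 60.
total_steps, restarts, t_mul = 105, 3, 2.0
first_decay_steps = total_steps / sum(t_mul ** i for i in range(restarts))
for epoch in (0, 7, 14, 16, 44, 46, 104):
    print(epoch, exponential_lr(epoch), cosine_restart_lr(epoch, 0.001, first_decay_steps))
```

The learning rate drops towards zero at the end of each cycle and then jumps back to half of the previous peak, which is the effect of `m_mul=0.5`.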
  • Evaluation
    import evals
    basic_model = keras.models.load_model('checkpoints/keras_mobilefacenet_256_basic_agedb_30_epoch_39_0.942500.h5', compile=False)
    ee = evals.eval_callback(basic_model, '/datasets/faces_emore/lfw.bin')
    # >>>> lfw evaluation max accuracy: 0.993167, thresh: 0.316535, previous max accuracy: 0.000000, PCA accuray = 0.993167 ± 0.003905
    # >>>> Improved = 0.993167
    The default evaluation strategy is on_epoch_end. Setting an eval_freq greater than 1 in train.Train also adds an on_batch_end evaluation.
    # Evaluate on `on_epoch_end`, as well as on `on_batch_end` every `1000` batches.
    tt = train.Train(data_path, 'keras_mobilefacenet_256.h5', eval_paths, basic_model=basic_model, eval_freq=1000)
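
The `max accuracy` and `thresh` values in the evaluation log above come from comparing the two embeddings of every image pair. A simplified sketch of that idea follows, assuming cosine similarity and a single best threshold over all pairs. `best_threshold_accuracy` is a hypothetical helper; the actual `evals.py` may use a different distance or a cross-validated threshold, and the embeddings here are toy 2-D vectors.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def best_threshold_accuracy(embeddings, issame_list):
    """Return (accuracy, threshold) maximizing pair-verification accuracy."""
    # Embeddings 2 * i and 2 * i + 1 belong to the pair labelled issame_list[i].
    sims = [
        cosine_similarity(embeddings[2 * i], embeddings[2 * i + 1])
        for i in range(len(issame_list))
    ]
    best_acc, best_thresh = 0.0, 0.0
    # Only observed similarities (plus "reject everything") need to be tried.
    for thresh in sorted(sims) + [float("inf")]:
        predictions = [sim >= thresh for sim in sims]
        correct = sum(p == bool(s) for p, s in zip(predictions, issame_list))
        acc = correct / len(issame_list)
        if acc > best_acc:
            best_acc, best_thresh = acc, thresh
    return best_acc, best_thresh


# Four toy pairs: the first two are "same person", the last two are not.
embeddings = [
    [1.0, 0.0], [0.9, 0.1],
    [0.0, 1.0], [0.1, 0.8],
    [1.0, 0.0], [0.0, 1.0],
    [1.0, 1.0], [-1.0, 0.5],
]
issame_list = [1, 1, 0, 0]
accuracy, threshold = best_threshold_accuracy(embeddings, issame_list)
print(accuracy, threshold)
```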
  • EfficientNet: tf-nightly now includes all EfficientNet backbones in tensorflow.keras.applications, but they have a Rescaling and a Normalization layer at the head.
    # '2.3.0-dev20200523'
    mm = tf.keras.applications.efficientnet.EfficientNetB4(include_top=False, weights='imagenet', input_shape=(112, 112, 3))
    [ii.name for ii in mm.layers[:6]]
    # ['input_17', 'rescaling_2', 'normalization_2', 'stem_conv_pad', 'stem_conv', 'stem_bn']
    So I'm using another implementation here: Github qubvel/EfficientNet
    !pip install -U git+https://github.com/qubvel/efficientnet
    import efficientnet.tfkeras as efntf
    mm = efntf.EfficientNetB0(weights='imagenet', include_top=False, input_shape=(112, 112, 3))
    [ii.name for ii in mm.layers[:3]]
    # ['input_18', 'stem_conv', 'stem_bn']
  • ResNeSt / RegNet Github QiaoranC/tf_ResNeSt_RegNet_model
    from models.model_factory import get_model
    input_shape = [112, 112, 3]
    n_classes = 100
    fc_activation = 'softmax'
    mm = get_model(model_name="ResNest101",input_shape=input_shape,n_classes=n_classes, verbose=False,fc_activation=fc_activation)
  • SE nets
    # This should be under tf 2.2, NOT tf nightly
    # '2.2.0'
    !pip install -U git+https://github.com/titu1994/keras-squeeze-excite-network
    from keras_squeeze_excite_network import se_resnext
    mm = se_resnext.SEResNextImageNet(weights='imagenet', input_shape=(112, 112, 3), include_top=False)
    Training a se_resnext 101 is TOO slow, it takes almost 4 times longer than ResNet101V2.

Multi GPU train

  • For multi GPU training, it is better to use tf-nightly
    conda create -n tf-nightly
    conda activate tf-nightly
    pip install tf-nightly glob2 pandas tqdm scikit-image scikit-learn ipython
    # Install cuda 10.1 if not installed
    conda install cudnn=7.6.5=cuda10.1_0
  • Wrap everything in a tf.distribute.MirroredStrategy().scope() with block. This is just what works in my case... The batch_size will be multiplied by the number of GPUs.
    # 2.3.0-dev20200523
    with tf.distribute.MirroredStrategy().scope():
        basic_model = ...
        tt = train.Train(..., batch_size=1024, ...) # With 2 GPUs, `batch_size` will be 2048
        sch = [...]
        tt.train(sch, 0)

Training Record

Loss function test on Mobilenet

  • This tests loss functions on Mobilenet for their efficiency, but training for only one epoch may not tell very much.

  • Initialize training from scratch for 6 epochs

    from tensorflow import keras
    import losses
    import train
    basic_model = train.buildin_models("MobileNet", dropout=0.4, emb_shape=256)
    data_path = '/datasets/faces_emore_112x112_folders'
    eval_paths = ['/datasets/faces_emore/lfw.bin', '/datasets/faces_emore/cfp_fp.bin', '/datasets/faces_emore/agedb_30.bin']
    tt = train.Train(data_path, 'keras_mobilenet_256.h5', eval_paths, basic_model=basic_model, model=None, compile=False, lr_base=0.001, batch_size=128, random_status=3)
    sch = [{"loss": losses.ArcfaceLoss(), "optimizer": None, "epoch": 6}]
    tt.train(sch, 0)
  • Train the next epoch, epoch 7, using different loss functions

    ''' Load saved basic model '''
    from tensorflow import keras
    import losses
    import train
    data_path = '/datasets/faces_emore_112x112_folders'
    eval_paths = ['/datasets/faces_emore/lfw.bin', '/datasets/faces_emore/cfp_fp.bin', '/datasets/faces_emore/agedb_30.bin']
    tt = train.Train(data_path, 'keras_mobilenet_256_V.h5', eval_paths, basic_model="./checkpoints/keras_mobilenet_256_basic_agedb_30_epoch_6_0.900333.h5", model=None, compile=False, lr_base=0.001, batch_size=128, random_status=3)
    ''' Choose one loss function each time --> train one epoch --> reload'''
    sch = [{"loss": keras.losses.categorical_crossentropy, "optimizer": "adam", "epoch": 1}]
    sch = [{"loss": losses.margin_softmax, "optimizer": "adam", "epoch": 1}]
    sch = [{"loss": losses.scale_softmax, "optimizer": "adam", "epoch": 1}]
    sch = [{"loss": losses.arcface_loss, "optimizer": "adam", "epoch": 1}]
    sch = [{"loss": losses.arcface_loss, "optimizer": "adam", "centerloss": True, "epoch": 1}]
    sch = [{"loss": losses.batch_hard_triplet_loss, "optimizer": "adam", "epoch": 1}]
    sch = [{"loss": losses.batch_all_triplet_loss, "optimizer": "adam", "epoch": 1}]
    !pip install -q --no-deps tensorflow-addons
    import tensorflow_addons as tfa
    sch = [{"loss": tfa.losses.TripletSemiHardLoss(), "optimizer": "adam", "epoch": 1, "type": tt.triplet}]
    ''' Train '''
    tt.train(sch, 6)
  • Loss and accuracy

    Loss type               | loss    | accuracy | lfw      | lfw thresh | cfp_fp   | cfp_fp thresh | agedb_30 | agedb_30 thresh | total time | per step
    Original Epoch 6        | 22.6342 | 0.7855   | 0.987833 | 0.307455   | 0.891714 | 0.201755      | 0.900333 | 0.229057        | 5653s      | 124ms
    Train Epoch 7
    softmax                 | 1.8196  | 0.6941   | 0.987333 | 0.345970   | 0.895286 | 0.204387      | 0.901667 | 0.265905        | 5677s      | 125ms
    margin_softmax          | 3.8359  | 0.6294   | 0.989000 | 0.317540   | 0.889000 | 0.210142      | 0.897833 | 0.246658        | 5716s      | 126ms
    scale_softmax           | 2.2430  | 0.6779   | 0.987333 | 0.340417   | 0.887857 | 0.204122      | 0.900333 | 0.273266        | 5702s      | 125ms
    arcface_loss            | 22.3337 | 0.7928   | 0.987500 | 0.293580   | 0.886857 | 0.199602      | 0.904833 | 0.247436        | 6133s      | 135ms
    center arcface_loss     | 22.5102 | 0.7924   | 0.987833 | 0.321488   | 0.884000 | 0.200262      | 0.894833 | 0.263254        | 5861s      | 129ms
    batch_hard_triplet_loss | 0.2276  |          | 0.986333 | 0.386425   | 0.910571 | 0.245836      | 0.891333 | 0.354833        | 4622s      | 156ms
    batch_all_triplet_loss  | 0.4749  |          | 0.984333 | 0.417722   | 0.902571 | 0.240187      | 0.837167 | 0.475637        | 4708s      | 159ms
    TripletSemiHardLoss     | 0.0047  |          | 0.957500 | 0.520159   | 0.837857 | 0.441421      | 0.778833 | 0.626684        | 4400s      | 148ms
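
The triplet rows in the table above have no classification accuracy because triplet losses work directly on embeddings. As a rough sketch of the batch-hard idea only: for every anchor, take the farthest positive and the nearest negative in the batch and apply a margin `alpha`. This version uses Euclidean distance on toy vectors; `losses.batch_hard_triplet_loss` in this repo may use a different distance (for example on normalized embeddings), so treat the function below as a hypothetical illustration.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batch_hard_triplet(embeddings, labels, alpha=0.35):
    """Mean over anchors of max(0, hardest_positive - hardest_negative + alpha)."""
    losses = []
    for i, (anchor, label) in enumerate(zip(embeddings, labels)):
        pos = [
            euclidean(anchor, emb)
            for j, (emb, lab) in enumerate(zip(embeddings, labels))
            if lab == label and j != i
        ]
        neg = [euclidean(anchor, emb) for emb, lab in zip(embeddings, labels) if lab != label]
        if not pos or not neg:
            continue  # this anchor has no valid triplet in the batch
        # Hardest positive is the farthest one, hardest negative the nearest one.
        losses.append(max(0.0, max(pos) - min(neg) + alpha))
    return sum(losses) / len(losses) if losses else 0.0


labels = [0, 0, 1, 1]
well_separated = [[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0]]
interleaved = [[0.0, 0.0], [2.0, 0.0], [1.0, 0.0], [3.0, 0.0]]
print(batch_hard_triplet(well_separated, labels))
print(batch_hard_triplet(interleaved, labels))
```

Well-separated identities give zero loss, while interleaved ones are penalized by the distance gap plus the margin.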


Mobilefacenet

  • The training script is the last example given above.

  • Mobilefacenet Record: two models are trained, with batch_size=160 and batch_size=768 respectively.

    Loss               | Epochs | First epoch (batch_size=768)
    Softmax            | 15     | 12744s 2s/step - loss: 4.8241 - accuracy: 0.3282
    Margin Softmax     | 10     | 13041s 2s/step - loss: 0.4096 - accuracy: 0.9323
    Bottleneck Arcface | 4      | 4292s 566ms/step - loss: 21.6166 - accuracy: 0.8569
    Arcface 64         | 35     | 12793s 2s/step - loss: 15.4268 - accuracy: 0.9441
  • se_mobilefacenet Record: two models are trained, with label_smoothing=0 and label_smoothing=0.1 respectively, batch_size = 640

    Loss Epochs First epoch (label_smoothing=0.1) First epoch (label_smoothing=0)
    Softmax 15 13256s 2s/step - loss: 5.9982 - accuracy: 0.3615
    Bottleneck Arcface 4 4111s 452ms/step - loss: 21.7145 - accuracy: 0.8624 4104s 451ms/step - loss: 20.7879 - accuracy: 0.8643
    Arcface 64 30 13043s 1s/step - loss: 16.7003 - accuracy: 0.9491 13092s 1s/step - loss: 15.0788 - accuracy: 0.9498
    Triplet (BS 1440) 50 6688s 2s/step - loss: 0.2319
  • Plot

    import plot
    # plot.hist_plot_split("./checkpoints/keras_mobile_facenet_emore_hist.json", [15, 10, 4, 35], ["Softmax", "Margin Softmax", "Bottleneck Arcface", "Arcface scale=64"])
    customs = ["agedb_30", "cfp_fp"]
    epochs = [15, 10, 4, 35]
    _, axes = plt.subplots(1, 3, figsize=(24, 8))
    axes, _ = plot.hist_plot_split("checkpoints/keras_mobile_facenet_emore_hist.json", epochs, customs=customs, axes=axes, fig_label="Mobilefacenet, BS=768")
    axes, _ = plot.hist_plot_split("checkpoints/keras_mobilefacenet_256_hist_all.json", epochs, customs=customs, axes=axes, fig_label="Mobilefacenet, BS=160")
    axes, _ = plot.hist_plot_split('checkpoints/keras_se_mobile_facenet_emore_VI_hist.json', epochs, customs=customs, axes=axes, fig_label="se, Cosine, BS = 640, LS=0.1")
    axes, _ = plot.hist_plot_split('checkpoints/keras_se_mobile_facenet_emore_VII_nadam_hist.json', epochs, customs=customs, axes=axes, fig_label="se, Cosine, BS = 640, nadam, LS=0.1", init_epoch=3)
    axes, _ = plot.hist_plot_split('checkpoints/keras_se_mobile_facenet_emore_VIII_hist.json', epochs, customs=customs, axes=axes, fig_label="new se_mobilefacenet, Cosine, center, BS = 640, nadam, LS=0.1")
    axes, _ = plot.hist_plot_split('checkpoints/keras_se_mobile_facenet_emore_VIII_PR_hist.json', epochs, customs=customs, axes=axes, fig_label="new se_mobilefacenet, PR, Cosine, center, BS = 640, nadam, LS=0.1")
    axes, _ = plot.hist_plot_split('checkpoints/keras_se_mobile_facenet_emore_X_hist.json', epochs, customs=customs, axes=axes, fig_label="new se_mobilefacenet, Cosine, center, leaky, BS = 640, nadam, LS=0.1")
    axes, pre_1 = plot.hist_plot_split('checkpoints/keras_se_mobile_facenet_emore_hist.json', epochs, names=["Softmax", "Margin Softmax"], customs=customs, axes=axes, fig_label="se, BS = 640, LS=0.1")
    axes, _ = plot.hist_plot_split('checkpoints/keras_se_mobile_facenet_emore_II_hist.json', [4, 35], customs=customs, init_epoch=25, pre_item=pre_1, axes=axes, fig_label="se, BS = 640, LS=0.1")
    axes, pre_2 = plot.hist_plot_split('checkpoints/keras_se_mobile_facenet_emore_III_hist_E45.json', [4, 35], names=["Bottleneck Arcface", "Arcface scale=64"], customs=customs, init_epoch=25, pre_item=pre_1, axes=axes, fig_label="se, BS = 640, LS=0")
    axes, _ = plot.hist_plot_split('checkpoints/keras_se_mobile_facenet_emore_triplet_III_hist.json', [10, 10, 10, 20], names=["Triplet alpha=0.35", "Triplet alpha=0.3", "Triplet alpha=0.25", "Triplet alpha=0.2"], customs=customs, init_epoch=59, pre_item=pre_2, axes=axes, save="", fig_label="se, BS = 640, triplet")

Loss function test on Mobilefacenet epoch 44

  • For Epoch 44, trained steps are 15 epochs softmax + 10 epochs margin softmax + 4 epochs arcface bottleneck only + 15 epochs arcface
  • Run a batch of optimizer + loss test. Each test run is 10 epochs.
    # This `train.Train` is the `batch_size = 160` one.
    sch = [{"loss": losses.ArcfaceLoss(), "epoch": 10}]  # Same as previous epochs
    sch = [{"loss": losses.ArcfaceLoss(scale=32.0), "epoch": 10}] # fix lr == 1e-5
    sch = [{"loss": losses.ArcfaceLoss(scale=32.0), "epoch": 10}] # lr decay, decay_rate = 0.1
    sch = [{"loss": losses.ArcfaceLoss(), "optimizer": keras.optimizers.SGD(0.001, momentum=0.9), "epoch": 10}]
    tt.train(sch, 40) # 44 minus the 4 bottleneck-only epochs
    From Epoch 54, pick the best one (scale=64.0, lr decay, optimizer=nadam) and run an optimizer nadam / adam test
    sch = [{"loss": losses.ArcfaceLoss(), "epoch": 10}]
    sch = [{"loss": losses.ArcfaceLoss(), "optimizer": "adam", "epoch": 10}]
    tt.train(sch, 50) # 54 minus the 4 bottleneck-only epochs
  • Result
    import plot
    axes, _ = plot.hist_plot_split('./checkpoints/keras_mobilefacenet_256_II_hist.json', [10], customs=["lr"], init_epoch=40, axes=None, fig_label="S=32, lr=5e-5, nadam")
    axes, _ = plot.hist_plot_split('./checkpoints/keras_mobilefacenet_256_III_hist.json', [10], customs=["lr"], init_epoch=40, axes=axes, save="", fig_label="S=32, lr decay, nadam")
    import plot
    axes, _ = plot.hist_plot_split('./checkpoints/keras_mobilefacenet_256_IV_hist.json', [10], customs=["lr"], init_epoch=40, axes=None, fig_label="S=64, lr decay, SGD")
    axes, pre_1 = plot.hist_plot_split('./checkpoints/keras_mobilefacenet_256_VI_hist.json', [10], customs=["lr"], init_epoch=40, axes=axes, fig_label="S=64, lr decay, nadam")
    axes, _ = plot.hist_plot_split('./checkpoints/keras_mobilefacenet_256_VII_hist.json', [10], customs=["lr"], init_epoch=50, pre_item=pre_1, axes=axes, fig_label="S=64, lr decay, nadam")
    axes, _ = plot.hist_plot_split('./checkpoints/keras_mobilefacenet_256_VIII_hist.json', [10], customs=["lr"], init_epoch=50, pre_item=pre_1, axes=axes, save="", fig_label="S=64, lr decay, adam")
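
To see what the scale values compared above (S=32 vs S=64) change, here is a small sketch of ArcFace-style logits for a single sample: the target class gets `cos(theta + margin)` instead of `cos(theta)`, and all logits are multiplied by the scale. The `margin=0.5` default and the clamping of `theta + margin` to `pi` are assumptions made for this sketch, and `arcface_logits` is not the implementation in `losses.py`.

```python
import math


def arcface_logits(cosines, target, margin=0.5, scale=64.0):
    """Apply an additive angular margin to the target class, then scale all logits."""
    logits = []
    for idx, cos_theta in enumerate(cosines):
        if idx == target:
            theta = math.acos(max(-1.0, min(1.0, cos_theta)))
            # Clamp theta + margin to pi so the penalized value stays monotonic.
            cos_theta = math.cos(min(theta + margin, math.pi))
        logits.append(scale * cos_theta)
    return logits


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Cosine between one embedding and three class centers; class 0 is the target.
cosines = [0.7, 0.3, -0.1]
for scale in (32.0, 64.0):
    probs = softmax(arcface_logits(cosines, target=0, scale=scale))
    print(scale, [round(p, 4) for p in probs])
```

A larger scale sharpens the softmax output, which is one reason the loss values for different scales are not directly comparable.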


ResNet101V2

  • The training script is similar to the Mobilefacenet one, just replace basic_model with ResNet101V2 and set a new save_path

    basic_model = train.buildin_models("ResNet101V2", dropout=0.4, emb_shape=512)
    tt = train.Train(data_path, 'keras_resnet101_512.h5', eval_paths, basic_model=basic_model, batch_size=1024)
  • Record: two models are trained, with batch_size=1024 and batch_size=896, label_smoothing=0.1 respectively.

    Loss epochs First epoch (batch_size=896) First epoch (2 GPUs, batch_size=1792)
    Softmax 25 11272s 2s/step - loss: 4.6730 - accuracy: 0.5484
    Bottleneck Arcface 4 4053s 624ms/step - loss: 16.5645 - accuracy: 0.9414
    Arcface 64 35 11181s 2s/step - loss: 10.8983 - accuracy: 0.9870 6419s 2s/step - loss: 5.8991 - accuracy: 0.9896
    Triplet 30 5758s 2s/step - loss: 0.1562
  • Plot

    """ Evaluating accuracy is not improving from my end point """
    import plot
    # epochs = [15, 10, 4, 65, 15, 5, 5, 15]
    # history = ['./checkpoints/keras_resnet101_emore_hist.json', './checkpoints/keras_resnet101_emore_basic_hist.json']
    # plot.hist_plot_split("./checkpoints/keras_resnet101_emore_hist.json", [15, 10, 4, 35], ["Softmax", "Margin Softmax", "Bottleneck Arcface", "Arcface scale=64"])
    # axes, _ = plot.hist_plot_split(history, epochs, names=["Softmax", "Margin Softmax", "Bottleneck Arcface", "Arcface scale=64", "Triplet alpha=0.35", "Triplet alpha=0.3", "Triplet alpha=0.25", "Triplet alpha=0.2"], customs=customs, axes=axes, save="", fig_label='Resnet101, BS=896, label_smoothing=0.1')
    # axes, _ = plot.hist_plot_split(history, epochs, customs=customs, fig_label="ResNet101V2, BS=1024")
    customs = ["lfw", "agedb_30", "cfp_fp"]
    history = ['./checkpoints/keras_resnet101_emore_II_hist.json', './checkpoints/keras_resnet101_emore_II_triplet_hist.json']
    epochs = [25, 4, 35, 10, 10, 10, 10, 10]
    axes, _ = plot.hist_plot_split(history, epochs, names=["Softmax", "Bottleneck Arcface", "Arcface scale=64", "Triplet alpha=0.35", "Triplet alpha=0.3", "Triplet alpha=0.25", "Triplet alpha=0.2", "Triplet alpha=0.15"], customs=customs, save="", fig_label='Resnet101, BS=896, label_smoothing=0.1')


EfficientNetB4

  • Training script

    with tf.distribute.MirroredStrategy().scope():
        basic_model = train.buildin_models('EfficientNetB4', 0.4, 512)
        tt = train.Train(data_path, 'keras_EB4_emore.h5', eval_paths, basic_model=basic_model, batch_size=420, random_status=3)
  • Record

    Loss epochs First epoch (batch_size=420) First epoch (2 GPUs, batch_size=840)
    Softmax 25 17404s 1s/step - loss: 4.4620 - accuracy: 0.5669
    Bottleneck Arcface 4 4364s 629ms/step - loss: 18.1350 - accuracy: 0.9166
    Arcface 64 35 11047s 2s/step - loss: 11.3806 - accuracy: 0.9781
    Triplet 30
  • Plot

    """ Comparing EfficientNetB4 and ResNet101 """
    import plot
    customs = ["lfw", "agedb_30", "cfp_fp"]
    epochs = [15, 10, 4, 30]
    axes, _ = plot.hist_plot_split("checkpoints/keras_resnet101_emore_II_hist.json", epochs, customs=customs, axes=None, fig_label='Resnet101, BS=1024, label_smoothing=0.1')
    axes, _ = plot.hist_plot_split("checkpoints/keras_EB4_emore_hist.json", epochs, names=["Softmax", "Margin Softmax", "Bottleneck Arcface", "Arcface scale=64", "Triplet"], customs=customs, axes=axes, save="", fig_label='EB4, BS=840, label_smoothing=0.1')


ResNeSt101

  • The training script is similar to the Mobilefacenet one, just replace basic_model with ResNeSt101 and set a new save_path

    basic_model = train.buildin_models("ResNeSt101", dropout=0.4, emb_shape=512)
    tt = train.Train(data_path, 'keras_ResNest101_emore.h5', eval_paths, basic_model=basic_model, batch_size=600)
  • Record: two models are trained, with batch_size=128 and batch_size=1024 respectively.

    Loss epochs First epoch (batch_size=600) First epoch (2 GPUs, batch_size=1024)
    Softmax 25 16820s 2s/step - loss: 5.2594 - accuracy: 0.4863
    Bottleneck Arcface 4 2835s 499ms/step - loss: 14.9653 - accuracy: 0.9517
    Arcface 64 65 9165s 2s/step - loss: 9.4768 - accuracy: 0.9905
    Triplet 30 8217s 2s/step - loss: 0.1169
  • Plot

    import plot
    customs = ["lfw", "agedb_30", "cfp_fp"]
    epochs = [25, 4, 35, 10, 10, 10, 10, 10]
    history = ['./checkpoints/keras_resnet101_emore_II_hist.json', './checkpoints/keras_resnet101_emore_II_triplet_hist.json']
    axes, _ = plot.hist_plot_split(history, epochs, customs=customs, fig_label='Resnet101, BS=896, label_smoothing=0.1')
    hists = ['./checkpoints/keras_ResNest101_emore_arcface_60_hist.json', './checkpoints/keras_ResNest101_emore_triplet_hist.json']
    axes, _ = plot.hist_plot_split(hists, epochs, names=["Softmax", "Bottleneck Arcface", "Arcface scale=64", "Triplet alpha=0.35", "Triplet alpha=0.3", "Triplet alpha=0.25", "Triplet alpha=0.2", "Triplet alpha=0.15"], customs=customs, axes=axes, save="", fig_label='ResNeSt101, BS=600')

Comparing early softmax training

import plot
customs = ["agedb_30"]
epochs = [15, 10]
axes, _ = plot.hist_plot_split("checkpoints/keras_mobilefacenet_256_hist_all.json", epochs, customs=customs, axes=None, fig_label='Mobilefacenet, BS=160')
axes, _ = plot.hist_plot_split("checkpoints/keras_mobile_facenet_emore_hist.json", epochs, customs=customs, axes=axes, fig_label='Mobilefacenet, BS=768')

axes, _ = plot.hist_plot_split("checkpoints/keras_se_mobile_facenet_emore_hist.json", epochs, customs=customs, axes=axes, fig_label='se_mobilefacenet, BS=680, label_smoothing=0.1')
axes, _ = plot.hist_plot_split('checkpoints/keras_se_mobile_facenet_emore_VI_hist.json', epochs, customs=customs, axes=axes, fig_label="se_mobilefacenet, Cosine, BS = 640, LS=0.1")
axes, _ = plot.hist_plot_split('checkpoints/keras_se_mobile_facenet_emore_VII_nadam_hist.json', epochs, customs=customs, axes=axes, fig_label="se_mobilefacenet, Cosine, nadam, BS = 640, nadam, LS=0.1", init_epoch=3)
axes, _ = plot.hist_plot_split('checkpoints/keras_se_mobile_facenet_emore_VIII_PR_hist.json', epochs, customs=customs, axes=axes, fig_label="new se_mobilefacenet, Cosine, center, BS = 640, nadam, LS=0.1")
axes, _ = plot.hist_plot_split('checkpoints/keras_se_mobile_facenet_emore_IX_hist.json', epochs, customs=customs, axes=axes, fig_label="new se_mobilefacenet, Cosine, no center, BS = 640, nadam, LS=0.1")
axes, _ = plot.hist_plot_split('checkpoints/keras_se_mobile_facenet_emore_X_hist.json', epochs, customs=customs, axes=axes, fig_label="new se_mobilefacenet, Cosine, center, leaky, BS = 640, nadam, LS=0.1")

axes, _ = plot.hist_plot_split("checkpoints/keras_resnet101_512_II_hist.json", epochs, customs=customs, axes=axes, fig_label='Resnet101, BS=128')
axes, _ = plot.hist_plot_split("checkpoints/keras_resnet101_emore_hist.json", epochs, customs=customs, axes=axes, fig_label='Resnet101, BS=1024')
axes, _ = plot.hist_plot_split("checkpoints/keras_resnet101_emore_II_hist.json", epochs, customs=customs, axes=axes, fig_label='Resnet101, BS=960, label_smoothing=0.1')
axes, _ = plot.hist_plot_split("checkpoints/keras_ResNest101_emore_hist.json", epochs, customs=customs, axes=axes, fig_label='Resnest101, BS=600, label_smoothing=0.1')
axes, _ = plot.hist_plot_split("checkpoints/keras_EB4_emore_hist.json", epochs, names=["Softmax", "Margin Softmax"], customs=customs, axes=axes, fig_label='EB4, BS=420, label_smoothing=0.1')

axes[0].plot((2, 15), (0.3807, 0.3807), 'k:')
axes[1].plot((2, 15), (0.9206, 0.9206), 'k:')
axes[0].plot((2, 15), (0.6199, 0.6199), 'k:')
axes[1].plot((2, 15), (0.8746, 0.8746), 'k:')

Label smoothing

  • Train schedulers
    basic_model = train.buildin_models("MobileNet", dropout=0.4, emb_shape=256)
    tt = train.Train(..., random_status=0)
    # Softmax for 3 epochs from epoch 0: use ONE of the two schedulers below (LS=0 or LS=0.1)
    sch = [{"loss": keras.losses.CategoricalCrossentropy(label_smoothing=0), "optimizer": "nadam", "epoch": 3}]
    sch = [{"loss": keras.losses.CategoricalCrossentropy(label_smoothing=0.1), "optimizer": "nadam", "epoch": 3}]
    tt.train(sch, 0)
    # Arcface for 5 epochs, continuing from epoch 3
    sch = [{"loss": losses.ArcfaceLoss(label_smoothing=0), "epoch": 5}]
    tt.train(sch, 3)
    # Arcface for 3 more epochs from epoch 8: again use ONE of the two schedulers (LS=0 or LS=0.1)
    sch = [{"loss": losses.ArcfaceLoss(label_smoothing=0), "epoch": 3}]
    sch = [{"loss": losses.ArcfaceLoss(label_smoothing=0.1), "epoch": 3}]
    tt.train(sch, 8)
    # Same Arcface stage, but with random_status=3
    tt = train.Train(..., random_status=3)
    sch = [{"loss": losses.ArcfaceLoss(label_smoothing=0), "epoch": 3}]
    tt.train(sch, 8)
  • Result
    import plot
    axes, _ = plot.hist_plot_split('checkpoints/keras_mobilenet_256_hist.json', [3], init_epoch=0, axes=None, fig_label="LS=0, Softmax")
    axes, pre_1 = plot.hist_plot_split('checkpoints/keras_mobilenet_ls_0.1_256_hist.json', [3, 5], names=["Softmax", "Arcface"], init_epoch=0, axes=axes, fig_label="LS=0.1, Softmax")
    axes, _ = plot.hist_plot_split('checkpoints/keras_mobilenet_arcface_ls_0_256_hist.json', [3], init_epoch=8, pre_item=pre_1, axes=axes, fig_label="LS=0, Arcface")
    axes, _ = plot.hist_plot_split('checkpoints/keras_mobilenet_arcface_ls_0.1_256_hist.json', [3], init_epoch=8, pre_item=pre_1, axes=axes, fig_label="LS=0.1, Arcface")
    axes, _ = plot.hist_plot_split('checkpoints/keras_mobilenet_arcface_randaug_256_hist.json', [3], init_epoch=8, pre_item=pre_1, axes=axes, fig_label="Random=3, LS=0, Arcface")
    axes, _ = plot.hist_plot_split('checkpoints/keras_mobilenet_arcface_randaug_ls0.1_256_hist.json', [5], names=["Arcface"], init_epoch=8, pre_item=pre_1, axes=axes, fig_label="Random=3, LS=0.1, Arcface")
    axes[2].legend(fontsize=8, loc='lower center')
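
To make the `label_smoothing` argument concrete, here is a minimal pure-Python sketch of how Keras `CategoricalCrossentropy` smooths a one-hot target (`y * (1 - LS) + LS / num_classes`). The helper names `smooth_labels` and `cross_entropy` are illustrative only and not part of this repo; it is an assumption that `losses.ArcfaceLoss` applies the same blending.

```python
import math


def smooth_labels(one_hot, label_smoothing=0.1):
    """Blend a one-hot target with a uniform distribution over all classes."""
    num_classes = len(one_hot)
    return [y * (1.0 - label_smoothing) + label_smoothing / num_classes for y in one_hot]


def cross_entropy(target, probs, eps=1e-7):
    """Categorical cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, probs))


one_hot = [0.0, 0.0, 1.0, 0.0]
probs = [0.05, 0.05, 0.85, 0.05]  # hypothetical softmax output
smoothed = smooth_labels(one_hot, label_smoothing=0.1)
print(smoothed)
print(cross_entropy(one_hot, probs), cross_entropy(smoothed, probs))
```

With smoothing, the loss stays above zero even for a confident correct prediction, which is why loss curves with LS=0.1 sit higher than those with LS=0 and are not directly comparable.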

Model conversion


  • Most frameworks currently support tf1.x only, so it is better to do the conversion in a tf1.x environment
    # '1.15.0'
    # Convert to saved model first
    import glob2
    mm = tf.keras.models.load_model(glob2.glob('./keras_mobilefacenet_256_basic_*.h5')[0], compile=False)
    tf.keras.experimental.export_saved_model(mm, './saved_model')
    tf2onnx converts the saved model to ONNX, also under tf1.15.0
    pip install -U tf2onnx
    python -m tf2onnx.convert --saved-model ./saved_model --output model.onnx
  • MXNet to onnx
    # Make sure onnx 1.2.1 is installed
    # pip install onnx==1.2.1
    import onnx
    assert onnx.__version__ == '1.2.1'
    from mxnet.contrib import onnx as onnx_mxnet
    prefix, epoch = "model", 0
    sym_file = "%s-symbol.json" % prefix
    params_file = "%s-%04d.params" % (prefix, epoch)
    converted_model_path = onnx_mxnet.export_model(sym_file, params_file, [(1, 3, 112, 112)], np.float32, "mx_output.onnx")
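
The export above expects the two checkpoint files that MXNet writes for a given `prefix` and `epoch`. As a small sketch, the naming pattern can be wrapped in a helper and checked before calling `export_model`. The function names `mxnet_checkpoint_files` and `missing_files` are hypothetical and only illustrate the `"%s-symbol.json"` / `"%s-%04d.params"` pattern already used in the snippet.

```python
import os


def mxnet_checkpoint_files(prefix, epoch):
    """Return the (symbol, params) file names for an MXNet checkpoint."""
    return "%s-symbol.json" % prefix, "%s-%04d.params" % (prefix, epoch)


def missing_files(paths):
    """Return the subset of paths that do not exist on disk."""
    return [path for path in paths if not os.path.isfile(path)]


sym_file, params_file = mxnet_checkpoint_files("model", 0)
print(sym_file, params_file)  # model-symbol.json model-0000.params

missing = missing_files([sym_file, params_file])
if missing:
    print("Missing checkpoint files:", missing)
```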



  • Convert to TFlite
    # TF '1.15.0'
    converter = tf.lite.TFLiteConverter.from_keras_model_file("checkpoints/keras_se_mobile_facenet_emore_triplet_basic_agedb_30_epoch_100_0.958333.h5")
    tflite_model = converter.convert()
    with open('./model.tflite', 'wb') as ff:
        ff.write(tflite_model)
    # TF '2.1.0'
    import glob2
    mm = tf.keras.models.load_model(glob2.glob('./keras_mobilefacenet_256_basic_*.h5')[0], compile=False)
    converter = tf.lite.TFLiteConverter.from_keras_model(mm)
    tflite_model = converter.convert()
    with open('./model_tf2.tflite', 'wb') as ff:
        ff.write(tflite_model)
  • interpreter test
    # '2.1.0'
    import glob2
    interpreter = tf.lite.Interpreter('./model.tflite')
    interpreter.allocate_tensors()  # required before set_tensor / invoke
    input_index = interpreter.get_input_details()[0]["index"]
    output_index = interpreter.get_output_details()[0]["index"]
    def tf_imread(file_path):
        img = tf.io.read_file(file_path)
        img = tf.image.decode_jpeg(img, channels=3)
        img = tf.image.convert_image_dtype(img, tf.float32)  # uint8 -> [0, 1]
        img = (img - 0.5) * 2  # [0, 1] -> [-1, 1]
        return tf.expand_dims(img, 0)
    imm = tf_imread('/datasets/faces_emore_112x112_folders/0/1.jpg')
    # imm = tf_imread('./temp_test/faces_emore_test/0/1.jpg')
    interpreter.set_tensor(input_index, imm)
    interpreter.invoke()  # run inference
    aa = interpreter.get_tensor(output_index)[0]
    def foo(imm):
        interpreter.set_tensor(input_index, imm)
        interpreter.invoke()
        return interpreter.get_tensor(output_index)[0]
    %timeit -n 100 foo(imm)
    # 36.7 ms ± 471 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    mm = tf.keras.models.load_model(glob2.glob('./keras_mobilefacenet_256_basic_*.h5')[0], compile=False)
    bb = mm(imm).numpy()
    assert np.allclose(aa, bb, rtol=1e-3)
    %timeit mm(imm).numpy()
    # 71.6 ms ± 213 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
  • On ARM64 board
    # Architecture:        aarch64
    python --version
    # Python 3.6.9
    sudo apt install python3-pip ipython cython3
    pip install ipython
    git clone https://github.com/noahzhy/tf-aarch64.git
    cd tf-aarch64/
    pip install tensorflow-1.9.0rc0-cp36-cp36m-linux_aarch64.whl
    pip install https://dl.google.com/coral/python/tflite_runtime-2.1.0.post1-cp36-cp36m-linux_aarch64.whl
    import tensorflow as tf
    # 1.9.0-rc0
    import tflite_runtime
    # 2.1.0.post1
    import tflite_runtime.interpreter as tflite
    import numpy as np
    interpreter = tflite.Interpreter('./mobilefacenet_tf2.tflite')
    interpreter.allocate_tensors()  # required before set_tensor / invoke
    input_index = interpreter.get_input_details()[0]["index"]
    output_index = interpreter.get_output_details()[0]["index"]
    # set_tensor expects a numpy array, so build the dummy input with numpy
    imm = np.ones([1, 112, 112, 3], dtype=np.float32)
    interpreter.set_tensor(input_index, imm)
    interpreter.invoke()  # run inference
    out = interpreter.get_tensor(output_index)[0]
    def foo(imm):
        interpreter.set_tensor(input_index, imm)
        interpreter.invoke()
        return interpreter.get_tensor(output_index)[0]
    %timeit -n 100 foo(imm)
    # 42.4 ms ± 43.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    %timeit -n 100 foo(imm) # EfficientNetB0
    # 71.2 ms ± 52.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
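
A TFLite interpreter needs `allocate_tensors()` once and `invoke()` after every `set_tensor()`. The sketch below wraps those steps in a small callable so a single call performs a full inference. `TFLiteRunner` is an illustrative name, not part of this repo, and `EchoInterpreter` is a hypothetical stand-in that only mimics the interpreter methods used here so the example runs without TensorFlow installed.

```python
class TFLiteRunner:
    """Wrap a TFLite-style interpreter so one call performs a full inference."""

    def __init__(self, interpreter):
        self.interpreter = interpreter
        self.interpreter.allocate_tensors()  # must run once before set_tensor
        self.input_index = interpreter.get_input_details()[0]["index"]
        self.output_index = interpreter.get_output_details()[0]["index"]

    def __call__(self, batch):
        self.interpreter.set_tensor(self.input_index, batch)
        self.interpreter.invoke()  # actually executes the model
        return self.interpreter.get_tensor(self.output_index)[0]


class EchoInterpreter:
    """Hypothetical stand-in: copies the input to the output instead of running a model."""

    def __init__(self):
        self.tensors = {}
        self.allocated = False

    def allocate_tensors(self):
        self.allocated = True

    def get_input_details(self):
        return [{"index": 0}]

    def get_output_details(self):
        return [{"index": 1}]

    def set_tensor(self, index, value):
        self.tensors[index] = value

    def invoke(self):
        self.tensors[1] = list(self.tensors[0])

    def get_tensor(self, index):
        return self.tensors[index]


runner = TFLiteRunner(EchoInterpreter())
print(runner([[1.0, 2.0, 3.0]]))  # [1.0, 2.0, 3.0]
```

With a real model, something like `TFLiteRunner(tf.lite.Interpreter('./model.tflite'))` or the `tflite_runtime` interpreter should slot in the same way, since only these five interpreter methods are used.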

MXNet format

  • keras-mxnet is used here to convert a Keras h5 model to the MXNet params + json format.
    $ pip install keras-mxnet
    $ KERAS_BACKEND='mxnet' ipython
  • Issue
    ''' Q: TypeError: tuple indices must be integers or slices, not list
    /opt/anaconda3/lib/python3.7/site-packages/keras/layers/normalization.py in build(self, input_shape)
         99     def build(self, input_shape):
    --> 100         dim = input_shape[self.axis]
        101         print(input_shape, self.axis, dim)
        102         if dim is None
    ''' A: Modify normalization.py
    $ vi /opt/anaconda3/lib/python3.7/site-packages/keras/layers/normalization.py +97
    -       self.axis = axis
    +       self.axis = axis if isinstance(axis, int) else axis[-1]
    def build(self, input_shape):
  • Convert
    # tf save
    mm = tf.keras.models.load_model("checkpoints/keras_se_mobile_facenet_emore_triplet_basic_agedb_30_epoch_100_0.958333.h5", compile=False)
    json_config = mm.to_json()
    with open('model/model_config.json', 'w') as json_file:
        json_file.write(json_config)
    ''' Modify json file '''
    # For tf15 / tf20 saved json file, delete '"ragged": false,'
    !sed -i 's/"ragged": false, //' model/model_config.json
    # For tf-nightly saved json file, also replace '"class_name": "Functional"' by '"class_name": "Model"'
    !sed -i 's/"class_name": "Functional"/"class_name": "Model"/' model/model_config.json
    Then start a new ipython session with KERAS_BACKEND='mxnet' ipython
    # mxnet load
    import numpy as np
    import keras
    # Using MXNet backend
    # from keras import backend as K
    # K.common.set_image_data_format('channels_first')
    from keras.initializers import glorot_normal, glorot_uniform
    from keras.utils import CustomObjectScope
    with open('model/model_config.json') as json_file:
        json_config = json_file.read()
    with CustomObjectScope({'GlorotNormal': glorot_normal(), "GlorotUniform": glorot_uniform()}):
        new_model = keras.models.model_from_json(json_config)
    new_model.predict(np.zeros((1, 112, 112, 3))) # MUST do a predict
    # new_model.compile(optimizer='adam', loss=keras.losses.categorical_crossentropy)
    new_model.compiled = True
    keras.models.save_mxnet_model(model=new_model, prefix='mm')
  • Test
    import numpy as np
    import mxnet as mx
    sym, arg_params, aux_params = mx.model.load_checkpoint(prefix='mm', epoch=0)
    mod = mx.mod.Module(symbol=sym, data_names=['/input_11'], context=mx.cpu(), label_names=None)
    mod.bind(for_training=False, data_shapes=[('/input_11', (1, 112, 112, 3))], label_shapes=mod._label_shapes)
    mod.set_params(arg_params, aux_params, allow_missing=True)
    data_iter = mx.io.NDArrayIter(np.ones((1, 112, 112, 3)), None, 1)
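
The two `sed` edits in the Convert step can also be sketched in Python with the standard `json` module, which avoids depending on the exact spacing of the saved file. This is only an alternative under assumptions: `patch_keras_config` is a hypothetical helper, the sample config is made up for illustration, and unlike the `sed` command it drops every `"ragged"` key regardless of its value.

```python
import json


def patch_keras_config(node):
    """Recursively drop "ragged" keys and rename class "Functional" to "Model"."""
    if isinstance(node, dict):
        node.pop("ragged", None)
        if node.get("class_name") == "Functional":
            node["class_name"] = "Model"
        for value in node.values():
            patch_keras_config(value)
    elif isinstance(node, list):
        for item in node:
            patch_keras_config(item)
    return node


# Made-up miniature config, only to show the effect of the patch
sample = (
    '{"class_name": "Functional", "config": {"layers": ['
    '{"class_name": "InputLayer", "config": {"ragged": false, "sparse": false}}]}}'
)
patched = patch_keras_config(json.loads(sample))
print(json.dumps(patched))
```

For a real file, load `model/model_config.json` with `json.load`, patch it, and write it back with `json.dump` before starting the MXNet-backend session.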

Pytorch and Caffe2

  • Caffe2 inference ONNX
    import caffe2.python.onnx.backend as onnx_caffe2_backend
    import onnx
    import torch
    model = onnx.load("model.onnx")
    prepared_backend = onnx_caffe2_backend.prepare(model)
    batch_size = 1  # assumed here; the input shape below must match the loaded model's input
    x = torch.randn(batch_size, 1, 224, 224, requires_grad=True)
    W = {model.graph.input[0].name: x.data.numpy()}
    c2_out = prepared_backend.run(W)[0]
    %timeit prepared_backend.run(W)[0]
    # 26.4 ms ± 219 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
    torch.save(model, 'tt')
  • Save caffe2 format
    init_net, predict_net = onnx_caffe2_backend.Caffe2Backend.onnx_graph_to_caffe2_net(model)
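    with open("onnx-init.pb", "wb") as f:
        f.write(init_net.SerializeToString())
    with open("onnx-predict.pb", "wb") as f:
        f.write(predict_net.SerializeToString())
    with open("onnx-init.pbtxt", "w") as f:
        f.write(str(init_net))
    with open("onnx-predict.pbtxt", "w") as f:
        f.write(str(predict_net))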
  • Caffe2 mobile format
    # extract the workspace and the model proto from the internal representation
    c2_workspace = prepared_backend.workspace
    c2_model = prepared_backend.predict_net
    # Now import the caffe2 mobile exporter
    from caffe2.python.predictor import mobile_exporter
    # call the Export to get the predict_net, init_net. These nets are needed for running things on mobile
    init_net, predict_net = mobile_exporter.Export(c2_workspace, c2_model, c2_model.external_input)
    # Let's also save the init_net and predict_net to a file that we will later use for running them on mobile
    with open('init_net.pb', "wb") as fopen:
        fopen.write(init_net.SerializeToString())
    with open('predict_net.pb', "wb") as fopen:
        fopen.write(predict_net.SerializeToString())
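
The `%timeit` lines in this README are IPython magics and will not run in a plain Python script. A rough equivalent, sketched here with the standard `timeit` module, reports mean and standard deviation per call. `time_call` is a hypothetical helper, and the lambda is a placeholder workload standing in for something like `prepared_backend.run(W)`.

```python
import statistics
import timeit


def time_call(func, number=10, repeat=7):
    """Return (mean, stdev) in seconds per call, similar in spirit to IPython's %timeit."""
    totals = timeit.repeat(func, number=number, repeat=repeat)
    per_call = [total / number for total in totals]
    return statistics.mean(per_call), statistics.stdev(per_call)


# Placeholder workload; replace with the inference call to be measured
mean_s, std_s = time_call(lambda: sum(range(1000)), number=100)
print("%.3f ms ± %.3f ms per loop" % (mean_s * 1e3, std_s * 1e3))
```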

