Tips for training large-scale face recognition model, such as millions of IDs(classes).

When training ArcFace models with millions of IDs, we may run into memory and time-efficiency problems.

=====
P1: There are too many classes for my GPUs to handle.

Solutions:

  1. To reduce the memory usage of the classification layer, model parallelism and Partial FC are good options (see the sketch after this list).

  2. Enabling FP16 can further reduce GPU memory usage and also gives a speed-up on modern NVIDIA GPUs. For example, we can enable fp16 training with a simple fp16-scale parameter:

export CUDA_VISIBLE_DEVICES='0,1,2,3,4,5,6,7' 
python -u train_parall.py --network r50 --dataset emore --loss arcface --fp16-scale 1.0

or change the following setting in the Partial FC MXNet implementation:

config.fp16 = True

  3. Use distributed training.
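
Below is a minimal NumPy sketch of the class-center sampling idea behind Partial FC, assuming a (num_classes, emb_size) weight matrix; the function name sample_partial_centers and the sizes used are illustrative only and not part of the insightface code. In the actual Partial FC implementation this sampling is combined with model parallelism, so each GPU holds only a shard of the class centers.

# A NumPy-only sketch: keep the centers of classes present in the batch
# plus a random subset of the remaining centers, so each step touches only
# a fraction of the full (num_classes, emb_size) classification weight.
import numpy as np

def sample_partial_centers(weight, labels, sample_rate=0.1, rng=None):
    # weight: (num_classes, emb_size) class centers; labels: (batch,) int class ids.
    rng = rng or np.random.default_rng()
    num_classes = weight.shape[0]
    positive = np.unique(labels)                        # centers we must keep
    num_sample = max(int(num_classes * sample_rate), len(positive))
    negative_pool = np.setdiff1d(np.arange(num_classes), positive)
    negative = rng.choice(negative_pool, size=num_sample - len(positive), replace=False)
    index = np.concatenate([positive, negative])        # sampled class ids
    remap = {c: i for i, c in enumerate(positive)}      # original id -> sub-matrix row
    new_labels = np.array([remap[c] for c in labels])
    return weight[index], new_labels, index

# usage sketch: 100k classes, 128-d embeddings, batch of 64
W = np.random.randn(100000, 128).astype(np.float32)
batch_labels = np.random.randint(0, 100000, size=64)
sub_W, sub_labels, index = sample_partial_centers(W, batch_labels, sample_rate=0.1)
# the margin softmax (ArcFace) loss is computed against sub_W only, and the
# resulting gradients are scattered back into W at rows `index`.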

=====
P2: The training dataset is huge and the I/O cost is high, which leads to very low training speed.

Solutions:

  1. Sequential data loader instead of random access (a minimal reader sketch follows this list).
    By default, the face recognition datasets (*.rec) are indexed key-value databases called MXIndexedRecordIO, so the data loader has to randomly access items in these datasets during training. The performance is acceptable only if the data sits on a RAM filesystem or a very fast SSD. For regular hard disks, we must use an alternative method that avoids random access.

    a. Use recognition/common/rec2shufrec.py to convert any indexed '.rec' dataset to a shuffled sequential one, called MXRecordIO.
    b. In ArcFace, set is_shuffled_rec=True in the config file to use the converted shuffled dataset. Please check the get_face_image_iter() function in image_iter.py for details.
    c. The shuffled dataset loader requires only sequential scanning and provides data shuffling via a small in-memory buffer.
    d. The shuffled dataset can also benefit from the C++ runtime of the MXNet record reader, which accelerates image processing.
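
The sketch below shows, under assumptions, how such a shuffled sequential '.rec' file could be scanned with mx.recordio.MXRecordIO while shuffling inside a small in-memory buffer; the file name and buffer size are placeholders, and the actual loader logic lives in image_iter.py.

# Sequential scan of a shuffled .rec file with a small shuffle buffer
# (a sketch, not the insightface loader).
import random
import mxnet as mx

def sequential_shuffled_reader(rec_path, buffer_size=4096):
    # Yields (header, encoded_image_bytes) while reading the file exactly once.
    reader = mx.recordio.MXRecordIO(rec_path, 'r')    # sequential reader, no index needed
    buf = []
    while True:
        record = reader.read()                        # next record, or None at EOF
        if record is None:
            break
        buf.append(record)
        if len(buf) >= buffer_size:
            # pop a random element: local shuffling without random disk access
            yield mx.recordio.unpack(buf.pop(random.randrange(len(buf))))
    random.shuffle(buf)                               # drain the remaining records
    for record in buf:
        yield mx.recordio.unpack(record)

# usage sketch, assuming a file produced by rec2shufrec.py:
# for header, img_bytes in sequential_shuffled_reader('train.shuffled.rec'):
#     image = mx.image.imdecode(img_bytes)            # decode and feed the training pipeline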

=====
Any questions or discussion can be left in this thread.