Tips for training large-scale face recognition models with millions of IDs (classes).
nttstar opened this issue · 0 comments
When training ArcFace models with millions of IDs, we may run into time-efficiency problems.
=====
P1: There are too many classes for my GPUs to handle.
Solutions:
- To reduce the memory usage of the classification layer, model parallelism and partial FC are good options.
- Enabling FP16 further reduces GPU memory usage and also gives a speed-up on modern NVIDIA GPUs. For example, we can enable FP16 training with a single fp16-scale parameter:
export CUDA_VISIBLE_DEVICES='0,1,2,3,4,5,6,7'
python -u train_parall.py --network r50 --dataset emore --loss arcface --fp16-scale 1.0
or change the following setting in the partial-fc MXNet implementation:
config.fp16 = True
- Use distributed training.
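To make the partial-FC idea concrete, here is a minimal Python sketch (my own illustration, not the repo's implementation) of the class-center sampling step: each iteration keeps all positive classes appearing in the batch plus a random subset of negatives, so the softmax weight matrix handled per step stays small no matter how many total IDs there are.

```python
import random

def sample_partial_fc_classes(batch_labels, num_classes, sample_ratio=0.1, seed=0):
    """Pick the subset of class centers used for one training step:
    every positive class in the batch, plus random negative classes,
    up to roughly sample_ratio * num_classes centers in total.
    (Hypothetical helper for illustration only.)"""
    rng = random.Random(seed)
    positives = set(batch_labels)
    num_sampled = max(len(positives), int(num_classes * sample_ratio))
    # Linear scan is fine for a sketch; a real implementation would
    # sample negatives without materializing the full class list.
    negatives = [c for c in range(num_classes) if c not in positives]
    sampled = list(positives) + rng.sample(negatives, num_sampled - len(positives))
    return sorted(sampled)
```

With sample_ratio=0.1 and one million classes, each step would only compute logits against roughly 100k centers instead of all one million, which is what makes the classification layer fit in GPU memory.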
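The fp16-scale parameter above relates to loss scaling, which keeps small gradient values representable in FP16. As a hedged sketch of the dynamic variant of this idea (a hypothetical class, not the code used by train_parall.py), the scale is halved whenever gradients overflow and grown back after a run of stable steps:

```python
class DynamicLossScaler:
    """Minimal sketch of dynamic loss scaling for FP16 training.
    The loss is multiplied by `scale` before backward, and gradients
    are divided by `scale` before the optimizer update."""

    def __init__(self, init_scale=2.0 ** 15, growth_interval=2000):
        self.scale = init_scale
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, grads_overflowed):
        """Adjust the scale after one step; returns False when the
        optimizer update should be skipped."""
        if grads_overflowed:
            # Inf/NaN in the gradients: halve the scale and retry.
            self.scale /= 2.0
            self._good_steps = 0
            return False
        self._good_steps += 1
        if self._good_steps % self.growth_interval == 0:
            # A long run of clean steps: the scale can safely grow.
            self.scale *= 2.0
        return True
```

A fixed scale (such as the 1.0 passed via --fp16-scale above) is the simpler static version of the same mechanism.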
=====
P2: The training dataset is huge and the I/O cost is high, which leads to very low training speed.
Solutions:
- Use a sequential data loader instead of random access.
Right now the default face recognition datasets (*.rec) are indexed key-value databases, called MXIndexedRecordIO, so the data loader must randomly access items in these datasets during training. The performance is acceptable only if the data sits on a RAM filesystem or a very fast SSD. For ordinary hard disks, we must use an alternative method that avoids random access:
a. Use recognition/common/rec2shufrec.py to convert any indexed '.rec' dataset to a shuffled sequential one, called MXRecordIO.
b. In ArcFace, set is_shuffled_rec=True in the config file to use the converted shuffled dataset. Please check the get_face_image_iter() function in image_iter.py for details.
c. The shuffled dataset loader requires only sequential scanning, and provides data shuffling through a small in-memory buffer.
d. The shuffled dataset also benefits from the C++ runtime of the MXNet record reader, which accelerates image processing.
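For point (c), a small in-memory shuffle buffer over a sequential stream can be sketched like this (a generic illustration of the technique, not the actual MXNet iterator code):

```python
import random

def buffered_shuffle(stream, buffer_size=8192, seed=0):
    """Yield items from a sequentially-read stream in approximately
    random order, keeping at most buffer_size items in memory."""
    rng = random.Random(seed)
    buf = []
    for item in stream:
        buf.append(item)
        if len(buf) >= buffer_size:
            # Swap a random buffered item to the end and emit it,
            # so the disk only ever sees sequential reads.
            i = rng.randrange(len(buf))
            buf[i], buf[-1] = buf[-1], buf[i]
            yield buf.pop()
    # Flush whatever remains at end of stream.
    rng.shuffle(buf)
    yield from buf
```

A larger buffer gives shuffling closer to uniform at the cost of memory; since the underlying '.rec' file was already shuffled once by rec2shufrec.py, a modest buffer is enough in practice.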
=====
Any questions or discussion can be left in this thread.