Tips for training large-scale face recognition models with millions of IDs (classes).
nttstar opened this issue · 0 comments
When training ArcFace models with millions of IDs, we may run into time-efficiency problems.
=====
P1: There are too many classes for my GPUs to handle.
Solutions:
- To reduce the memory usage of the classification layer, model parallelism and partial FC are good options.
- Enabling FP16 further reduces GPU memory usage and also gives a speed-up on modern NVIDIA GPUs. For example, we can enable FP16 training with a single fp16-scale parameter:
export CUDA_VISIBLE_DEVICES='0,1,2,3,4,5,6,7'
python -u train_parall.py --network r50 --dataset emore --loss arcface --fp16-scale 1.0
or change the following setting in the partial-fc MXNet implementation:
config.fp16 = True
- Use distributed training.
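To make the partial-FC idea concrete, here is a minimal Python sketch (my own illustration, not the repo's implementation) of the class-center sampling step: each iteration keeps all positive classes appearing in the batch plus a random subset of negatives, so the softmax weight matrix handled per step stays small no matter how many total IDs there are.

```python
import random

def sample_partial_fc_classes(batch_labels, num_classes, sample_ratio=0.1, seed=0):
    """Pick the subset of class centers used for one training step:
    every positive class in the batch, plus random negative classes,
    up to roughly sample_ratio * num_classes centers in total.
    (Hypothetical helper for illustration only.)"""
    rng = random.Random(seed)
    positives = set(batch_labels)
    num_sampled = max(len(positives), int(num_classes * sample_ratio))
    # Linear scan is fine for a sketch; a real implementation would
    # sample negatives without materializing the full class list.
    negatives = [c for c in range(num_classes) if c not in positives]
    sampled = list(positives) + rng.sample(negatives, num_sampled - len(positives))
    return sorted(sampled)
```

With sample_ratio=0.1 and one million classes, each step would only compute logits against roughly 100k centers instead of all one million, which is what makes the classification layer fit in GPU memory.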
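The fp16-scale parameter above relates to loss scaling, which keeps small gradient values representable in FP16. As a hedged sketch of the dynamic variant of this idea (a hypothetical class, not the code used by train_parall.py), the scale is halved whenever gradients overflow and grown back after a run of stable steps:

```python
class DynamicLossScaler:
    """Minimal sketch of dynamic loss scaling for FP16 training.
    The loss is multiplied by `scale` before backward, and gradients
    are divided by `scale` before the optimizer update."""

    def __init__(self, init_scale=2.0 ** 15, growth_interval=2000):
        self.scale = init_scale
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, grads_overflowed):
        """Adjust the scale after one step; returns False when the
        optimizer update should be skipped."""
        if grads_overflowed:
            # Inf/NaN in the gradients: halve the scale and retry.
            self.scale /= 2.0
            self._good_steps = 0
            return False
        self._good_steps += 1
        if self._good_steps % self.growth_interval == 0:
            # A long run of clean steps: the scale can safely grow.
            self.scale *= 2.0
        return True
```

A fixed scale (such as the 1.0 passed via --fp16-scale above) is the simpler static version of the same mechanism.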
=====
P2: The training dataset is huge and the I/O cost is high, which leads to very low training speed.
Solutions:
- Use a sequential data loader instead of random access.
Right now the default face recognition datasets (*.rec) are indexed key-value databases, called MXIndexedRecordIO, so the data loader must randomly access items in these datasets during training. The performance is acceptable only if the data sits on a RAM filesystem or a very fast SSD. For ordinary hard disks, we must use an alternative method that avoids random access:
a. Use recognition/common/rec2shufrec.py to convert any indexed '.rec' dataset to a shuffled sequential one, called MXRecordIO.
b. In ArcFace, set is_shuffled_rec=True in the config file to use the converted shuffled dataset. Please check the get_face_image_iter() function in image_iter.py for details.
c. The shuffled dataset loader requires only sequential scanning, and provides data shuffling through a small in-memory buffer.
d. The shuffled dataset also benefits from the C++ runtime of the MXNet record reader, which accelerates image processing.
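For point (c), a small in-memory shuffle buffer over a sequential stream can be sketched like this (a generic illustration of the technique, not the actual MXNet iterator code):

```python
import random

def buffered_shuffle(stream, buffer_size=8192, seed=0):
    """Yield items from a sequentially-read stream in approximately
    random order, keeping at most buffer_size items in memory."""
    rng = random.Random(seed)
    buf = []
    for item in stream:
        buf.append(item)
        if len(buf) >= buffer_size:
            # Swap a random buffered item to the end and emit it,
            # so the disk only ever sees sequential reads.
            i = rng.randrange(len(buf))
            buf[i], buf[-1] = buf[-1], buf[i]
            yield buf.pop()
    # Flush whatever remains at end of stream.
    rng.shuffle(buf)
    yield from buf
```

A larger buffer gives shuffling closer to uniform at the cost of memory; since the underlying '.rec' file was already shuffled once by rec2shufrec.py, a modest buffer is enough in practice.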
=====
Any questions or discussion can be left in this thread.