deepinsight/insightface

Report your verification accuracy of new training dataset 'insightv2_emore'.

nttstar opened this issue · 71 comments

A new training dataset, 'insightv2' (code name emore), still largely based on ms1m, is available on baiducloud and onedrive (from @gdshen) and can easily achieve better accuracy. I hope anyone who uses insightface will post their training accuracy and details here to show the strength of our network backbone, dataset and loss function. (Currently I will not provide pretrained models.)
(More training dataset contributions are welcome, please email me~)

The format may look like the one below (taken from one of my experiments):

  1. dataset: emore (or faces_ms1m for the old one)
  2. network backbone: r50 ( res_unit=3, output=E, emb_size=512, prelu )
  3. loss function: arcface(m=0.5)
  4. training pipeline: straightforward (lr drop at 100K, 140K, 160K), batch-size:512
  5. Highest LFW: 99.83%; Highest CFP_FP: 97.67; Highest AgeDB30: 98.10
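
The lr drops in the report above are counted in iterations, so they are tied to the batch size. A minimal sketch, assuming one wants each drop to happen after the same number of training samples when the batch size changes (the helper name `scale_lr_steps` is hypothetical and not part of insightface):

```python
def scale_lr_steps(steps, base_batch_size, new_batch_size):
    """Rescale iteration-based lr drop points so that each drop happens
    after the same number of training samples at a different batch size."""
    return [step * base_batch_size // new_batch_size for step in steps]


# lr drops from the example report (batch-size 512)
base_steps = [100000, 140000, 160000]

print(scale_lr_steps(base_steps, 512, 768))  # larger batch, fewer iterations
print(scale_lr_steps(base_steps, 512, 256))  # smaller batch, more iterations
```

Integer division is used because step counts must be whole iterations; whether the learning rate itself should also be rescaled with batch size is a separate choice not covered here.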

OK, I will try it as soon as you release the new dataset.

Excuse me, what should I do if I want to make a new train.rec and train.idx myself? I can't find the code for #make_list(args) in face2rec2.py.

Do you plan to release the images rather than the rec file?

@HaoLiuHust no such plan.

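Could you switch to another cloud drive, or share it some other way? Baidu Netdisk throttles downloads unless you have a super membership.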

xxllp commented

Could you say what this new dataset contains?

What is the difference between this dataset and msceleb? cleaner?

  1. dataset: emore
  2. network backbone: MobileFaceNet(y1)
  3. loss function: arcface
  4. training pipeline: batch-size:320
  5. Highest LFW: 99.35%; Highest CFP_FP: 91.27%; Highest AgeDB30: 94.43%; the training acc stays around 0.20 to 0.21

I only have two 1070 Ti GPUs, so my batch size can't be larger than about 350. My training command is:

CUDA_VISIBLE_DEVICES='0,1' python -u train_softmax.py --network y1 --loss-type 4 --margin-s 64.0 --margin-m 0.5 --per-batch-size 160 --emb-size 128 --ckpt 2 --data-dir ../datasets/faces_emore --wd 0.00004 --fc7-wd-mult 10.0 --lr 0.01 --prefix ../mobile/model-mobilefacenet0001

The final argument list:

(batch_size=320, beta=1000.0, beta_freeze=0, beta_min=5.0, bn_mom=0.9, ckpt=2, ctx_num=2, cutoff=0, data_dir='../datasets/faces_emore', easy_margin=0, emb_size=128, end_epoch=100000, fc7_wd_mult=10.0, gamma=0.12, image_channel=3, image_h=112, image_w=112, loss_type=4, lr=0.01, lr_steps='', margin=4, margin_a=1.0, margin_b=0.0, margin_m=0.5, margin_s=64.0, max_steps=0, mom=0.9, network='y1', num_classes=85742, num_layers=1, per_batch_size=160, power=1.0, prefix='../fine_turn_0001/model-mobilefacenet0001-fine', pretrained='', rand_mirror=1, rescale_threshold=0, scale=0.9993, target='lfw,cfp_fp,agedb_30', use_deformable=0, verbose=2000, version_act='prelu', version_input=1, version_output='E', version_se=0, version_unit=3, wd=4e-05)
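
The batch_size=320 in the argument list is not passed on the command line; it appears to follow from --per-batch-size multiplied by the number of visible GPUs (ctx_num=2). A small sketch of that arithmetic, assuming this is how the training script derives it (the function name is hypothetical):

```python
def effective_batch_size(cuda_visible_devices, per_batch_size):
    """Total batch size = per-GPU batch size x number of visible GPUs."""
    device_ids = [d for d in cuda_visible_devices.split(",") if d.strip()]
    ctx_num = max(len(device_ids), 1)  # fall back to a single device
    return ctx_num * per_batch_size


# values from the command above: CUDA_VISIBLE_DEVICES='0,1', --per-batch-size 160
print(effective_batch_size("0,1", 160))
```

With two 1070 Ti cards, raising the total batch size therefore means raising --per-batch-size, which is bounded by per-GPU memory.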

Which argument can I change to improve the result?

@Wisgon you may try changing the learning rate. For example, set --lr 0.1 --lr-steps 55000,85000,100000,110000
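
To illustrate the suggestion, here is a minimal sketch of a step schedule, assuming the learning rate is multiplied by 0.1 each time training passes one of the listed step values (the function name and the 0.1 factor are assumptions for illustration, not taken from the training script):

```python
def lr_at_step(step, base_lr=0.1, lr_steps=(55000, 85000, 100000, 110000), factor=0.1):
    """Learning rate in effect at a given iteration under a step schedule."""
    drops = sum(1 for s in lr_steps if step >= s)  # number of drops already passed
    return base_lr * (factor ** drops)


for step in (0, 55000, 85000, 100000, 110000):
    print(step, lr_at_step(step))
```

Starting at 0.1 instead of 0.01 gives the optimizer a longer high-learning-rate phase before the first drop, which is the change being proposed.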

Thank you, I will try it later @ShiyangZhang

How can I get the image list? I'm using another face detection and align method that is different from mtcnn. Thanks!

  1. dataset: emore
  2. network backbone: r34 ( res_unit=3, output=E, emb_size=256, prelu)
  3. loss function: arcface(m=0.5)
  4. training pipeline: straightforward (lr drop at 100K, 125K, 150K), batch-size:1024
  5. Highest LFW: 99.817%; Highest CFP_FP: 97.371; Highest AgeDB30: 97.867

@nttstar
Could you share more detailed information about this new dataset? I could only find a little:
dataset: emore
It includes 85742 identities, more than the previous 85164.
It includes 5822653 images, more than the previous 3804846.
As you mentioned, this emore dataset is still largely based on ms1m. Identities increase by only 578, while images increase by more than 2 million.
Did you consider data balance when you collected, cleaned and tidied up the dataset? For example, by removing identities that have very few or very many images.
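
The counts quoted above can be checked with a few lines of arithmetic. This sketch only uses the four numbers from this comment; the average images per identity is a derived figure and says nothing about how images are actually distributed across identities:

```python
datasets = {
    "faces_ms1m": {"identities": 85164, "images": 3804846},
    "emore": {"identities": 85742, "images": 5822653},
}


def images_per_identity(stats):
    """Mean number of images per identity (not the per-identity distribution)."""
    return stats["images"] / stats["identities"]


extra_identities = datasets["emore"]["identities"] - datasets["faces_ms1m"]["identities"]
extra_images = datasets["emore"]["images"] - datasets["faces_ms1m"]["images"]

print("extra identities:", extra_identities)
print("extra images:", extra_images)
for name, stats in datasets.items():
    print(name, "mean images per identity:", round(images_per_identity(stats), 2))
```

The mean rising from roughly 45 to roughly 68 images per identity is what motivates the balance question: most of the growth is in images per person, not in new people.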

@yangfly Did you use alignment? and how much the alignment contributes to the final accuracy? Also, you used a large batch size - 1024, does this also contribute to the accuracy, when compared to smaller batch size like 50, or 100?

@xmuszq

  • No extra alignment is used. Accuracy is obtained via --target in train_softmax.py
  • Highest Accuracy is actually evaluated using one of the final models.
  • I found that batch_size=1024 was slightly more accurate than batch_size=512.

@nttstar
Could you kindly share your trained model on emore?
thanks a lot

I think BaiduYun is not a good place to share such a dataset. For some reason, Baidu blocks some IPs; for example, one of my IPs is on the Taiwan education network, so BaiduYun gives me a 404.

dataset: emore
network backbone: r100 ( res_unit=3, output=E, emb_size=512, prelu)
loss function: cosine(m=0.35)
training pipeline: straightforward , batch-size:128*8
Highest LFW: 99.8%+; Highest CFP_FP: 98%+; Highest AgeDB30: 98%+

dataset: eMS1M (emore)
network backbone: y2 (a simplified version of MobileFaceNet, which has 0.74M params and 94M M-Adds)
loss function: arcface (m=0.5, s=64)
input_size: 112*112
batch_size: 512
training pipeline:
Stage 1: pre-training with m=0.1. lr keep 0.1, end when acc plateaus.
Stage 2: normal-training with m=0.5. lr starts from 0.1 and is divided by 10 when the acc plateaus.
Highest LFW: 99.55%
Highest CFP_FP: 92.23%
Highest AgeDB30: 95.20%
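
The two-stage pipeline above trains with a small margin (m=0.1) before moving to m=0.5. A minimal sketch of why the smaller margin is an easier objective, assuming the standard additive angular margin form s·cos(θ+m) for the target-class logit; the additive cosine margin s·(cos θ − m), as in the cosine(m=0.35) report earlier, is included for comparison. Function names are hypothetical, and the special handling that real implementations apply when θ+m exceeds π is omitted:

```python
import math


def arcface_logit(cos_theta, m=0.5, s=64.0):
    """Target-class logit with an additive angular margin: s * cos(theta + m)."""
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))  # clamp for numerical safety
    return s * math.cos(theta + m)


def cosface_logit(cos_theta, m=0.35, s=64.0):
    """Target-class logit with an additive cosine margin: s * (cos(theta) - m)."""
    return s * (cos_theta - m)


cos_theta = 0.8  # toy similarity between an embedding and its class center
print("no margin   :", 64.0 * cos_theta)
print("arcface 0.1 :", arcface_logit(cos_theta, m=0.1))
print("arcface 0.5 :", arcface_logit(cos_theta, m=0.5))
print("cosface 0.35:", cosface_logit(cos_theta, m=0.35))
```

For the same embedding, the m=0.1 logit is penalized far less than the m=0.5 one, so stage 1 can be read as a warm-up before the full margin is applied.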

I have uploaded the datasets using OneDrive, hope it helps.

dataset: emore
network backbone: Resnet 152
loss function: arcface (m=0.5, s=64)
input_size: 112*112
batch_size: 512

call reset()
testing verification..
(12000, 512)
infer time 20.323823
[lfw][307045]XNorm: 22.521362
[lfw][307045]Accuracy-Flip: 0.99800+-0.00245
testing verification..
(14000, 512)
infer time 22.382543
[cfp_ff][307045]XNorm: 20.382610
[cfp_ff][307045]Accuracy-Flip: 0.99857+-0.00181
testing verification..
(14000, 512)
infer time 22.360029
[cfp_fp][307045]XNorm: 22.274495
[cfp_fp][307045]Accuracy-Flip: 0.98114+-0.00657
testing verification..
(12000, 512)
infer time 18.985229
[agedb_30][307045]XNorm: 22.957849
[agedb_30][307045]Accuracy-Flip: 0.98300+-0.00733
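
Log lines in this format can be turned into a "Highest" figure per benchmark with a small parser. This is a sketch under the assumption that "Highest" means the largest reported mean accuracy across checkpoints, with the value after `+-` kept as a separate spread and not added to it; the regular expression and function name are illustrative, not part of insightface:

```python
import re

# matches e.g. "[lfw][307045]Accuracy-Flip: 0.99800+-0.00245"
LINE_RE = re.compile(r"\[(\w+)\]\[(\d+)\]Accuracy-Flip: ([\d.]+)\+-([\d.]+)")


def highest_accuracy(log_text):
    """Return {target: (best_acc, spread, step)} over all checkpoints in the log."""
    best = {}
    for target, step, acc, spread in LINE_RE.findall(log_text):
        acc = float(acc)
        if target not in best or acc > best[target][0]:
            best[target] = (acc, float(spread), int(step))
    return best


sample_log = """
[lfw][307045]XNorm: 22.521362
[lfw][307045]Accuracy-Flip: 0.99800+-0.00245
[cfp_fp][307045]Accuracy-Flip: 0.98114+-0.00657
[agedb_30][307045]Accuracy-Flip: 0.98300+-0.00733
"""

for target, (acc, spread, step) in highest_accuracy(sample_log).items():
    print(f"{target}: {acc:.5f} (+-{spread:.5f}) at step {step}")
```

Feeding the whole training log instead of a single checkpoint's output would give the best checkpoint per benchmark.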

  1. dataset: emore
  2. network backbone: y1 ( emb_size=512, prelu )
  3. loss function: arcface(m=0.5)
  4. training pipeline: [40000, 54000, 60000] batch-size:1024
  5. Highest LFW:0.99383; Highest CFP_FP: 0.91871; Highest AgeDB30: 0.95783

How is the Highest LFW computed? Is the variance added to it?

Now we get higher accuracy using my modified mobilenet network:

[lfw][12000]Accuracy-Flip: 0.99617+-0.00358
[agedb_30][12000]Accuracy-Flip: 0.96017+-0.00893

@meanmee There is a cracked version of Baidu Netdisk that does not limit download speed.

dataset: eMS1M (emore)
network backbone: private CNN, which has 0.75M params (3/4 of y1) and 57M M-Adds (1/4 of y1)
loss function: arcface (m=0.5, s=64)
input_size: 96x80
batch_size: 512
training pipeline:

  • Stage 1: pre-training with m=0.1. lr kept at 0.1 for 50K iters
  • Stage 2: normal training with m=0.5. lr starts at 0.1 and is divided by 10 at [250K, 300K, 340K, 370K, 400K] iters

result:

  • Highest LFW: 99.42%
  • Highest CFP_FP: 91.39%
  • Highest AgeDB30: 95.10%

  1. dataset: EMore
  2. network backbone: r100
  3. loss function: arcface (m=0.5, scale=64)
  4. input size: 112x112
  5. batch size: 2024(32 GPUs)
  6. lr: 0.4
  7. lr steps: 37500, 52500, 60000
  8. max steps: 75000
  9. One megaface: 97.43
  10. One LFW: 99.8; One AGEDB_30: 98.13; One CFP_FP: 95.37, One CFP_FF:99.93

  1. dataset: emore
  2. network backbone: r50 ( res_unit=3, output=E, emb_size=512, prelu )
  3. loss function: arcface(m=0.5)
  4. training pipeline: straightforward (lr drop at 66666, 93333, 106666), batch-size:768
  5. Highest LFW: 99.817; Highest CFP_FP: 97.971; Highest AgeDB30: 98.067

  1. dataset: EMore
  2. network backbone: c116 ( res_unit=3, output=E, emb_size=256, prelu ), CRU_Net
  3. loss function: arcface(m=0.5)
  4. aug_color = 0.1 for data color augmentation
  5. input size: 112x112
  6. batch size: 448(4 GPUs)
  7. lr: 0.1
  8. lr steps: [114285, 228571, 457142, 571428, 800000]
  9. LFW: 99.850; CFP_FP: 98.000; CFP_FF: 99.757; CALFW: 95.983; CPLFW:93.150; AgeDB30: 98.067; vgg2_fp: 95.480, all @epoch 24

@bruinxiong Great job! But what is the network backbone c116? Is it a new network that is not in this project?

@bruinxiong what is vgg2_fp?

@bruinxiong Can I know what you mean by aug_color = 0.1? What kind of color augmentation do you do?

dataset: emore
network backbone: r50 ( emb_size=512, prelu )
loss function: arcface(m=0.5)
training pipeline: I forgot... batch-size:324
Highest LFW:0.99817; Highest CFP_FP: 0.96629; Highest AgeDB30: 0.97627

@nttstar the training accuracy is only 0.6, but according to your r50 log, your model converges to a comparable LFW result when the training accuracy reaches 0.82 on the Refined-MS1M dataset. Does this mean emore is much more difficult than Refined-MS1M?

I'm looking to train a mobilefacenet backbone (y1 or y2) and am finding that my inference times are on par with the pre-trained model, but the model size isn't significantly smaller.

Highest Perf:

  • y1: LFW: 0.99333, CFP_FP: 0.92357, AgeDB30: 0.94333
  • y2: LFW: 0.99467, CFP_FP: 0.92086, AgeDB30: 0.95817

Model size:

  • y1: 46MB
  • y2: 173MB
  • r34: 131MB
  • r50: 167MB

Inference times for 112x112 image running on AWS p3.2xlarge instance:

  • y1: 37ms
  • y2: 35ms
  • r34: 39ms
  • r50: 43ms

I was hoping to see model size of 4MB and inference of <20ms as per the paper: https://arxiv.org/abs/1804.07573

@YaqiLYU @Wisgon @blessxu For the mobilefacenet backbone, what sort of model size & inference times are you seeing?

@brightsparc after removing the softmax layer, you will get a 4M model

@HaoLiuHust can you elaborate on removing the softmax layer?

I was planning on using this model to produce an embedding from an aligned input face, and perform a dot product against a set of known faces to find the closest match, as opposed to training a new multi-class model on this dataset.
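
The matching step described here can be sketched in plain Python. This assumes embeddings are compared by cosine similarity (a dot product after L2 normalization); the gallery contents, vector size and function names are toy placeholders, and a real embedding would have 128 or 512 dimensions:

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def closest_match(query, gallery):
    """Return (name, similarity) of the gallery entry most similar to the query."""
    q = l2_normalize(query)
    best_name, best_sim = None, -2.0  # cosine similarity is always >= -1
    for name, emb in gallery.items():
        sim = sum(a * b for a, b in zip(q, l2_normalize(emb)))
        if sim > best_sim:
            best_name, best_sim = name, sim
    return best_name, best_sim


# toy 3-d "embeddings" standing in for model outputs
gallery = {"alice": [0.9, 0.1, 0.0], "bob": [0.0, 1.0, 0.2]}
name, sim = closest_match([0.8, 0.2, 0.1], gallery)
print(name, round(sim, 3))
```

In practice a similarity threshold is usually added so that an unknown face is rejected instead of being assigned to its nearest neighbour; the threshold value would have to be tuned on validation data.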

To get the 4M model, do I need to tweak the symbols in src/symbols/fmobilenetv2.py? Can you provide an example?

@brightsparc there are two ways to do this:

  1. remove params of the 'fc7' layer
    import mxnet as mx

    sym, arg_params, aux_params = mx.model.load_checkpoint("/home/liuhao/Projects/face_mxnet/mobilefacergb/ms_vgg", 8)
    mod = mx.module.Module(symbol=sym, context=mx.cpu(), label_names=None)
    mod.bind(data_shapes=[('data', (1, 3, 112, 112))], label_shapes=None, for_training=False)
    # keep every parameter except those of the 'fc7' classification layer
    new_args = {k: v for k, v in arg_params.items() if 'fc7' not in k}
    mod.set_params(arg_params=new_args, aux_params=aux_params, allow_missing=True)
    mod.save_checkpoint("/home/liuhao/Projects/face_mxnet/mobilefacergb/ms_vgg_deploy", 0)

  2. remove 'fc7' from the symbol
    import mxnet as mx

    sym, arg_params, aux_params = mx.model.load_checkpoint("/home/liuhao/Projects/face_mxnet/mobilefacergb/ms_vgg", 8)
    # cut the graph at the embedding output so 'fc7' is no longer part of the symbol
    all_layers = sym.get_internals()
    net = all_layers['fc1_output']
    mod = mx.module.Module(symbol=net, context=mx.cpu(), label_names=None)
    mod.bind(data_shapes=[('data', (1, 3, 112, 112))], label_shapes=None, for_training=False)
    # 'fc7' params are not in the truncated symbol, so drop them before loading
    new_args = {k: v for k, v in arg_params.items() if 'fc7' not in k}
    mod.set_params(arg_params=new_args, aux_params=aux_params)
    mod.save_checkpoint("/home/liuhao/Projects/face_mxnet/mobilefacergb/ms_vgg_deploy", 0)

@brightsparc Have you trained a mobilefacenet model that is 46M? Once training has finished, you will have an xxxx-0066.params file and an xxxx.json file; the number 0066 depends on how many times the model was saved during training. Then cd to the "deploy" dir, where you will find the file "model_slim.py", and run "$python model_slim.py --model ../your_model_dir/xxxx,66" (xxxx is the name of your model). You will then find a 4M model in your_model_dir.
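
A rough size estimate shows why dropping the classification layer matters so much. This sketch assumes the 'fc7' weight is a num_classes x emb_size matrix stored as float32, and uses num_classes=85742 and emb_size=128 from the y1 argument list posted earlier in the thread; the settings behind the 46MB model are not stated, so treat the match as plausible and unconfirmed:

```python
def fc7_size_bytes(num_classes, emb_size, bytes_per_param=4):
    """Approximate size of the classification weight matrix in bytes."""
    return num_classes * emb_size * bytes_per_param


size = fc7_size_bytes(num_classes=85742, emb_size=128)
print(f"fc7 weights: {size / 1e6:.1f} MB")
```

Under these assumptions the layer alone accounts for roughly 44 MB, which is consistent with a 46MB checkpoint shrinking to about 4MB once it is removed.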

Thanks @HaoLiuHust

Yes I have, @Wisgon. Thanks, I will try this.

dataset: EMore
network backbone: c116 ( res_unit=3, output=E, emb_size=256, prelu ), CRU_Net
loss function: arcface(m=0.5)
aug_color = 0.1 for data color augmentation
random_crop = 0.9
input size: 112x112
batch size: 80(8 GPUs)
lr: 0.1
lr steps: [80000, 160000, 320000, 400000, 560000]
LFW: 0.99867+-0.00180; CFP_FP: 0.98386+-0.00542; CFP_FF: 0.99729+-0.00216; CALFW: 0.95933+-0.01119; CPLFW:0.92833+-0.01624; AgeDB30: 0.97983+-0.00765; all @epoch 36

When training, do you all still pre-train with softmax before using the arcface loss?

@bruinxiong can you share your implementation for c116?

@bruinxiong how about your training speed?

dataset: emore (or faces_ms1m for the old one)
network backbone: r100 ( output=E, emb_size=512, prelu )
loss function: arcface(m=0.5)
lr_steps [105000, 125000, 150000], batch-size:256, 4gpu
on epoch 21:
one LFW: 99.817; CFP_FP: 98.17; AgeDB30: 98.06

We plan to release the modified CRUNet 116 architecture to support this community. If you are interested, please follow this link: https://github.com/bruinxiong/Modified-CRUNet-and-Residual-Attention-Network.mxnet. Thanks!

@bruinxiong May I ask what kind of GPU you are using to train this model?

dataset: EMore
network backbone: c116 ( res_unit=3, output=E, emb_size=256, prelu ), CRU_Net
loss function: arcface(m=0.5)
aug_color = 0.1 for data color augmentation
random_crop = 0.9
input size: 112x112
batch size: 80(8 GPUs)
lr: 0.1
lr steps: [80000, 160000, 320000, 400000, 560000]
LFW: 0.99867+-0.00180; CFP_FP: 0.98386+-0.00542; CFP_FF: 0.99729+-0.00216; CALFW: 0.95933+-0.01119; CPLFW:0.92833+-0.01624; AgeDB30: 0.97983+-0.00765; all @epoch 36

@xmuszq We use 8 Nvidia Titan X with Pascal GPU architecture and 12G DDR5.

@bruinxiong Hi, did you figure out the difference between ms1m and emore?

dataset: emore (or faces_ms1m for the old one)
network backbone: r100 ( output=E, emb_size=512, prelu )
loss function: arcface(m=0.5)
lr_steps [105000, 125000, 150000], batch-size:256, 4gpu
on epoch 21:
one LFW: 99.817; CFP_FP: 98.17; AgeDB30: 98.06

@nttstar @alvenchen can you tell me how you adjust lr_steps? I implemented it in TensorFlow but cannot get a good result; I think the problem may be lr_steps.

dataset: emore
network backbone: mobilefacenet + GNAP block
loss function: arcface(m=0.5)
training pipeline: finetune (lr drop at 100K, 140K, 160K), batch-size:512
on epoch 52: LFW: 99.60%; CFP_FP: 93.46%; AgeDB30: 95.45%

dataset: emore
network backbone: r100 ( output=E, emb_size=512, prelu )
loss function: arcface(m=0.5)
lr_steps [105000, 125000, 150000], end with 180001, batch-size:256, 4gpu
then retrain with lr = 0.01, lr_steps[200000, 300000, 400000]
one LFW: 99.82; CFP_FP: 98.50; AgeDB30: 98.25

dataset: emore
network backbone: r100 ( output=E, emb_size=512, prelu )
loss function: arcface(m=0.5)
lr_steps [105000, 125000, 150000], end with 180001, batch-size:256, 4gpu
then retrain with lr = 0.01, lr_steps[200000, 300000, 400000]
one LFW: 99.82; CFP_FP: 98.50; AgeDB30: 98.25

How about on MegaFace?

dataset: emore
network backbone: r100 ( output=E, emb_size=512, prelu )
loss function: arcface(m=0.5)
lr_steps [105000, 125000, 150000], end with 180001, batch-size:256, 4gpu
then retrain with lr = 0.01, lr_steps[200000, 300000, 400000]
one LFW: 99.82; CFP_FP: 98.50; AgeDB30: 98.25

@stupiding can you tell me how to debug lr_steps? Which indicators should I look at?

I have no Megaface account, so it is not tested for now

@sharonjunjun I don't understand what you mean by "debug lr_steps". I just adjust lr with SGD when the loss plateaus.

dataset: emore
network backbone: r100 ( res_unit=3, output=E, emb_size=512, prelu)
loss function: cosine(m=0.35)
training pipeline: straightforward , batch-size:128*8
Highest LFW: 99.8%+; Highest CFP_FP: 98%+; Highest AgeDB30: 98%+

@meanmee "training pipeline: straightforward": is your lr fixed at 0.1, or do you change lr at different global steps?

  1. dataset: EMore
  2. network backbone: r100
  3. loss function: arcface (m=0.5, scale=64)
  4. input size: 112x112
  5. batch size: 2024(32 GPUs)
  6. lr: 0.4
  7. lr steps: 37500, 52500, 60000
  8. max steps: 75000
  9. One megaface: 97.43
  10. One LFW: 99.8; One AGEDB_30: 98.13; One CFP_FP: 95.37, One CFP_FF:99.93

@yoookoo did your acc reach 1 during training? If so, did you clean EMore yourself before training?

dataset: EMore
network backbone: mobilefacenet res-4-8-16-8
Model size:30.1M
loss function: arcface (m=0.5, scale=64)
input size: 112x112
LFW: 99.850
AGEDB_30: 98.167
CFP_FP: 97.729
Megaface: 96.8392

twmht commented

@LicheeX

Can you share the architecture?

twmht commented

@LicheeX

What do you mean by res-4-8-16-8?

@nttstar @LicheeX can you report your training accuracy?

@LicheeX can you share the training log and model?

dataset: eMS1M (emore)
network backbone: y2 (a simplified version of MobileFaceNet, which has 0.74M params and 94M M-Adds)
loss function: arcface (m=0.5, s=64)
input_size: 112*112
batch_size: 512
training pipeline:
Stage 1: pre-training with m=0.1. lr keep 0.1, end when acc plateaus.
Stage 2: normal-training with m=0.5. lr starts from 0.1 and is divided by 10 when the acc plateaus.
Highest LFW: 99.55%
Highest CFP_FP: 92.23%
Highest AgeDB30: 95.20%

Can you tell me where the y2 network is? Was the y2 network designed by you? I can't find it in the author's source code. I'd really appreciate it.

https://github.com/deepinsight/insightface/blob/master/recognition/symbol/fmobilefacenet.py

dataset: emore
network backbone: r100 ( output=E, emb_size=512, prelu )
loss function: arcface(m=0.5)
lr_steps [105000, 125000, 150000], end with 180001, batch-size:256, 4gpu
then retrain with lr = 0.01, lr_steps[200000, 300000, 400000]
one LFW: 99.82; CFP_FP: 98.50; AgeDB30: 98.25

Before you retrain, what's your lr (lr=0.1)?

Does emore overlap with the test datasets? It seems to perform abnormally well on LFW and CFP FF, which suggests overlap with frontal celebrities.
It doesn't perform any differently from cleaned ms1m on private test sets.

clhne commented

dataset: EMore
network backbone: mobilefacenet res-4-8-16-8
Model size:30.1M
loss function: arcface (m=0.5, scale=64)
input size: 112x112
LFW: 99.850
AGEDB_30: 98.167
CFP_FP: 97.729
Megaface: 96.8392

Which GPU do you use, and how many GPUs?
Batch size?
Training time cost?

dataset: EMore
network backbone: c116 ( res_unit=3, output=E, emb_size=256, prelu ), CRU_Net
loss function: arcface(m=0.5)
aug_color = 0.1 for data color augmentation
random_crop = 0.9
input size: 112x112
batch size: 80(8 GPUs)
lr: 0.1
lr steps: [80000, 160000, 320000, 400000, 560000]
LFW: 0.99867+-0.00180; CFP_FP: 0.98386+-0.00542; CFP_FF: 0.99729+-0.00216; CALFW: 0.95933+-0.01119; CPLFW:0.92833+-0.01624; AgeDB30: 0.97983+-0.00765; all @epoch 36

Did you test on megaface?

dataset: EMore
network backbone: mobilefacenet res-4-8-16-8
Model size:30.1M
loss function: arcface (m=0.5, scale=64)
input size: 112x112
LFW: 99.850
AGEDB_30: 98.167
CFP_FP: 97.729
Megaface: 96.8392

Hi, what embedding setting do you use? GDC or GNAP or E?
Thank U!

dataset: emore (or faces_ms1m for the old one)
network backbone: r100 ( output=E, emb_size=512, prelu )
loss function: arcface(m=0.5)
lr_steps [105000, 125000, 150000], batch-size:256, 4gpu
on epoch 21:
one LFW: 99.817; CFP_FP: 98.17; AgeDB30: 98.06

Hi, is your lr 0.1? Thanks.