Model didn't converge.

Question

rAm1n opened this issue 7 years ago · 7 comments

Thanks for sharing your code. It seems clean and well written; however, I had trouble getting it to converge.

I trained it on a filtered version of MsCeleb with 5 million images and 79K identities. Your hyper-parameters seem to be identical to those of the TensorFlow implementation davidsandberg/facenet. I also tried different ones, but I never got more than 65% accuracy on LFW.

I think it's mostly because of the way triplet selection has been implemented. The paper suggests building batches of 1800 images from a certain number of identities (40-45), rather than choosing them completely at random. I tried this, but only with 180 images at most, and it still didn't converge.
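To make the batching idea concrete, here is a minimal sketch of identity-balanced batch construction. It is not code from this repository or from davidsandberg/facenet; the function name `identity_balanced_batches` and its parameters are hypothetical, and it assumes `labels` is a plain list with one identity label per image.

```python
import random
from collections import defaultdict


def identity_balanced_batches(labels, ids_per_batch, imgs_per_id, seed=0):
    """Yield lists of sample indices, each covering `ids_per_batch`
    identities with up to `imgs_per_id` images per identity.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)

    # Group sample indices by identity label.
    by_id = defaultdict(list)
    for idx, label in enumerate(labels):
        by_id[label].append(idx)

    # An identity needs at least two images to form an anchor-positive pair.
    usable = [label for label, idxs in by_id.items() if len(idxs) >= 2]
    rng.shuffle(usable)

    # Walk through the shuffled identities in chunks; a trailing chunk
    # smaller than `ids_per_batch` is dropped.
    for start in range(0, len(usable) - ids_per_batch + 1, ids_per_batch):
        batch = []
        for label in usable[start:start + ids_per_batch]:
            idxs = by_id[label]
            batch.extend(rng.sample(idxs, min(imgs_per_id, len(idxs))))
        yield batch
```

For example, `ids_per_batch=45` with `imgs_per_id=40` gives batches of up to 45 × 40 = 1800 images, while `imgs_per_id=4` gives the 180-image batches I could fit.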

Do you have any idea that can help me? If you had any success training the model, could you please share your weights too?

Thanks,

Answer 1 · 2018-02-02T11:20:01.000Z

Dear @rAm1n,
Note that the base CNN in this repository is ResNet18, whereas the TensorFlow version uses Inception-ResNet-V1.
As for the triplet selection issue, I would also like to learn how to train models with it. Maybe the link below will help you:
A PyTorch Implementation for Triplet Networks

Answer 2 · 2018-02-03T01:57:22.000Z

Hi @ahkarami

Thanks for pointing out the issue with the ResNet version. I am aware of it, but unfortunately I had no luck getting anything better than 65% on LFW. Regardless of the encoder network, something around 90%+ is definitely achievable with triplet loss.

I think the link you shared is an implementation of this paper, which is a bit different from FaceNet. I've stopped working on it for a short while, but I recommend this paper to you:

How to Train Triplet Networks with 100K Identities?

Also, if you are really interested in embeddings and in solving face verification in an open-set configuration, make sure to have a look at recent work based on angular loss: insightface, sphereface

Answer 3 · 2018-02-04T08:16:48.000Z

Dear @rAm1n,
Thank you very much for your complete answer.

Answer 4 · 2018-08-31T16:23:36.000Z

Hi @rAm1n,
Did you find a way to get better accuracy on LFW? I am also stuck at 67%.

Answer 5 · 2018-08-31T16:39:41.000Z

Hi @magwyz

I didn't really continue working on this. If you really want to make this work, maybe start with a softmax version and then fine-tune using triplet loss. Also, re-implementing the triplet selection from the TensorFlow repository might help. And don't forget to play with the learning rate too. I would guess it will take time to converge, and most probably the loss will drop rapidly after a few hours of training.
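As a rough illustration of what online triplet selection inside a batch can look like, here is a minimal sketch. It is not a copy of the TensorFlow repository's code: the function names, the `margin` default of 0.2, and the rule used (for each anchor-positive pair, pick one random negative that violates the margin) are my assumptions about one common strategy.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_triplets(embeddings, labels, margin=0.2, seed=0):
    """Return (anchor, positive, negative) index triplets from one batch.

    For every anchor-positive pair, the negative is drawn at random from
    the samples of other identities that violate the margin, i.e. where
    d(anchor, negative) - d(anchor, positive) < margin.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    n = len(embeddings)
    triplets = []
    for a in range(n):
        for p in range(n):
            if p == a or labels[p] != labels[a]:
                continue
            pos = sq_dist(embeddings[a], embeddings[p])
            # Negatives that still produce a non-zero triplet loss.
            candidates = [
                j for j in range(n)
                if labels[j] != labels[a]
                and sq_dist(embeddings[a], embeddings[j]) - pos < margin
            ]
            if candidates:
                triplets.append((a, p, rng.choice(candidates)))
    return triplets
```

Pairs with no margin-violating negative are skipped, so the number of triplets per batch shrinks as training progresses. This only works well if each batch contains several images per identity.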

Answer 6 · 2018-08-31T16:59:24.000Z

Thanks @rAm1n for the hints!

Answer 7 · 2018-10-29T08:38:22.000Z

Hi rAm1n,

https://github.com/tbmoon/facenet

I achieved 90% accuracy on the LFW dataset. If you are interested in my code, feel free to refer to it.