Training from Scratch

SURABHI-GUPTA opened this issue 4 years ago · 2 comments

@pterhoer @jankolf Thanks for such a wonderful paper.

I have some queries related to the implementation part.

First, as I understand it, there is no specific training step for the SER-FIQ model. Basically, we need a pre-trained MTCNN and a pre-trained face recognition model, say Arcface+Resnet100 (which could also be trained from scratch on my own dataset), and just need to pass these as parameters to your serfiq_example.py file. Am I correct?

Or, in addition to the above two parameters, do we also need pre-trained SER-FIQ model weights? What does ./data/pre_fc1_bias.npy store? How can I train this on my own dataset?

Answer 1 · 2021-01-22T17:12:52.000Z

Hi @SURABHI-GUPTA, just my 2 cents. According to the paper, the on-top network has five layers (four weight layers between them):

The structure of SER-FIQ (on-top model) was optimized such that its produced embeddings achieve a similar EER on ColorFeret as that of the FaceNet embeddings. It consists of five layers with n_emb/128/512/n_emb/n_ids dimensions. The two intermediate layers have 128 and 512 dimensions. The last layer has the dimension equal to the number of training identities n_ids and is only needed during training. All layers contain dropout [36] with the recommended dropout probability p_d = 0.5 and a tanh activation. The training of the small custom network is done using the AdaDelta optimizer [44] with a batchsize of 1024 over 100 epochs. Since the size of the in- and output layers (blue and green) of the networks differs dependent on the used face embeddings, a learning rate of α_FN = 10^-1 was chosen for FaceNet and α_AF = 10^-4 for the higher dimensional ArcFace embeddings. As the loss function, we used a simple binary cross-entropy loss on the classification of the training identities.
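To make the quoted layer layout concrete, here is a rough NumPy-only sketch of the on-top model's forward pass with sizes n_emb/128/512/n_emb/n_ids. It is my own illustration, not the authors' code: the Glorot-style initialisation, applying dropout to each layer's input before the dense map, using inverted dropout, and leaving the final n_ids layer as raw logits (no tanh) are all assumptions, and the AdaDelta training loop is omitted entirely. The function names are hypothetical.

```python
import numpy as np


def init_on_top_model(n_emb, n_ids, rng):
    """Random weights for the n_emb/128/512/n_emb/n_ids stack.

    The Glorot-style uniform init is an assumption; the paper excerpt does
    not specify one. Returns a list of (W, b) pairs, W shaped (fan_in, fan_out).
    """
    sizes = [n_emb, 128, 512, n_emb, n_ids]
    params = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        limit = np.sqrt(6.0 / (fan_in + fan_out))
        W = rng.uniform(-limit, limit, size=(fan_in, fan_out))
        params.append((W, np.zeros(fan_out)))
    return params


def dropout(x, p, rng):
    """Inverted dropout: zero each unit with probability p, rescale the rest."""
    keep = rng.random(x.shape) >= p
    return x * keep / (1.0 - p)


def forward(params, x, rng, p_drop=0.5, training=True):
    """Return (embedding, logits) for a batch x of shape (batch, n_emb).

    Each layer applies dropout to its input (only when training=True), then
    a dense map; the hidden layers use tanh. The n_emb-dimensional tanh output
    is the embedding; the final n_ids layer gives raw logits that are only
    needed for the training loss (assumption: no tanh on that layer).
    """
    h = x
    for W, b in params[:-1]:
        if training:
            h = dropout(h, p_drop, rng)
        h = np.tanh(h @ W + b)
    embedding = h
    W_out, b_out = params[-1]
    head_in = dropout(embedding, p_drop, rng) if training else embedding
    logits = head_in @ W_out + b_out
    return embedding, logits


rng = np.random.default_rng(0)
params = init_on_top_model(n_emb=512, n_ids=1000, rng=rng)
x = rng.standard_normal((4, 512))  # stand-in for four 512-d face embeddings
emb, logits = forward(params, x, rng)
print(emb.shape, logits.shape)  # (4, 512) (4, 1000)
```

Calling `forward(..., training=True)` several times on the same input gives different embeddings because the dropout masks change; presumably this is the stochasticity SER-FIQ exploits once the on-top model is trained.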

If you decide to train on your custom dataset, please let me know. We can collaborate.

Answer 2 · 2021-01-23T16:59:58.000Z

Hi @SURABHI-GUPTA, @manisoftwartist,
as described by @manisoftwartist, you can train an embedding network on your own and then train an on-top model of SER-FIQ based on your training set.

But as you, @SURABHI-GUPTA, want to train the model on your own data, you have several possibilities.
First, you can use the models we have used from the (old) insightface repository specified here and train a new model from scratch with your training data. If you use the same model structure as in the insightface repository (with the same names for the layers) and your model files have the prefix "model", you can use our code directly.

To help you with the implementation, I would like to explain how our re-implementation of insightface/Arcface works.
We have written a wrapper class in face_image_quality.py called InsightFace.
In this class, we load the mxnet model files of a model named "model", which should be located in the "{insightface_path}/models/" folder.
Then we extract (reference) the last Dropout layer in the network. In Arcface models, this layer's output is called "dropout0_output". The embedding output is called "fc1_output". We combine both into a single output of the mxnet model.
When we now pass an image through the Arcface model, we get the output of the Dropout layer as well as the embedding itself (code).
Because we want to pass the Dropout embedding T times through the Dropout layer, we extracted the weights of the layer that connects the Dropout layer to the output layer from the mxnet model and saved them as numpy files. We then load these weights to reconstruct the last layers of the Arcface model in Tensorflow. One file contains the weights themselves; the other contains the bias values of the dense layer.
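The same idea can be sketched framework-free in NumPy: take the activation at the last Dropout layer for one image, apply a fresh dropout mask T times, and push each result through a dense layer rebuilt from the saved weight and bias arrays. This is only a sketch under assumptions: random arrays stand in for the `np.load`-ed files (only ./data/pre_fc1_bias.npy is named in this thread; the weight layout (in_dim, out_dim) is my guess), the dropout probability of 0.5 is assumed, any layers that follow the dense layer in the real model are not modelled, and `stochastic_embeddings` is a hypothetical helper, not a function from the SER-FIQ repository.

```python
import numpy as np


def stochastic_embeddings(dropout_input, W, b, T=100, p_drop=0.5, seed=None):
    """Pass one Dropout-layer activation T times through Dropout -> Dense.

    dropout_input: 1-D array, the activation at the last Dropout layer.
    W: dense weights, assumed shape (in_dim, out_dim).
    b: dense bias, shape (out_dim,).
    Returns an array of shape (T, out_dim), one embedding per dropout mask.
    """
    rng = np.random.default_rng(seed)
    x = np.broadcast_to(dropout_input, (T, dropout_input.shape[0]))
    # Fresh keep-mask for every pass; survivors are rescaled (inverted dropout).
    keep = rng.random(x.shape) >= p_drop
    return (x * keep / (1.0 - p_drop)) @ W + b


rng = np.random.default_rng(0)
in_dim, out_dim = 256, 64  # toy sizes, not the real ArcFace dimensions
# Random stand-ins; in practice W and b would be loaded with np.load from the
# exported .npy files (e.g. the bias from ./data/pre_fc1_bias.npy).
W = rng.standard_normal((in_dim, out_dim)) * 0.05
b = np.zeros(out_dim)
features = rng.standard_normal(in_dim)  # hypothetical Dropout-layer activation
X = stochastic_embeddings(features, W, b, T=100, seed=1)
print(X.shape)  # (100, 64)
```

Drawing all T masks at once keeps the loop in a single matrix product; with `p_drop=0.0` every row collapses to the deterministic embedding, which is a handy sanity check when wiring in real weights.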

If you change the structure of your model or if you are using another framework, you need to get the output of the Dropout layer of the model you are using as well as the weights of the last layer(s).
You can either create a new network (e.g. in Tensorflow, as we did) and load the weights into the re-created model, or modify your existing network. Which option works depends on what the framework you are using allows.
Then you pass the Dropout output T times through the Dropout-Dense Layer combination and save the results to calculate the SER-FIQ scores.
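For the last step, the SER-FIQ paper maps the T stochastic embeddings to a quality score of 2·σ(−d̄), where d̄ is the mean Euclidean distance over all pairs i < j, i.e. (2/(T²−T))·Σ d(x_i, x_j). A minimal NumPy sketch of that formula follows; L2-normalising the embeddings first is my assumption about how the distances are taken, and `serfiq_score` is a hypothetical name.

```python
import numpy as np


def serfiq_score(X):
    """Quality from T stochastic embeddings X with shape (T, d), T >= 2.

    Embeddings are L2-normalised, all pairwise Euclidean distances over
    i < j are averaged, and the mean distance d_mean is mapped to
    2 * sigmoid(-d_mean) = 2 / (1 + exp(d_mean)), so identical embeddings
    give a score of 1.0 and more spread gives a lower score.
    """
    X = np.asarray(X, dtype=float)
    T = X.shape[0]
    if T < 2:
        raise ValueError("need at least two stochastic embeddings")
    norm = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Pairwise distances via broadcasting: (T, 1, d) - (1, T, d) -> (T, T).
    dists = np.linalg.norm(norm[:, None, :] - norm[None, :, :], axis=-1)
    i, j = np.triu_indices(T, k=1)
    mean_dist = dists[i, j].mean()
    return float(2.0 / (1.0 + np.exp(mean_dist)))


same = np.tile(np.array([0.6, 0.8]), (10, 1))
print(serfiq_score(same))  # 1.0
```

Stable embeddings (small pairwise distances) yield scores close to 1, which the paper interprets as high image quality; feeding it the `(T, out_dim)` array from the stochastic passes gives one score per image.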

I hope this answers your questions. Please do not hesitate to ask if something is unclear.

Best,
Jan