vuvko/fitw2020

Wrong shapes of internal layers

Closed this issue · 12 comments

Hi,

I'm trying to reproduce your method, but I'm not able to start training the model.

After running the train.py module, the following error occurs:

mxnet.base.MXNetError: MXNetError: Error in operator pre_fc1: Shape inconsistent, Provided = [512,25088], inferred shape=(512,100352)

The failure occurs on this line:

outputs = [net(X) for X in data]

I believe this may be caused by the wrong weights or model architecture, but I'm not sure where you got the weights for the ArcFace model. I have tried downloading them from the official Model-Zoo GitHub page and from the insightface.ai website, but both raise this error.

I'm totally new to MXNet and ArcFace, and would be grateful for your help!

vuvko commented

Hello!

I used the model which you can get via insightface.model_zoo.get_model('arcface_r100_v1'). It should be the same as the one from the website.
I believe the problem is with the size of the input images. I used face detection and alignment as preprocessing and only passed on those images where faces were detected. Such images are 112x112.

P.S. Sorry that the repository is a mess, I will try to help with the reproduction as much as I can.
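
The two sizes in the error message are consistent with this explanation. The following is only a rough sketch of the arithmetic, assuming the backbone ends with 512 channels and downsamples the input by a factor of 16 before pre_fc1 (both values are assumptions, and `flattened_features` is a hypothetical helper, not part of the repository):

```python
# Assumptions (not taken from the repository): 512 output channels and a
# total downsampling factor of 16 before the pre_fc1 layer.
CHANNELS = 512
STRIDE = 16


def flattened_features(height, width, channels=CHANNELS, stride=STRIDE):
    """Number of features that reach the fully connected layer."""
    return channels * (height // stride) * (width // stride)


print(flattened_features(112, 112))  # 25088, the size the weights provide
print(flattened_features(224, 224))  # 100352, the size inferred in the error
```

Under these assumptions, a 112x112 input matches the provided weights, while a 224x224 input is one size that would produce the inferred shape from the error.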

Hi,

Oh, yeah, I noticed that you didn't use detection and alignment in the training loop, but I did not pay much attention to it. I used the prepare_images method from the verification.py module to perform the preprocessing, and training seems to be going well now.

I'll let you know whether I am able to reproduce the results.

Thank you a lot!

vuvko commented

Glad to be of help.
Closing this issue.

Hi,

I'm having a similar issue:

mxnet.base.MXNetError: Error in operator fc_classification: Shape inconsistent, Provided = [571,512], inferred shape=(570,571)

How can I fix this?
I would be grateful for your help.

vuvko commented

Be sure that your input tensor size is [batch_size, 3, 112, 112].
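
A quick guard before the forward pass can catch a wrong input size early. This is a minimal sketch; `check_input_shape` is a hypothetical helper, not something from the repository:

```python
EXPECTED_CHW = (3, 112, 112)  # channels, height, width expected by the model


def check_input_shape(shape):
    """Raise ValueError unless shape is (batch_size, 3, 112, 112)."""
    shape = tuple(shape)
    if len(shape) != 4 or shape[1:] != EXPECTED_CHW:
        raise ValueError(f"expected (batch_size, 3, 112, 112), got {shape}")


check_input_shape((48, 3, 112, 112))  # passes silently
```

In the training loop this could be called as `check_input_shape(X.shape)` right before `net(X)`.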

I printed X.shape.
It is 48 x 3 x 112 x 112.

vuvko commented

OK. If you load the already trained models (from this repository) instead of the pre-trained model from insightface, does the error still occur?

I used the trained models which I downloaded from the link you gave:

https://disk.yandex.ru/d/4AbxLjTa3fsG7g


vuvko commented

Oh, I get it now. You're trying to provide the full model to the training pipeline. It won't work because of additional layers.
The training pipeline is expecting the model to output a vector with 512 features.

You can try to remove the last layers from the trained model (the output of the last BatchNorm should be fine) and try again.
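
One possible reading of the shapes in the error message fits this explanation. The sketch below assumes that fc_classification is a fully connected layer whose weight is stored as (output units, input features), that the downloaded model was trained with 571 classes, and that the new dataset has 570 classes. The class counts are guesses based on the error message, and `dense_weight_shape` is a hypothetical helper:

```python
def dense_weight_shape(units, in_features):
    """Weight shape of a fully connected layer: (units, in_features)."""
    return (units, in_features)


EMBEDDING_SIZE = 512  # what the training pipeline expects from the backbone
SAVED_CLASSES = 571   # assumption: classes in the downloaded trained model
NEW_CLASSES = 570     # assumption: classes in the dataset being trained on

# Weight stored in the downloaded model: a classifier on top of the embedding.
provided = dense_weight_shape(SAVED_CLASSES, EMBEDDING_SIZE)

# Weight inferred when a new classifier is stacked on the full model, whose
# output is already a vector of SAVED_CLASSES scores instead of 512 features.
inferred = dense_weight_shape(NEW_CLASSES, SAVED_CLASSES)

print(provided, inferred)  # (571, 512) (570, 571)
```

If this reading is right, cutting the model back to its 512-feature output makes the new classification layer see 512 input features again.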

Yes, it is working now.
But I am confused: what is the last layer designed for?

vuvko commented

Family classification. You can see that the layer is added inside train.py.

Thank you very much. My English is not good. Thank you for your patience.