MarekKowalski/DeepAlignmentNetwork

Cannot reach the performance reported in the paper

hzh8311 opened this issue · 14 comments

I get the same results as the paper except on the challenging set and the failure rate. Did I miss something?

Hi,

Are you using the pretrained models? Did you look at the setting of normalization = 'corners' as opposed to normalization = 'centers'?
Also, what results are you getting exactly?
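
For reference, the two settings differ only in which inter-ocular distance is used to normalize the mean point-to-point error. A minimal sketch, assuming the standard 68-point 300-W landmark indexing (not the exact implementation in this repo):

```python
import numpy as np

def normalized_error(pred, gt, normalization='corners'):
    """Mean point-to-point error over 68 landmarks, divided by an
    inter-ocular distance computed from the ground truth.

    pred, gt: (68, 2) arrays in 300-W ordering, where points 36-41
    belong to the left eye and 42-47 to the right eye (an assumption).
    """
    if normalization == 'corners':
        # distance between the outer eye corners
        dist = np.linalg.norm(gt[36] - gt[45])
    elif normalization == 'centers':
        # distance between the eye centers
        dist = np.linalg.norm(gt[36:42].mean(axis=0) - gt[42:48].mean(axis=0))
    else:
        raise ValueError(normalization)
    return np.mean(np.linalg.norm(pred - gt, axis=1)) / dist
```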

Marek

Hi,

I re-trained DAN from scratch and got the results below; the eye-corner distance is used for normalization.

| | Common | Challenging | Full | AUC | Failure rate |
| --- | --- | --- | --- | --- | --- |
| Reimplementation | 3.15 | 5.58 | 3.62 | 55.14 | 1.74 |
| Paper | 3.19 | 5.24 | 3.59 | 55.33 | 1.16 |
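
For context, the failure rate and AUC come from the per-image normalized errors: the failure rate is the fraction of images whose error exceeds a threshold (0.08 in the paper), and the AUC is the area under the cumulative error distribution up to that threshold. A rough sketch of that protocol, not the repo's evaluation code:

```python
import numpy as np

def auc_and_failure_rate(errors, threshold=0.08, step=1e-4):
    """AUC of the cumulative error distribution (CED) up to `threshold`
    and the failure rate, both reported as percentages."""
    errors = np.asarray(errors)
    xs = np.arange(0.0, threshold + step, step)
    ced = np.array([np.mean(errors <= x) for x in xs])  # fraction of images at or below each error level
    auc = 100.0 * np.trapz(ced, xs) / threshold         # area normalized so a perfect model scores 100
    failure = 100.0 * np.mean(errors > threshold)
    return auc, failure
```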

Hi,

The performance on the challenging subset and the failure rate do indeed seem to be worse.

When you trained the first stage, did you train until the validation error stopped decreasing, or did you stop early?
Also, in the original implementation (what we had before we cleaned it up for the GitHub publication) the validation set was chosen at random, as opposed to just taking the first 100 images, which is what happens in the GitHub version.
This could make a difference if it turned out that the first 100 images are "easier" than the average sample.
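
A random split along those lines could look like this (a hypothetical sketch with an assumed dataset size, not the original code):

```python
import numpy as np

n_samples = 3148  # assumed size of the 300-W training set
n_val = 100       # same validation set size as the GitHub version

rng = np.random.RandomState(0)    # fixed seed keeps the split reproducible
perm = rng.permutation(n_samples)
val_idx, train_idx = perm[:n_val], perm[n_val:]
```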

Did you try training it several times to see if those results are stable?

Marek

Hi,

I think that is the reason. The first 100 images are all from Common Set.

Would it be convenient for you to share the random validation split code?

Hi,

I don't have that code anymore. After thinking about it for a while I find it a bit unlikely that the validation set caused such a large difference (6%) on the challenging subset. I think the validation set may cause a difference of 1-2% but not 6%.

Did you try stopping early in the training of the first stage? I vaguely remember trying something like that to reduce overfitting.

Marek

Hi,
Did you mean I should stop early in the first stage? I stopped at approximately the 70th epoch, after the validation error had been stable for 10-20 epochs.
I reconsidered the validation problem and agree with you. Since I saved a model after every epoch, the validation split alone should not produce such a gap for the best epochs.
Any other ideas about this problem?

Hi,

Yes, as I said, I remember trying to stop early on the first stage, since I felt that the gains on the validation set were very small compared to how much the model was overfitting on the training data.
See if that helps.
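
In case it is useful, early stopping here just means halting once the validation error stops improving for a while and keeping the best checkpoint. A generic sketch; the callables and the patience value are illustrative, not the repo's API:

```python
def train_with_early_stopping(train_one_epoch, validation_error,
                              max_epochs=200, patience=15):
    """Train until the validation error has not improved for `patience`
    consecutive epochs; return the best epoch and its error.

    `train_one_epoch()` and `validation_error()` are caller-supplied
    callables standing in for the actual training loop.
    """
    best_err, best_epoch = float('inf'), -1
    for epoch in range(max_epochs):
        train_one_epoch()
        err = validation_error()
        if err < best_err:
            best_err, best_epoch = err, epoch  # a checkpoint would be saved here
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs
    return best_epoch, best_err
```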

Also, just to make sure: you first train the first stage only and then the second stage, right?

Marek

Hi,
Yes, as described in the paper, and the first stage gets exactly the same result on the private set.
I will try to stop early on the first stage.

@hzh8311 Hi, how many epochs did you train stage 2 for?

Hi Marek,
Can you give me some suggestions for improving the performance?
I have trained stage 1 from scratch. The training data and augmentation are the same as yours. I tested my model and compared it with your results.

Stage 1 settings:
normalization = 'corners'
failureThreshold = 0.08

| Average error | Common set | Challenging set | 300-W |
| --- | --- | --- | --- |
| Author (Marek) | 0.0357 | 0.0597 | 0.0501 |
| Mine | 0.0387 | 0.0694 | 0.0547 |

Thanks, I look forward to your reply.

Hi,

This is quite a large difference.
Which pretrained model are you using for comparison? It should be the one that does not have Menpo in the name, unless you are using the Menpo dataset for training.
Also, are you comparing the 1-stage model to the first stage of a pretrained model, or to a 2-stage pretrained model?

Marek

1. Yes, the model DAN.npz, not Menpo-DAN.npz; my training data does not contain the Menpo dataset.
2. Just 1 stage, not 2 stages, since my model contains only stage 1. I will continue to train stage 2 after improving the performance of stage 1.
3. None of the values in the results I provided are shown as percentages of the normalization metric (e.g. 0.0387 corresponds to 3.87%).

Hi,

It's really hard for me to say why you are getting those worse results. As you can see earlier in this thread, someone else experienced a difference as well, but not one that large.

Let me know if I can help you further.

Marek

Hi Marek,
Thanks for your reply.
I have trained stage 1 again after fixing some errors in my Caffe-based code. My new results are below:
Stage 1 settings:
normalization = 'corners'
failureThreshold = 0.08

| Average error | Common set | Challenging set | 300-W |
| --- | --- | --- | --- |
| Author (Marek) | 0.0357 | 0.0597 | 0.0501 |
| Mine | 0.0362 | 0.0626 | 0.0513 |

1. During my training, I did not use early stopping. Is it a key factor in the result?