Experimental results
lshssel opened this issue · 12 comments
Hi,
Because our group only assigned me a single 2080Ti, training took a long time: MOWODB's task 1 took 43 hours.
Unfortunately, wandb crashed at the 35th epoch, so its curves also stop there.
However, the program kept running without errors, the file "checkpoint0040.pth" was generated at the end, and task 2 trains smoothly when I start from this file.
Below are the wandb graphs and hyperparameters. The results are not very good, so I may need to tune the parameters to get as close to the original performance as possible.
K_AP50 is 52.476, U_R50 is 21.042
Hi @lshssel,
Hmmm batch_size 1 will be more difficult to fine-tune, but let's try.
Given your experiments, I would try:
lr=2e-5, lr_drop=60, epochs=70
The main idea is that you want the improvement to saturate and then reduce the learning rate. Continuing to train after the improvement has saturated doesn't help at all (U_R just goes down and AP50 doesn't go up), but if you set lr_drop too early (before AP50 starts to saturate), then K_AP50 is 'frozen' too soon and doesn't improve enough.
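Concretely, something like the sketch below. I'm assuming the Deformable-DETR-style flag names that main_open_world.py takes, so double-check them against your M_OWOD_BENCHMARK.sh; lr_backbone stays at your current 4e-6.
```bash
# Sketch only: flag names are assumed from the Deformable-DETR-style
# arguments used by main_open_world.py; verify them in M_OWOD_BENCHMARK.sh.
python main_open_world.py \
    --batch_size 1 \
    --lr 2e-5 \
    --lr_backbone 4e-6 \
    --lr_drop 60 \
    --epochs 70
```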
Best,
Orr
Thanks for the reply, I'll try later
Hi @lshssel,
Were your results sufficiently improved?
If so, can you give me all the details regarding your system/hyperparameters so I can add them to the README for future users?
Best,
Orr
One 2080Ti for all experiments, batch_size=1.
Regarding epochs, the first value is the one in "main_open_world.py" and the second is the one in "M_OWOD_BENCHMARK.sh".
t1.2: lr=4e-5, lr_backbone=4e-6, epochs=51/41, lr_drop=35, K_AP50=58.36, U_R50=16.50
t1.3: lr=2e-5, lr_backbone=4e-6, epochs=51/41, lr_drop=35, K_AP50=57.99, U_R50=19.27
t1.4: lr=2e-5, lr_backbone=4e-6, epochs=56/46, lr_drop=40, K_AP50=57.60, U_R50=18.55
t1.6: lr=2e-5, lr_backbone=4e-6, epochs=61/41, lr_drop=40, K_AP50=57.17, U_R50=19.34
Looking forward to your suggestions!
Hi @lshssel,
I would like to try something new with you. My hypothesis is that with a different batch size, the objectness temperature also needs to change.
Good news: no need for training. I would take the t1.2 and t1.3 checkpoints and re-evaluate with different --obj_temp values, sweeping a few of them (e.g., 0.9, 1.1, 1.2); the default is 1. It should be relatively quick, as you only need to evaluate (use the --eval flag).
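For example, a quick sweep along these lines. Treat it as a sketch: --obj_temp and --eval are the flags I mean, while the --resume usage and the checkpoint path are placeholders to adapt to however you normally evaluate.
```bash
# Sketch only: sweep the objectness temperature at evaluation time.
# --obj_temp and --eval are the relevant flags; the checkpoint path and
# --resume are placeholders; point them at your t1.2 / t1.3 checkpoints.
for T in 0.9 1.1 1.2; do
    python main_open_world.py \
        --eval \
        --resume /path/to/checkpoint0040.pth \
        --obj_temp "$T"
done
```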
Best,
Orr
I evaluated with the t1.3 checkpoint0040 (t1.2 has been removed). During training, obj_temp=1.3 and obj_loss_coef=8e-4.
I also tried different values of obj_loss_coef, but nothing changed:
obj_temp=1.1: K_AP50=57.1914, U_R50=19.2453
obj_temp=1.2: K_AP50=57.6161, U_R50=19.2581
obj_temp=1.3: K_AP50=57.9826, U_R50=19.2624 (obj_loss_coef=8e-4)
obj_temp=1.4: K_AP50=57.9075, U_R50=19.2367
obj_temp=1.5: K_AP50=57.8653, U_R50=19.2453
obj_loss_coef=4e-4: K_AP50=57.9826, U_R50=19.2624
obj_loss_coef=8e-4: K_AP50=57.9826, U_R50=19.2624
obj_loss_coef=1.6e-3: K_AP50=57.9826, U_R50=19.2624
obj_loss_coef=4e-3: K_AP50=57.9826, U_R50=19.2624
So t1.3 is probably the best result that a 2080Ti can achieve.
Hi @lshssel,
I want to ensure you understand you don't need to train with a different obj_temp -- you can change this just for evaluation. Unfortunately, it does seem that this is the best result with batch_size=1. Perhaps we could improve it a little more, but probably not much.
I want to add this to the readme. Would you mind providing all the hyperparameters you changed?
Hi,
Yes, I understand what you mean; I used the different obj_temp values only for evaluation.
As mentioned earlier, changing the value of the obj_temp did not improve performance.
With batch_size=2 I get a CUDA out-of-memory error, so it can only be 1 on a 2080Ti (11 GB).
My hyperparameters are set to:
lr=2e-5, lr_backbone=4e-6, batch_size=1; nothing else was changed.
Thank you again for your excellent work and for answering my questions.
Hello, I also completed the entire experiment on a 2080Ti. I ran it with lr=2e-5, lr_backbone=4e-6, batch_size=1, obj_temp=1.3. My results are shown in the following figure. I don't know why some of my results are actually higher than those reported in the paper. By the way, it took me about 8 days to complete the entire experiment.
Hi @WangPingA,
When you train a model with a different batch size, your results will vary, because the gradient updates are not the same. Variations of +-2 seem reasonable.
lshssel also ran experiments with a 2080Ti, and got:
If you are interested in applications, then perhaps my recent work, FOMO, will interest you; it is much less compute-heavy to train and has relatively strong open-world performance by leveraging a foundation object detection model. An easy upgrade there is to switch owl-vit to owlv2.
Best,
Orr