Ze-Yang/Context-Transformer

How to train model with 1 GPU?

duynn912 opened this issue · 10 comments

Dear @Ze-Yang,

Thank you for your great work!

I am attempting to re-train your model on a single GPU. With your default setup, training is too large to run on my one 12GB GPU, so I reduced the batch size and learning rate by a factor of 4 and increased the max iterations and step size by a factor of 4. Here are my results when I test my model with 1 shot:
[screenshot: 1-shot test results]
The results are a few percentage points lower than yours. Could you tell me whether this is expected? If not, how can I reproduce your results with only 1 GPU?
Hope to hear from you soon!

Sincerely,
Duynn

It seems that your base-class performance is far lower than mine, which is abnormal. You can try decreasing max_iter and stepsize proportionally and see whether the base-class performance reaches our result. There should exist a suitable set of hyperparameters that reproduces our results; you will need to explore it yourself. Note that the linear LR adjustment strategy does not guarantee identical results, it is only an approximation. Good luck.
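For reference, here is a minimal sketch of the linear scaling rule under discussion. The variable names and baseline numbers are illustrative placeholders, not the repo's actual config values:

```python
# Linear scaling sketch: divide batch size and LR by k, multiply the
# schedule by k, so training sees roughly the same total number of images.
# All baseline values are placeholders; substitute the repo's defaults.
k = 4  # e.g. shrinking the batch 4x to fit a single 12GB GPU

base_batch_size, base_lr = 32, 4e-3
base_max_iter, base_stepsize = 60000, (40000, 50000)

batch_size = base_batch_size // k               # 32 -> 8
lr = base_lr / k                                # 4e-3 -> 1e-3
max_iter = base_max_iter * k                    # 60000 -> 240000
stepsize = tuple(s * k for s in base_stepsize)  # scale the LR-decay steps too
```

As noted above, this linear adjustment is only an approximation: smaller batches change the gradient noise, so some tuning around these values may still be needed.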

Dear @Ze-Yang,

Thank you very much for your answers!

I tried testing the model you uploaded to OneDrive, e.g. "split1_pretrain.pth", with the command "python test.py -d VOC --split 1 --setting incre -p 1 --save-folder weights/VOC_split1_pretrain --load-file weights/split1_pretrain.pth". The results are as follows:
[screenshot: test results]
And the results in your paper:
[screenshot: results table from the paper]
Is this expected? If not, I hope you can check it.

Best regards!

Hi, I have checked the model I uploaded again. The result for split1 is 73.14%, as shown below, slightly higher than the result in the original paper. You can ignore the difference, as it is just due to randomness. Thanks.

AP for aeroplane = 0.7927
AP for bicycle = 0.7413
AP for boat = 0.7251
AP for bottle = 0.5741
AP for car = 0.8250
AP for cat = 0.8665
AP for chair = 0.5313
AP for diningtable = 0.7419
AP for dog = 0.8025
AP for horse = 0.7972
AP for person = 0.7929
AP for pottedplant = 0.5064
AP for sheep = 0.6713
AP for train = 0.8332
AP for tvmonitor = 0.7699
Mean AP = 0.7314
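For anyone cross-checking, the Mean AP is just the plain average of the 15 per-class APs listed above:

```python
# Average the 15 per-class APs reported above (split1 base classes).
aps = [0.7927, 0.7413, 0.7251, 0.5741, 0.8250, 0.8665, 0.5313, 0.7419,
       0.8025, 0.7972, 0.7929, 0.5064, 0.6713, 0.8332, 0.7699]
print(f"Mean AP = {sum(aps) / len(aps):.4f}")  # prints: Mean AP = 0.7314
```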

Hi @Ze-Yang,

Is this the result from the Python evaluation? And could you share your annotations? I use the official PASCAL VOC labels, but I cannot get this result from your model; I only get about 43% mAP.

It's unlikely that you would get a different result from mine. The annotations are exactly the official PASCAL VOC 2007 test set. Please check whether you have modified the code or the dataset annotations. Thanks.

Hi @Ze-Yang,

I don't know why, but I have re-downloaded your code and the PASCAL annotations to make sure there is no misunderstanding, and run your model again. However, the test result is still the same, as shown here:
[screenshot: unchanged test results]

Hi @Ze-Yang,

I am really sorry for this inconvenience!
Honestly, I don't know why I keep getting the lower results. I re-set up my environment and re-downloaded the code, but I cannot figure out what is going on.
If possible, could you send me your models (before and after the incremental stage) so that I can test them again on PASCAL VOC 2007 in my environment and pinpoint my problem?

Sincerely,

FYI, here's the model (link expires in 7 days) for 1-shot after the incremental stage. You may need to modify the keys in the state_dict, since I renamed some layers for clarity before the code release. The model before the incremental stage is exactly the one I released on OneDrive. Hope that it helps.
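If useful, here is a minimal sketch of remapping state_dict keys for a checkpoint whose layers were later renamed. The prefix mapping below is hypothetical (the actual old-to-new names depend on which layers changed), and `build_model()` stands in for however the released code constructs the network:

```python
import torch

# Hypothetical old->new prefix mapping; fill in the layer names that
# were actually renamed before the code release.
RENAME_MAP = {"old_head.": "new_head."}

state_dict = torch.load("1shot_after_incre.pth", map_location="cpu")

remapped = {}
for key, value in state_dict.items():
    for old, new in RENAME_MAP.items():
        if key.startswith(old):
            key = new + key[len(old):]
    remapped[key] = value

model = build_model()  # stand-in for the repo's model constructor
# strict=False reports, rather than crashes on, any keys still mismatched
missing, unexpected = model.load_state_dict(remapped, strict=False)
print("missing:", missing)
print("unexpected:", unexpected)
```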

Hi @Ze-Yang,

Thank you very much for your kind support!

I am happy that I have found my problem.
The problem was a conflict between CUDA, PyTorch and my card's architecture, so I changed sm_61 to sm_60 to match my card. Note that when I first installed my environment with pip and ran your code, there was no error or warning, which is dangerous. Once I switched my environment to conda, it warned me about the conflict, and that is how I found my problem!

Sincerely!

Great to hear that. Actually, I emphasize the GPU architecture check in the installation section, as shown below. I am not sure why pip did not surface the conflict, but it is better to follow the instructions in README.md carefully next time; it will always save you a lot of time in debugging. Thanks.
[screenshot: GPU architecture check note from the README installation section]
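As a quick sanity check (a generic PyTorch query, not from the README), you can print the compute capability PyTorch reports for your GPU and make sure the CUDA extensions are compiled with the matching sm_XX flag:

```python
import torch

# Print the compute capability of each visible GPU; compile with the
# matching sm_XX (e.g. sm_60 for a Tesla P100, sm_61 for a GTX 1080 Ti).
for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    print(f"GPU {i} ({torch.cuda.get_device_name(i)}): sm_{major}{minor}")
```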