hikopensource/DAVAR-Lab-OCR

LGPMA training is slow

Opened this issue · 3 comments

Hello, I used 8 V100 GPUs to train LGPMA on the PubTabNet dataset. However, training is very slow and will take almost 8 days. Any suggestions?

[Screenshot 2022-05-19 17:10:24]

CPU usage:
[Screenshot 2022-05-19 17:11:32]

Training the current model on PubTabNet indeed takes 8 days for 12 epochs (as shown in the log we provided), mainly because PubTabNet contains so many samples. If you want to train the model on other datasets, you can directly load the trained model and fine-tune it for a few epochs.
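For illustration, a fine-tuning setup might look like the sketch below. This is a minimal sketch assuming the MMDetection-style configs used in this repo; the checkpoint path, epoch count, and learning rate are placeholders, not values from the repo's actual files.

```python
# A minimal fine-tuning sketch, assuming MMDetection-style configs
# (exact key names may differ from the repo's lgpma configs).
_base_ = './lgpma_base.py'  # hypothetical base config path

# start from the already-trained PubTabNet weights instead of training from scratch
load_from = '/path/to/lgpma_pubtabnet_trained.pth'  # placeholder checkpoint path

# a short schedule and a reduced learning rate are usually enough for fine-tuning
runner = dict(type='EpochBasedRunner', max_epochs=3)
optimizer = dict(type='SGD', lr=1e-3, momentum=0.9, weight_decay=1e-4)
```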

If you want faster training on PubTabNet just to verify correctness, you can reduce the model size or memory cost by using a smaller backbone, fewer anchors, or a smaller img_scale. (Performance will be affected somewhat.)
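As an illustration, the overrides below show where those knobs typically live in an MMDetection-style config. This is a sketch only; the actual key layout in lgpma_pub.py may differ, and the values are illustrative rather than recommended settings.

```python
# Illustrative overrides for a smaller, faster model, assuming the
# MMDetection-style config layout (key names may differ in lgpma_pub.py).
_base_ = './lgpma_base.py'  # hypothetical base config path

model = dict(
    # smaller backbone: ResNet-18 instead of ResNet-50
    pretrained='torchvision://resnet18',
    backbone=dict(depth=18),
    # FPN input channels must match ResNet-18's stage outputs
    neck=dict(in_channels=[64, 128, 256, 512]),
    rpn_head=dict(
        anchor_generator=dict(
            # one aspect ratio instead of three cuts the anchor count by 3x
            scales=[8],
            ratios=[1.0])))

# in the data pipeline, a smaller img_scale reduces compute per image, e.g.:
# dict(type='Resize', img_scale=(768, 768), keep_ratio=True)
```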

Thank you for your reply! I will try a smaller backbone, fewer anchors, and a smaller img_scale.

Can you provide the ResNet-50 pretrained model ('path/to/resnet50-19c8e357.pth')? I have downloaded a ResNet-50 pretrained model from mmclassification, but some variable names do not match.
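The name mismatch is likely because mmclassification checkpoints nest the weights under a `state_dict` key and prefix parameter names with `backbone.`, while resnet50-19c8e357.pth uses torchvision's plain layout. Below is a hedged sketch of a remapping script; the filenames are placeholders, and you should inspect your checkpoint's keys first since the exact prefix may differ.

```python
# Hedged sketch: keep only the backbone weights from an mmclassification
# checkpoint and strip the 'backbone.' prefix so the parameter names match
# torchvision's layout (as in resnet50-19c8e357.pth).
import torch

ckpt = torch.load('resnet50_from_mmcls.pth', map_location='cpu')  # placeholder filename
state_dict = ckpt.get('state_dict', ckpt)  # mmcls wraps weights in 'state_dict'

prefix = 'backbone.'
renamed = {k[len(prefix):]: v
           for k, v in state_dict.items() if k.startswith(prefix)}

torch.save(renamed, 'resnet50_torchvision_style.pth')  # placeholder filename
```

Alternatively, resnet50-19c8e357.pth is the standard torchvision ResNet-50 checkpoint, so if the config accepts `pretrained='torchvision://resnet50'`, mmcv can download the correctly named weights automatically.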

Hi gulixin0922,
I am training the LGPMA model, but I keep getting this error: `simple_test() got an unexpected keyword argument 'gt_bboxes'`.
Could you please share the config files lgpma_pub.py and lgpma_base.py that you used to train on the PubTabNet dataset?
Thank you very much
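For what it's worth, in MMDetection-based code this error usually means ground-truth keys are still being collected in the val/test pipeline and then forwarded to `simple_test()`. Below is a hedged sketch of a test pipeline that collects only the image; the values shown are typical MMDetection defaults, not the repo's actual lgpma_pub.py settings.

```python
# Hedged sketch of a GT-free test pipeline (typical MMDetection defaults;
# the repo's actual config values may differ).
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),  # no gt_bboxes collected here
        ])
]
```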