hikopensource/DAVAR-Lab-OCR

LGPMA training is slow

Opened this issue · 3 comments

Hello, I used 8 V100 GPUs to train LGPMA on the PubTabNet dataset. However, training is very slow and will take almost 8 days. Any suggestions?

[Screenshot 2022-05-19 17:10:24]

CPU usage:
[Screenshot 2022-05-19 17:11:32]

Training the current model on PubTabNet indeed takes 8 days for 12 epochs (as shown in the log we provided), mainly because PubTabNet contains so many samples. If you want to train the model on other datasets, you can directly load the trained model and fine-tune it for a few epochs.
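For illustration, a fine-tuning setup might look like the sketch below. This is a minimal sketch assuming the MMDetection-style configs used in this repo; the checkpoint path, epoch count, and learning rate are placeholders, not values from the repo's actual files.

```python
# A minimal fine-tuning sketch, assuming MMDetection-style configs
# (exact key names may differ from the repo's lgpma configs).
_base_ = './lgpma_base.py'  # hypothetical base config path

# start from the already-trained PubTabNet weights instead of training from scratch
load_from = '/path/to/lgpma_pubtabnet_trained.pth'  # placeholder checkpoint path

# a short schedule and a reduced learning rate are usually enough for fine-tuning
runner = dict(type='EpochBasedRunner', max_epochs=3)
optimizer = dict(type='SGD', lr=1e-3, momentum=0.9, weight_decay=1e-4)
```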

If you want faster training on PubTabNet just to verify correctness, you can reduce the model size or memory cost by using a smaller backbone, fewer anchors, or a smaller img_scale. (Performance will be affected somewhat.)
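As an illustration, the overrides below show where those knobs typically live in an MMDetection-style config. This is a sketch only; the actual key layout in lgpma_pub.py may differ, and the values are illustrative rather than recommended settings.

```python
# Illustrative overrides for a smaller, faster model, assuming the
# MMDetection-style config layout (key names may differ in lgpma_pub.py).
_base_ = './lgpma_base.py'  # hypothetical base config path

model = dict(
    # smaller backbone: ResNet-18 instead of ResNet-50
    pretrained='torchvision://resnet18',
    backbone=dict(depth=18),
    # FPN input channels must match ResNet-18's stage outputs
    neck=dict(in_channels=[64, 128, 256, 512]),
    rpn_head=dict(
        anchor_generator=dict(
            # one aspect ratio instead of three cuts the anchor count by 3x
            scales=[8],
            ratios=[1.0])))

# in the data pipeline, a smaller img_scale reduces compute per image, e.g.:
# dict(type='Resize', img_scale=(768, 768), keep_ratio=True)
```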

Thank you for your reply! I will try a smaller backbone, fewer anchors, and a smaller img_scale.

Can you provide the ResNet-50 pretrained model ('path/to/resnet50-19c8e357.pth')? I have downloaded a ResNet-50 pretrained model from mmclassification, but some variable names do not match.
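The name mismatch is likely because mmclassification checkpoints nest the weights under a `state_dict` key and prefix parameter names with `backbone.`, while resnet50-19c8e357.pth uses torchvision's plain layout. Below is a hedged sketch of a remapping script; the filenames are placeholders, and you should inspect your checkpoint's keys first since the exact prefix may differ.

```python
# Hedged sketch: keep only the backbone weights from an mmclassification
# checkpoint and strip the 'backbone.' prefix so the parameter names match
# torchvision's layout (as in resnet50-19c8e357.pth).
import torch

ckpt = torch.load('resnet50_from_mmcls.pth', map_location='cpu')  # placeholder filename
state_dict = ckpt.get('state_dict', ckpt)  # mmcls wraps weights in 'state_dict'

prefix = 'backbone.'
renamed = {k[len(prefix):]: v
           for k, v in state_dict.items() if k.startswith(prefix)}

torch.save(renamed, 'resnet50_torchvision_style.pth')  # placeholder filename
```

Alternatively, resnet50-19c8e357.pth is the standard torchvision ResNet-50 checkpoint, so if the config accepts `pretrained='torchvision://resnet50'`, mmcv can download the correctly named weights automatically.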

Hi gulixin0922,
I am training the LGPMA model, but I keep getting this error: `simple_test() got an unexpected keyword argument 'gt_bboxes'`.
Could you please share the config files lgpma_pub.py and lgpma_base.py that you used to train on the PubTabNet dataset?
Thank you very much
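For what it's worth, in MMDetection-based code this error usually means ground-truth keys are still being collected in the val/test pipeline and then forwarded to `simple_test()`. Below is a hedged sketch of a test pipeline that collects only the image; the values shown are typical MMDetection defaults, not the repo's actual lgpma_pub.py settings.

```python
# Hedged sketch of a GT-free test pipeline (typical MMDetection defaults;
# the repo's actual config values may differ).
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),  # no gt_bboxes collected here
        ])
]
```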