hikopensource/DAVAR-Lab-OCR

How to set parameters for single-GPU training

tianyongliu opened this issue · 1 comments

I only have one GPU available at the moment. How should I change the training parameters?
Could you please provide a demo? Thanks!

Also, regarding the data used to train the OCR model, this config:
/root/DAVAR-Lab-OCR-main/demo/table_recognition/lgpma/configs/ocr_models/rcg_res32_bilstm_attn_pubtabnet_sensitive.py

ann_file='path/to/PubTabNet/recog/datalist_recog_val.json',
img_prefix='path/to/PubTabNet/recog/',

ann_file='path/to/PubTabNet/recog/datalist_recog_val.json',
img_prefix='path/to/PubTabNet/recog/',

Where can the training data referenced here be downloaded, or how can it be prepared?

Thanks!

However, after training to completion using only the sample data, I get an error.
Here is part of the lgpma_pub.py configuration:

    _base_ = "./lgpma_base.py"

    data = dict(
        samples_per_gpu=3,
        workers_per_gpu=1,
        train=dict(
            ann_file='/root/davar_datalist_example/davar_datalist_demo.json',
            img_prefix='/root/davar_datalist_example/images'),
        # According to the evaluation metric, select the appropriate validation dataset format.
        val=dict(
            ann_file='/root/davar_datalist_example/davar_datalist_demo.json',
            img_prefix='/root/davar_datalist_example/images'),
        test=dict(
            samples_per_gpu=1,
            ann_file='/root/davar_datalist_example/davar_datalist_demo.json',
            img_prefix='/root/davar_datalist_example/images')
    )
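
Since train, val, and test all point at the same sample datalist, one quick sanity check before a run is to confirm that `ann_file` parses as JSON and that the images it lists exist under `img_prefix`. The following is only a minimal sketch: `check_datalist` is a hypothetical helper (not part of davarocr), and it assumes the datalist is a JSON object whose top-level keys are image paths relative to `img_prefix`. It is not claimed to explain the crash reported here.

```python
import json
import os


def check_datalist(ann_file, img_prefix, max_report=5):
    """Return (entry count, up to max_report image paths missing on disk)."""
    with open(ann_file, "r", encoding="utf-8") as f:
        datalist = json.load(f)
    # Assumption: top-level keys are image paths relative to img_prefix.
    missing = [
        rel_path
        for rel_path in datalist
        if not os.path.isfile(os.path.join(img_prefix, rel_path))
    ]
    return len(datalist), missing[:max_report]


if __name__ == "__main__":
    # Paths taken from the config above.
    ann_file = "/root/davar_datalist_example/davar_datalist_demo.json"
    img_prefix = "/root/davar_datalist_example/images"
    if os.path.isfile(ann_file):
        total, missing = check_datalist(ann_file, img_prefix)
        print(f"{total} entries; first missing images: {missing}")
    else:
        print(f"annotation file not found: {ann_file}")
```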

This is the printed output:

2023-03-03 11:15:32,466 - davarocr - INFO - Saving checkpoint at 1 epochs
2023-03-03 11:15:41,018 - davarocr - INFO - Saving checkpoint at 2 epochs
2023-03-03 11:15:49,519 - davarocr - INFO - Saving checkpoint at 3 epochs
2023-03-03 11:15:59,133 - davarocr - INFO - Saving checkpoint at 4 epochs
2023-03-03 11:16:08,227 - davarocr - INFO - Saving checkpoint at 5 epochs
2023-03-03 11:16:17,693 - davarocr - INFO - Saving checkpoint at 6 epochs
2023-03-03 11:16:26,505 - davarocr - INFO - Saving checkpoint at 7 epochs
2023-03-03 11:16:34,652 - davarocr - INFO - Saving checkpoint at 8 epochs
2023-03-03 11:16:43,591 - davarocr - INFO - Saving checkpoint at 9 epochs
2023-03-03 11:16:52,845 - davarocr - INFO - Saving checkpoint at 10 epochs
2023-03-03 11:17:00,441 - davarocr - INFO - Saving checkpoint at 11 epochs
2023-03-03 11:17:09,522 - davarocr - INFO - Saving checkpoint at 12 epochs
Traceback (most recent call last):
  File "/root/miniconda/envs/d2/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/root/miniconda/envs/d2/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/miniconda/envs/d2/lib/python3.7/site-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/root/miniconda/envs/d2/lib/python3.7/site-packages/torch/distributed/launch.py", line 256, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/root/miniconda/envs/d2/bin/python', '-u', '/root/workspace/DAVAR-Lab-OCR-main/tools/train.py', '--local_rank=0', './configs/lgpma_pub.py', '--no-validate', '--launcher', 'pytorch']' died with <Signals.SIGSEGV: 11>.
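
The traceback above comes from the launcher and only reports that the child process died with SIGSEGV. To see where the child itself crashes, one option is to rerun the same command as a single process with Python's built-in `faulthandler` enabled (`-X faulthandler` is a standard CPython option). The sketch below rebuilds the command for that purpose. `to_single_process` is a hypothetical helper, and it assumes `tools/train.py` can run without `--launcher`, as MMDetection-style training scripts usually can; that has not been verified against this repository.

```python
def to_single_process(cmd):
    """Strip distributed-launch arguments and enable faulthandler."""
    interpreter, rest = cmd[0], cmd[1:]
    kept = []
    skip_next = False
    for arg in rest:
        if skip_next:
            # Value belonging to a dropped "--flag value" pair.
            skip_next = False
            continue
        if arg in ("--launcher", "--local_rank"):
            skip_next = True
            continue
        if arg.startswith(("--local_rank=", "--launcher=")):
            continue
        kept.append(arg)
    # "-X faulthandler" dumps the Python stack when a fatal signal arrives.
    return [interpreter, "-X", "faulthandler"] + kept


if __name__ == "__main__":
    # Command copied from the CalledProcessError message above.
    failed = [
        "/root/miniconda/envs/d2/bin/python", "-u",
        "/root/workspace/DAVAR-Lab-OCR-main/tools/train.py",
        "--local_rank=0", "./configs/lgpma_pub.py",
        "--no-validate", "--launcher", "pytorch",
    ]
    print(" ".join(to_single_process(failed)))
```

Running the printed command from the same working directory should produce a Python-level stack dump if the segmentation fault recurs, which narrows down whether it happens during training or at process shutdown.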