Error when training vitdet
LinhanXu3928 opened this issue · 4 comments
I followed the tutorial at https://zhuanlan.zhihu.com/p/528733299 step by step. The only difference is that I am on a single machine with two GPUs, so my command is CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_port=29500 tools/train.py configs/detection/vitdet/vitdet_100e.py --work_dir easycv/vitdet --launcher pytorch --fp16. Training fails with the following error:
```
Traceback (most recent call last):
  File "/home/xlh2/EasyCV2/tools/train.py", line 277, in <module>
    main()
  File "/home/xlh2/EasyCV2/tools/train.py", line 266, in main
    train_model(
  File "/home/xlh2/EasyCV2/easycv/apis/train.py", line 269, in train_model
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/home/xlh2/.conda/envs/easycv/lib/python3.9/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/xlh2/EasyCV2/easycv/runner/ev_runner.py", line 105, in train
    self.run_iter(data_batch, train_mode=True)
  File "/home/xlh2/EasyCV2/easycv/runner/ev_runner.py", line 72, in run_iter
    outputs = self.model.train_step(data_batch, self.optimizer,
  File "/home/xlh2/.conda/envs/easycv/lib/python3.9/site-packages/mmcv/parallel/distributed.py", line 48, in train_step
    self._sync_params()
  File "/home/xlh2/.conda/envs/easycv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1185, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'MMDistributedDataParallel' object has no attribute '_sync_params'
```
This looks like an mmcv version issue: the MMDistributedDataParallel shipped with your mmcv version does not have the _sync_params attribute.
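To confirm the mismatch, here is a minimal diagnostic sketch (not part of EasyCV or mmcv; the comment about newer PyTorch releases dropping DDP's private _sync_params method reflects my understanding of the incompatibility). It prints the installed versions and checks whether torch's DistributedDataParallel still exposes that method:

```python
# Minimal diagnostic sketch (assumes torch and mmcv are installed in the
# training environment). mmcv's MMDistributedDataParallel.train_step calls
# self._sync_params(); if the installed torch no longer defines that private
# method on DistributedDataParallel, the lookup falls through to
# nn.Module.__getattr__ and raises the AttributeError seen in the traceback.
import torch
import mmcv
from torch.nn.parallel import DistributedDataParallel

print("torch:", torch.__version__)
print("mmcv :", mmcv.__version__)
print("DDP has _sync_params:", hasattr(DistributedDataParallel, "_sync_params"))
```

If the last line prints False, the installed mmcv is calling a method the installed torch no longer provides, which matches the error above.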
Try mmcv version 1.4.4.
grep for MMDistributedDataParallel under MMCV_INSTALLATION_DIR and check whether MMDistributedDataParallel has the attribute _sync_params.
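If grepping the installed source is inconvenient, a small sketch like the following (assuming mmcv is importable in the training environment) prints the train_step implementation that is actually installed, so you can see whether it still calls self._sync_params():

```python
# Sketch: inspect the installed mmcv source instead of grepping it by hand.
import inspect

from mmcv.parallel import MMDistributedDataParallel

# Print the train_step implementation actually installed; look for the
# self._sync_params() call around the line reported in the traceback.
print(inspect.getsource(MMDistributedDataParallel.train_step))
```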