多卡训练出现RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`, and by making sure all `forward` function outputs participate in calculating loss.如何解决
iamecong opened this issue · 1 comments
iamecong commented
Prerequisite
- I have searched the existing and past issues but cannot get the expected help.
- I have read the FAQ documentation but cannot get the expected help.
- The bug has not been fixed in the latest version.
🐞 Describe the bug
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument find_unused_parameters=True
to torch.nn.parallel.DistributedDataParallel
, and by
making sure all forward
function outputs participate in calculating loss.
Environment
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_May__3_19:15:13_PDT_2021
Cuda compilation tools, release 11.3, V11.3.109
Build cuda_11.3.r11.3/compiler.29920130_0
NVIDIA-SMI 525.89.02 Driver Version: 525.89.02 CUDA Version: 12.0
Additional information
多卡训练出现错误