ModelTC/MQBench

DP and DDP training fails with "symbolically traced variables cannot be used as inputs to control flow"

shinianzhihou opened this issue · 2 comments

Hello, when I train on a single GPU with MQBench, everything works (thanks for the very clear examples).

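However, when I train with DP or DDP, I get the error below. After reading the torch source, I found that DP's forward does some processing on the inputs and kwargs (inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids)). How does this pass your test cases? Is there a known solution, or can you tell me what causes it? Any suggestions for discussion would be much appreciated. Thanks a lot!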

  File "train.py", line 734, in <module>
    train(hyp, opt, device, tb_writer)
  File "train.py", line 364, in train
    model = prepare_by_platform(model, choose_backend(opt))
  File "/root/anaconda3/lib/python3.7/site-packages/MQBench-0.0.6-py3.7.egg/mqbench/prepare_by_platform.py", line 389, in prepare_by_platform
    graph = tracer.trace(model, concrete_args)
  File "/root/anaconda3/lib/python3.7/site-packages/torch/fx/_symbolic_trace.py", line 615, in trace
    self.create_node('output', 'output', (self.create_arg(fn(*args)),), {},
  File "/root/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 158, in forward
    inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids)
  File "/root/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 175, in scatter
    return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim)
  File "/root/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/scatter_gather.py", line 44, in scatter_kwargs
    inputs = scatter(inputs, target_gpus, dim) if inputs else []
  File "/root/anaconda3/lib/python3.7/site-packages/torch/fx/proxy.py", line 251, in __bool__
    return self.tracer.to_bool(self)
  File "/root/anaconda3/lib/python3.7/site-packages/torch/fx/proxy.py", line 152, in to_bool
    raise TraceError('symbolically traced variables cannot be used as inputs to control flow')
torch.fx.proxy.TraceError: symbolically traced variables cannot be used as inputs to control flow
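
For context, the same TraceError can be reproduced with torch.fx alone, without MQBench or DataParallel. The sketch below is only an illustration: `Branchy` is a made-up module whose forward branches on a value computed from its input. During symbolic tracing that value is a Proxy with no concrete truth value, so the `if` raises. The traceback above suggests the `if inputs else []` check inside scatter_kwargs hits the same path when the traced model is already wrapped in DataParallel.

```python
import torch
import torch.nn as nn
import torch.fx


class Branchy(nn.Module):
    """Hypothetical module with data-dependent control flow."""

    def forward(self, x):
        # During symbolic tracing x is a Proxy, so `x.sum() > 0` is also a
        # Proxy and cannot be converted to a Python bool.
        if x.sum() > 0:
            return x + 1
        return x - 1


try:
    torch.fx.symbolic_trace(Branchy())
    message = None
except torch.fx.proxy.TraceError as err:
    message = str(err)

print(message)
```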

Where exactly is this line of code located?
Perhaps you could call prepare_by_platform first and then apply the parallel wrapper.
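
A minimal sketch of that ordering, under some assumptions: `prepare` below is a hypothetical stand-in that only calls torch.fx.symbolic_trace, so the example runs without MQBench installed. In the real script it would be the existing prepare_by_platform(model, choose_backend(opt)) call. `TinyNet` and `unwrap` are also made up for illustration. The idea is to trace the bare module first, so the tracer never enters DataParallel.forward, and to add the parallel wrapper afterwards.

```python
import torch
import torch.nn as nn
import torch.fx


class TinyNet(nn.Module):
    """Hypothetical model used only for this illustration."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.conv(x))


def prepare(model: nn.Module) -> nn.Module:
    # Stand-in for MQBench's prepare_by_platform (assumption): it only
    # traces the model with torch.fx and inserts no quantization nodes.
    return torch.fx.symbolic_trace(model)


def unwrap(model: nn.Module) -> nn.Module:
    # If the model is already wrapped, trace the inner module instead.
    if isinstance(model, (nn.DataParallel, nn.parallel.DistributedDataParallel)):
        return model.module
    return model


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = TinyNet().to(device)

prepared = prepare(unwrap(model))      # 1. trace the bare module
parallel = nn.DataParallel(prepared)   # 2. wrap for multi-GPU afterwards

out = parallel(torch.randn(2, 3, 16, 16, device=device))
print(tuple(out.shape))
```

The same order would presumably apply to DistributedDataParallel: prepare the model on each process first, then wrap the prepared model.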
