AssertionError: DistributedDataParallel is not needed when a module doesn't have any parameter that requires a gradient.
wongyufei opened this issue · 2 comments
(torchdistill) lthpc@lthpc:/data/Code/Wang_Yufei/PAD/torchdistill$ bash run_train.sh
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
2021/06/29 17:16:43 INFO torchdistill.common.main_util | distributed init (rank 0): env://
2021/06/29 17:16:43 INFO torchdistill.common.main_util | distributed init (rank 1): env://
2021/06/29 17:16:43 INFO torchdistill.common.main_util | distributed init (rank 2): env://
2021/06/29 17:16:43 INFO root Added key: store_based_barrier_key:1 to store for rank: 1
2021/06/29 17:16:43 INFO root Added key: store_based_barrier_key:1 to store for rank: 2
2021/06/29 17:16:43 INFO root Added key: store_based_barrier_key:1 to store for rank: 0
2021/06/29 17:16:47 INFO main Namespace(adjust_lr=False, config='configs/official/ilsvrc2012/yoshitomo-matsubara/rrpr2020/cse_l2-resnet18_from_resnet34.yaml', device='cuda', dist_url='env://', log='log/ilsvrc2012/cse_l2-resnet18_from_resnet34.log', seed=None, start_epoch=0, student_only=False, sync_bn=False, test_only=False, world_size=3)
2021/06/29 17:16:47 INFO torchdistill.datasets.util Loading train data
2021/06/29 17:16:51 INFO torchdistill.datasets.util dataset_id ilsvrc2012/train: 4.093475580215454 sec
2021/06/29 17:16:51 INFO torchdistill.datasets.util Loading val data
2021/06/29 17:16:51 INFO torchdistill.datasets.util dataset_id ilsvrc2012/val: 0.1801161766052246 sec
2021/06/29 17:16:52 INFO torchdistill.common.main_util ckpt file is not found at ./resource/ckpt/ilsvrc2012/teacher/ilsvrc2012-resnet34.pt
2021/06/29 17:16:52 INFO torchdistill.common.main_util Loading model parameters
2021/06/29 17:16:52 INFO main Start training
2021/06/29 17:16:52 INFO torchdistill.models.util [teacher model]
2021/06/29 17:16:52 INFO torchdistill.models.util Using the original teacher model
2021/06/29 17:16:52 INFO torchdistill.models.util [student model]
2021/06/29 17:16:52 INFO torchdistill.models.util Using the original student model
2021/06/29 17:16:52 INFO torchdistill.core.distillation Loss = 1.0 * OrgLoss + 15.0 * MSELoss()
2021/06/29 17:16:52 INFO torchdistill.core.distillation Freezing the whole teacher model
Traceback (most recent call last):
File "examples/image_classification.py", line 180, in
main(argparser.parse_args())
File "examples/image_classification.py", line 162, in main
train(teacher_model, student_model, dataset_dict, ckpt_file_path, device, device_ids, distributed, config, args)
File "examples/image_classification.py", line 110, in train
device, device_ids, distributed, lr_factor)
File "/data/Code/Wang_Yufei/PAD/torchdistill/torchdistill/core/distillation.py", line 406, in get_distillation_box
device, device_ids, distributed, lr_factor, accelerator)
File "/data/Code/Wang_Yufei/PAD/torchdistill/torchdistill/core/distillation.py", line 229, in init
self.setup(train_config)
File "/data/Code/Wang_Yufei/PAD/torchdistill/torchdistill/core/distillation.py", line 125, in setup
teacher_any_updatable)
File "/data/Code/Wang_Yufei/PAD/torchdistill/torchdistill/core/util.py", line 50, in wrap_model
model = DistributedDataParallel(model, device_ids=device_ids, find_unused_parameters=find_unused_parameters)
File "/home/lthpc/.conda/envs/torchdistill/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 367, in init
"DistributedDataParallel is not needed when a module "
AssertionError: DistributedDataParallel is not needed when a module doesn't have any parameter that requires a gradient.
[identical traceback repeated by the other two worker processes]
Killing subprocess 28872
Killing subprocess 28873
Killing subprocess 28874
Traceback (most recent call last):
File "/home/lthpc/.conda/envs/torchdistill/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/home/lthpc/.conda/envs/torchdistill/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/lthpc/.conda/envs/torchdistill/lib/python3.7/site-packages/torch/distributed/launch.py", line 340, in
main()
File "/home/lthpc/.conda/envs/torchdistill/lib/python3.7/site-packages/torch/distributed/launch.py", line 326, in main
sigkill_handler(signal.SIGTERM, None) # not coming back
File "/home/lthpc/.conda/envs/torchdistill/lib/python3.7/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/lthpc/.conda/envs/torchdistill/bin/python3', '-u', 'examples/image_classification.py', '--config', 'configs/official/ilsvrc2012/yoshitomo-matsubara/rrpr2020/cse_l2-resnet18_from_resnet34.yaml', '--log', 'log/ilsvrc2012/cse_l2-resnet18_from_resnet34.log', '--world_size', '3']' returned non-zero exit status 1.
(torchdistill) lthpc@lthpc:/data/Code/Wang_Yufei/PAD/torchdistill$
Hi @wongyufei,
Thank you for reporting the issue.
I think it is caused by a change to `DistributedDataParallel` in a recent PyTorch version. It happens when none of the modules in the teacher model is trainable (e.g., `requires_grad: False` in the `teacher`/`student` entry).
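For context, the assertion fires because every parameter of the frozen teacher has `requires_grad=False`, so there are no gradients for DDP to synchronize. Below is a minimal sketch of a guard that avoids the assertion; the helper name `wrap_if_trainable` is hypothetical, and this is not the actual torchdistill or PyTorch code:

```python
from torch import nn
from torch.nn.parallel import DistributedDataParallel


def wrap_if_trainable(model: nn.Module, device_ids=None):
    """Wrap with DistributedDataParallel only if the model has at least one
    trainable parameter; a fully frozen model (e.g., a frozen teacher) needs
    no gradient synchronization, so it is returned unwrapped.

    Note: DistributedDataParallel requires torch.distributed to be
    initialized before this is called.
    """
    if any(p.requires_grad for p in model.parameters()):
        return DistributedDataParallel(model, device_ids=device_ids)
    return model
```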
You can avoid it by replacing `wrapper: 'DistributedDataParallel'` with `wrapper: 'DataParallel'` in the `teacher` entry of such YAML files.
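For example, assuming the usual `train`/`teacher` layout of torchdistill configs (only the `wrapper` value changes; every other key stays as it is in the original file):

```yaml
train:
  # ... other entries unchanged ...
  teacher:
    # ... other keys unchanged ...
    wrapper: 'DataParallel'    # was: 'DistributedDataParallel'
    requires_grad: False
```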
I just merged #124, which should resolve the issue. Please pull the latest version of the repo and try again.