Can't pickle lambda object
Mewral opened this issue · 12 comments
Hi, I'm having a problem while training FB15k-237 with the default parameters. Any suggestions? Thanks.
Can't pickle local object 'get_linear_schedule_with_warmup.<locals>.lr_lambda'
File "/Relphormer/main.py", line 126, in main
trainer.fit(lit_model, datamodule=data)
File "/Relphormer/main.py", line 139, in <module>
main()
AttributeError: Can't pickle local object 'get_linear_schedule_with_warmup.<locals>.lr_lambda'
Hi, can you post the full error information in more detail?
Thanks for your reply. Here is the full traceback.
Traceback (most recent call last):
File "main.py", line 139, in <module>
main()
File "main.py", line 126, in main
trainer.fit(lit_model, datamodule=data)
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 458, in fit
self._run(model)
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 756, in _run
self.dispatch()
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 797, in dispatch
self.accelerator.start_training(self)
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 96, in start_training
self.training_type_plugin.start_training(trainer)
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/ddp_spawn.py", line 122, in start_training
mp.spawn(self.new_process, **self.mp_spawn_kwargs)
File "/home/miniconda3/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/miniconda3/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 179, in start_processes
process.start()
File "/home/miniconda3/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/home/miniconda3/lib/python3.8/multiprocessing/context.py", line 283, in _Popen
return Popen(process_obj)
File "/home/miniconda3/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/home/miniconda3/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/home/miniconda3/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
reduction.dump(process_obj, fp)
File "/home/miniconda3/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'get_linear_schedule_with_warmup.<locals>.lr_lambda'
It might be caused by the multiprocessing in the pytorch_lightning package. Try running the script on a single GPU, and check your pytorch_lightning version.
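To illustrate: Lightning's ddp_spawn plugin pickles the model (including the LR scheduler) to send it to worker processes, and the lambda that `get_linear_schedule_with_warmup` returns is a local closure, which `pickle` cannot reference by name. A minimal sketch of the failure and one picklable alternative (the `LinearWarmup` class here is illustrative, not part of transformers):

```python
import pickle

def make_local_lr_lambda():
    # Mimics transformers.get_linear_schedule_with_warmup: the returned
    # function is defined inside another function, so pickle cannot
    # import it by name and raises AttributeError under 'spawn'.
    warmup = 10
    def lr_lambda(step):
        return min(1.0, step / warmup)
    return lr_lambda

class LinearWarmup:
    # Module-level callable class: picklable, so it survives mp.spawn.
    def __init__(self, warmup):
        self.warmup = warmup
    def __call__(self, step):
        return min(1.0, step / self.warmup)

try:
    pickle.dumps(make_local_lr_lambda())
except AttributeError as e:
    print("closure fails:", e)

pickle.dumps(LinearWarmup(10))  # succeeds: instance of a top-level class
```

A `LinearWarmup` instance can be passed to `torch.optim.lr_scheduler.LambdaLR` in place of the closure; alternatively, switching the multiprocessing start method (or Lightning's `accelerator`/plugin) away from `spawn` avoids the pickling step entirely.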
Hi, the previous problem was solved by calling set_start_method, but during training I now get "'list' object has no attribute 'to'". Debugging shows that after the data is converted into features, features.pos is a plain Python list rather than a tensor, so the to(device) call fails. The environment is unchanged and the dataset is FB15k-237. What could be causing this?
The full traceback is below.
'list' object has no attribute 'to'
File "/home/miniconda3/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 738, in <dictcomp>
self.data = {k: v.to(device=device) for k, v in self.data.items()}
File "/home/miniconda3/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 738, in to
self.data = {k: v.to(device=device) for k, v in self.data.items()}
File "/home/miniconda3/lib/python3.8/site-packages/transformers/file_utils.py", line 1639, in wrapper
return func(*args, **kwargs)
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/utilities/apply_func.py", line 158, in batch_to
return data.to(device, **kwargs)
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/utilities/apply_func.py", line 84, in apply_to_collection
return function(data, *args, **kwargs)
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/utilities/apply_func.py", line 161, in move_data_to_device
return apply_to_collection(batch, dtype=dtype, function=batch_to)
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/core/hooks.py", line 704, in transfer_batch_to_device
return move_data_to_device(batch, device)
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/core/lightning.py", line 216, in _apply_batch_transfer_handler
batch = self.transfer_batch_to_device(batch, device)
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 177, in batch_to_device
return model._apply_batch_transfer_handler(batch, device)
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 394, in to_device
return self.batch_to_device(batch, self.root_device)
File "/home/mazewei/miniconda3/lib/python3.8/site-packages/pytorch_lightning/accelerators/gpu.py", line 69, in to_device
batch = super().to_device(batch)
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 221, in validation_step
batch = self.to_device(args[0])
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 174, in evaluation_step
output = self.trainer.accelerator.validation_step(args)
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 962, in run_evaluation
output = self.evaluation_loop.evaluation_step(batch, batch_idx, dataloader_idx)
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1107, in run_sanity_check
self.run_evaluation()
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 842, in run_train
self.run_sanity_check(self.lightning_module)
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 807, in run_stage
return self.run_train()
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 144, in start_training
self._results = trainer.run_stage()
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 96, in start_training
self.training_type_plugin.start_training(trainer)
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 797, in dispatch
self.accelerator.start_training(self)
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 756, in _run
self.dispatch()
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 458, in fit
self._run(model)
File "/home/github_projects/Relphormer/main.py", line 128, in main
trainer.fit(lit_model, datamodule=data)
File "/home/github_projects/Relphormer/main.py", line 142, in <module>
main()
File "/home/miniconda3/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/miniconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
AttributeError: 'list' object has no attribute 'to'
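The traceback shows Lightning's `move_data_to_device` calling `.to(device)` on every value in the batch, which fails on the plain-list `pos` field. One workaround, assuming the batch is a plain dict with a list-valued field (a sketch, not the repository's actual fix), is to tensorize such fields before batch transfer, e.g. in the collate function or an overridden `transfer_batch_to_device`:

```python
import torch

def tensorize_batch(batch):
    # Hypothetical helper: convert any plain-Python list fields (such as
    # the reported `pos`) into tensors so that every value in the batch
    # dict supports .to(device) during Lightning's device transfer.
    return {k: torch.tensor(v) if isinstance(v, list) else v
            for k, v in batch.items()}

batch = {"input_ids": torch.zeros(2, 4, dtype=torch.long),
         "pos": [[0, 1], [2, 3]]}   # list value triggers the AttributeError
fixed = tensorize_batch(batch)
print(type(fixed["pos"]))           # now a Tensor, so .to(device) works
```

This only applies when every list field holds uniformly shaped numeric data; ragged lists would need padding first.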
Hi, this may be caused by pytorch_lightning. We are trying to reproduce the issue. Are you training with multiple GPUs?
No, a single GPU. My arguments are:
["--gpus", "2,", "--max_epochs", "16", "--num_workers", "32", "--model_name_or_path", "bert-base-uncased",
"--accumulate_grad_batches", "1", "--model_class", "BertKGC", "--batch_size", "128", "--checkpoint",
"/home/mazewei/github_projects/Relphormer/pretrain/output/FB15k-237/epoch=15-step=19299-Eval/hits10=0.96.ckpt",
"--pretrain", "0", "--bce", "0", "--check_val_every_n_epoch", "1", "--overwrite_cache", "--data_dir", "dataset/FB15k-237",
"--eval_batch_size", "256", "--max_seq_length", "128", "--lr", "3e-5", "--max_triplet", "64", "--add_attn_bias", "True", "--use_global_node", "True"]
OK. Which versions of transformers and pytorch_lightning are you currently using?
transformers==4.7.0 pytorch_lightning==1.3.1
@bizhen46766 Hi, is there any progress on this issue?
Hi! We are working on it; the reproduced and corrected code will be pushed to the repository in the next couple of days.
Hi, the issue has been fixed. Please pull the latest code.