[Error] When training from a checkpoint - TypeError: Can't instantiate abstract class ScalingUpAlgo with abstract method get_policy
Closed this issue · 3 comments
Hi,
When I tried to continue training from a checkpoint with
python scalingup/train.py dataset_path=scalingup/wandb/run-20230808_082724-ioa7gmrt/files/ evaluation=bin_transport_test algo=diffusion_default algo.replay_buffer.batch_size=256 tags='[sim,transport,diffusion]' load_from_path=scalingup/wandb/run-20230808_205236-ib7bi2uq/files/checkpoints/last.ckpt
the following error occurred:
Error executing job with overrides: ['dataset_path=scalingup/wandb/run-20230808_082724-ioa7gmrt/files/', 'evaluation=bin_transport_test', 'algo=diffusion_default', 'algo.replay_buffer.batch_size=256', 'tags=[sim,transport,diffusion]', 'load_from_path=scalingup/wandb/run-20230808_205236-ib7bi2uq/files/checkpoints/last.ckpt']
Traceback (most recent call last):
File "/home/yan/Documents/scalingup/scalingup/train.py", line 99, in train
trainer, algo = setup_trainer(
File "/home/yan/Documents/scalingup/scalingup/train.py", line 46, in setup_trainer
algo = ScalingUpAlgo.load_from_checkpoint(
File "/home/yan/mambaforge/envs/scalingup/lib/python3.10/site-packages/pytorch_lightning/core/module.py", line 1531, in load_from_checkpoint
loaded = _load_from_checkpoint(
File "/home/yan/mambaforge/envs/scalingup/lib/python3.10/site-packages/pytorch_lightning/core/saving.py", line 90, in _load_from_checkpoint
return _load_state(cls, checkpoint, strict=strict, **kwargs)
File "/home/yan/mambaforge/envs/scalingup/lib/python3.10/site-packages/pytorch_lightning/core/saving.py", line 136, in _load_state
obj = cls(**_cls_kwargs)
TypeError: Can't instantiate abstract class ScalingUpAlgo with abstract method get_policy
Is it a bug that can be fixed?
Best regards
Hi,
I am facing the same issue. Could you please advise how you solved this error? Many thanks!
I've never continued training from a checkpoint, and only used load_from_path to run evaluation. However, the fix would be to change ScalingUpAlgo.load_from_checkpoint (the call in setup_trainer in scalingup/train.py) to DiffusionScalingUpAlgo.load_from_checkpoint.
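For anyone wondering why swapping the class name helps, here is a minimal, standard-library-only sketch of the mechanism. The loader in the traceback ends up calling cls(...), so it has to be invoked on a concrete subclass. BaseAlgo, DiffusionAlgo, and the dict-based checkpoint are hypothetical stand-ins for illustration, not the actual scalingup or PyTorch Lightning code (the real load_from_checkpoint takes a checkpoint path, among other differences).

```python
from abc import ABC, abstractmethod


class BaseAlgo(ABC):
    """Hypothetical stand-in for an abstract base class such as ScalingUpAlgo."""

    def __init__(self, lr: float = 1e-4):
        self.lr = lr

    @abstractmethod
    def get_policy(self):
        """Concrete algorithms must implement this."""

    @classmethod
    def load_from_checkpoint(cls, checkpoint: dict, **kwargs):
        # Simplified assumption: rebuild the object from saved hyperparameters.
        # The key point is that the loader instantiates `cls`, i.e. whichever
        # class the method was called on.
        hparams = {**checkpoint.get("hyper_parameters", {}), **kwargs}
        return cls(**hparams)


class DiffusionAlgo(BaseAlgo):
    """Hypothetical concrete subclass, analogous to DiffusionScalingUpAlgo."""

    def get_policy(self):
        return "diffusion-policy"


checkpoint = {"hyper_parameters": {"lr": 3e-4}}

# Calling the loader on the abstract base fails, as in the traceback above.
try:
    BaseAlgo.load_from_checkpoint(checkpoint)
    base_error = None
except TypeError as exc:
    base_error = str(exc)

# Calling it on the concrete subclass succeeds.
algo = DiffusionAlgo.load_from_checkpoint(checkpoint)
```

In this sketch base_error holds a TypeError message about the abstract class, while algo is a usable DiffusionAlgo instance with the restored lr.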
The fix works!
Thank you very much.