[Error] When training from a checkpoint - TypeError: Can't instantiate abstract class ScalingUpAlgo with abstract method get_policy

Question


Closed this issue a year ago · 3 comments

Hi,

When I tried to continue training from a checkpoint with

python scalingup/train.py dataset_path=scalingup/wandb/run-20230808_082724-ioa7gmrt/files/ evaluation=bin_transport_test algo=diffusion_default algo.replay_buffer.batch_size=256 tags='[sim,transport,diffusion]' load_from_path=scalingup/wandb/run-20230808_205236-ib7bi2uq/files/checkpoints/last.ckpt

the following error occurred:

Error executing job with overrides: ['dataset_path=scalingup/wandb/run-20230808_082724-ioa7gmrt/files/', 'evaluation=bin_transport_test', 'algo=diffusion_default', 'algo.replay_buffer.batch_size=256', 'tags=[sim,transport,diffusion]', 'load_from_path=scalingup/wandb/run-20230808_205236-ib7bi2uq/files/checkpoints/last.ckpt']
Traceback (most recent call last):
  File "/home/yan/Documents/scalingup/scalingup/train.py", line 99, in train
    trainer, algo = setup_trainer(
  File "/home/yan/Documents/scalingup/scalingup/train.py", line 46, in setup_trainer
    algo = ScalingUpAlgo.load_from_checkpoint(
  File "/home/yan/mambaforge/envs/scalingup/lib/python3.10/site-packages/pytorch_lightning/core/module.py", line 1531, in load_from_checkpoint
    loaded = _load_from_checkpoint(
  File "/home/yan/mambaforge/envs/scalingup/lib/python3.10/site-packages/pytorch_lightning/core/saving.py", line 90, in _load_from_checkpoint
    return _load_state(cls, checkpoint, strict=strict, **kwargs)
  File "/home/yan/mambaforge/envs/scalingup/lib/python3.10/site-packages/pytorch_lightning/core/saving.py", line 136, in _load_state
    obj = cls(**_cls_kwargs)
TypeError: Can't instantiate abstract class ScalingUpAlgo with abstract method get_policy

Is this a bug that can be fixed?

Best regards

Answer 1 · 2023-08-21T19:19:06.000Z

Hi,

I am facing the same issue. Could you please advise how you solved this error? Many thanks!

Answer 2 · 2023-09-02T16:45:53.000Z

I've never continued training from a checkpoint, and have only used load_from_path to run evaluation. However, the fix would be to change ScalingUpAlgo.load_from_checkpoint (this line) to DiffusionScalingUpAlgo.load_from_checkpoint.
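Why the class name matters: the last frame of the traceback shows that Lightning builds the model with cls(**_cls_kwargs), so load_from_checkpoint instantiates whatever class it is called on. ScalingUpAlgo leaves get_policy abstract, so it cannot be instantiated; a concrete subclass that implements get_policy can. The sketch below is a toy reproduction of that mechanism without PyTorch Lightning. ToyAlgo, ToyDiffusionAlgo and the plain-dict checkpoint are hypothetical stand-ins for ScalingUpAlgo, DiffusionScalingUpAlgo and a real .ckpt file, not the project's actual code.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class ToyAlgo(ABC):
    """Hypothetical stand-in for an abstract base class such as ScalingUpAlgo."""

    def __init__(self, **hparams: Any) -> None:
        self.hparams = hparams

    @abstractmethod
    def get_policy(self) -> str:
        ...

    @classmethod
    def load_from_checkpoint(cls, checkpoint: Dict[str, Any]) -> "ToyAlgo":
        # Mimics the loader's cls(**kwargs) call: `cls` is the class this
        # method was invoked on, so it must not be abstract.
        return cls(**checkpoint["hyper_parameters"])


class ToyDiffusionAlgo(ToyAlgo):
    """Hypothetical stand-in for a concrete subclass such as DiffusionScalingUpAlgo."""

    def get_policy(self) -> str:
        return "diffusion policy"


# A plain dict standing in for a saved checkpoint (assumption, not a real .ckpt).
ckpt = {"hyper_parameters": {"batch_size": 256}}

try:
    # Calling the loader on the abstract base fails, as in the traceback.
    ToyAlgo.load_from_checkpoint(ckpt)
except TypeError as err:
    print("base class:", err)

# Calling the same loader on the concrete subclass succeeds.
algo = ToyDiffusionAlgo.load_from_checkpoint(ckpt)
print("subclass:", algo.get_policy(), algo.hparams)
```

In the real script, the same reasoning suggests calling load_from_checkpoint on the concrete class that produced the checkpoint (for algo=diffusion_default, presumably the diffusion subclass).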

Answer 3 · 2023-09-04T00:48:00.000Z

The fix works!
Thank you very much.