[Bug] AdaBelief optimizer crashes checkpoint restore
wwoods opened this issue · 4 comments
Search before asking
- I searched the issues and found no similar issues.
Ray Component
RLlib
Issue Severity
Medium: It contributes to significant difficulty to complete my task, but I can work around it and get it resolved.
What happened + What you expected to happen
If using the adabelief_pytorch.AdaBelief optimizer, its state_dict() looks like this in the rllib checkpoint:
'_optimizer_variables': [{'state': {},
'param_groups': [{'lr': 0.0001,
'betas': (0.9, 0.999),
'eps': 1e-16,
'weight_decay': 0,
'amsgrad': False,
'buffer': [[None, None, None],
[None, None, None],
[None, None, None],
[None, None, None],
[None, None, None],
[None, None, None],
[None, None, None],
[None, None, None],
[None, None, None],
[None, None, None]],
'params': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]}]}]
The issue is the None entries in 'buffer' -- this optimizer stores a plain Python list there, not tensors. On restore, RLlib tries to force every entry into a tensor and crashes:
File "/home/waltw/.cache/pypoetry/virtualenvs/tread-hv_zlCMt-py3.9/lib/python3.9/site-packages/ray/tune/trainable.py", line 467, in restore
self.load_checkpoint(checkpoint_path)
File "/home/waltw/.cache/pypoetry/virtualenvs/tread-hv_zlCMt-py3.9/lib/python3.9/site-packages/ray/rllib/agents/trainer.py", line 1823, in load_checkpoint
self.__setstate__(extra_data)
File "/home/waltw/.cache/pypoetry/virtualenvs/tread-hv_zlCMt-py3.9/lib/python3.9/site-packages/ray/rllib/agents/trainer.py", line 2443, in __setstate__
self.workers.local_worker().restore(state["worker"])
File "/home/waltw/.cache/pypoetry/virtualenvs/tread-hv_zlCMt-py3.9/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1346, in restore
self.policy_map[pid].set_state(state)
File "/home/waltw/.cache/pypoetry/virtualenvs/tread-hv_zlCMt-py3.9/lib/python3.9/site-packages/ray/rllib/policy/torch_policy.py", line 715, in set_state
optim_state_dict = convert_to_torch_tensor(
File "/home/waltw/.cache/pypoetry/virtualenvs/tread-hv_zlCMt-py3.9/lib/python3.9/site-packages/ray/rllib/utils/torch_utils.py", line 161, in convert_to_torch_tensor
return tree.map_structure(mapping, x)
File "/home/waltw/.cache/pypoetry/virtualenvs/tread-hv_zlCMt-py3.9/lib/python3.9/site-packages/tree/__init__.py", line 510, in map_structure
[func(*args) for args in zip(*map(flatten, structures))])
File "/home/waltw/.cache/pypoetry/virtualenvs/tread-hv_zlCMt-py3.9/lib/python3.9/site-packages/tree/__init__.py", line 510, in <listcomp>
[func(*args) for args in zip(*map(flatten, structures))])
File "/home/waltw/.cache/pypoetry/virtualenvs/tread-hv_zlCMt-py3.9/lib/python3.9/site-packages/ray/rllib/utils/torch_utils.py", line 152, in mapping
tensor = torch.from_numpy(np.asarray(item))
TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.
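For reference, the failure can be reproduced outside of RLlib in two lines -- np.asarray over a list of None gives an object-dtype array, which torch.from_numpy rejects:

import numpy as np
import torch

# AdaBelief's 'buffer' entries are plain Python lists containing None.
arr = np.asarray([None, None, None])
print(arr.dtype)  # object

# torch.from_numpy() only accepts numeric/bool dtypes, so this raises the
# same TypeError shown in the traceback above.
torch.from_numpy(arr)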
Versions / Dependencies
1.10.0
Reproduction script
This reproduces the crash without AdaBelief by injecting a non-tensor list into the optimizer state:

from ray.rllib.agents.dqn.dqn import DQNTrainer

dq = DQNTrainer(config={'env': 'Pong-v0', 'framework': 'torch'})
dq.train()

# Simulate AdaBelief's list-of-None 'buffer' entries in the optimizer state.
dq.workers.local_worker().policy_map['default_policy']._optimizers[0].param_groups[0]['fake'] = [None]
save_path = dq.save('test_issue')

# Restoring into a fresh trainer hits the TypeError in convert_to_torch_tensor.
dq = DQNTrainer(config={'env': 'Pong-v0', 'framework': 'torch'})
dq.restore(save_path)
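A guard along these lines in the mapping function from the traceback would presumably avoid the crash. This is just a sketch of the idea, not RLlib's actual code -- passing non-numeric items through unchanged may or may not be the semantics the maintainers want:

import numpy as np
import torch

def mapping_sketch(item):
    # Hypothetical guard: leave entries that cannot become numeric tensors
    # (e.g. the None placeholders in AdaBelief's 'buffer') untouched instead
    # of forcing them through torch.from_numpy().
    if item is None:
        return item
    arr = np.asarray(item)
    if arr.dtype == object:
        return item
    return torch.from_numpy(arr)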
Anything else
No response
Are you willing to submit a PR?
- Yes I am willing to submit a PR!
May I ask how you worked around this, @wwoods? Did you find an alternate way of loading a saved model?
I had a similar issue and found a workaround.
I run DRL experiments using tune.run. When attempting to restore a checkpoint after training, I get this error:
Traceback (most recent call last):
File "/Users/user/Development/CurrentProjects/Project/rllib_cli.py", line 463, in <module>
main()
File "/Users/user/Development/CurrentProjects/Project/rllib_cli.py", line 116, in main
enjoy(False)
File "/Users/user/Development/CurrentProjects/Project/rllib_cli.py", line 290, in enjoy
trainer.restore(checkpoint_path)
File "/Users/user/miniforge3/lib/python3.9/site-packages/ray/tune/trainable.py", line 490, in restore
self.load_checkpoint(checkpoint_path)
File "/Users/user/miniforge3/lib/python3.9/site-packages/ray/rllib/agents/trainer.py", line 1861, in load_checkpoint
self.__setstate__(extra_data)
File "/Users/user/miniforge3/lib/python3.9/site-packages/ray/rllib/agents/trainer.py", line 2509, in __setstate__
self.workers.local_worker().restore(state["worker"])
File "/Users/user/miniforge3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1353, in restore
self.policy_map[pid].set_state(state)
File "/Users/user/miniforge3/lib/python3.9/site-packages/ray/rllib/policy/torch_policy.py", line 715, in set_state
optim_state_dict = convert_to_torch_tensor(
File "/Users/user/miniforge3/lib/python3.9/site-packages/ray/rllib/utils/torch_utils.py", line 158, in convert_to_torch_tensor
return tree.map_structure(mapping, x)
File "/Users/user/miniforge3/lib/python3.9/site-packages/tree/__init__.py", line 430, in map_structure
[func(*args) for args in zip(*map(flatten, structures))])
File "/Users/user/miniforge3/lib/python3.9/site-packages/tree/__init__.py", line 430, in <listcomp>
[func(*args) for args in zip(*map(flatten, structures))])
File "/Users/user/miniforge3/lib/python3.9/site-packages/ray/rllib/utils/torch_utils.py", line 152, in mapping
tensor = torch.from_numpy(np.asarray(item))
TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.
Here is a workaround. Note that it is only a workaround, not a fix; the error sits at a level I don't understand well enough to implement a proper solution. Instead of the PPOTrainer, I use a patched version:
import pickle

from ray.rllib import agents


class PatchedPPOTrainer(agents.ppo.PPOTrainer):

    # @override(Trainable)
    def load_checkpoint(self, checkpoint_path: str) -> None:
        # Unpickle the checkpoint, replace the offending None values in the
        # serialized worker state, and re-pickle before restoring as usual.
        with open(checkpoint_path, "rb") as f:
            extra_data = pickle.load(f)
        worker = pickle.loads(extra_data["worker"])
        worker = PatchedPPOTrainer.__fix_recursively(worker)
        extra_data["worker"] = pickle.dumps(worker)
        self.__setstate__(extra_data)

    @staticmethod
    def __fix_recursively(data):
        # Walk the nested worker state and replace None leaves with 0 so that
        # convert_to_torch_tensor() no longer sees object-dtype arrays.
        if isinstance(data, dict):
            return {key: PatchedPPOTrainer.__fix_recursively(value)
                    for key, value in data.items()}
        elif isinstance(data, list):
            return [PatchedPPOTrainer.__fix_recursively(value) for value in data]
        elif data is None:
            return 0
        else:
            return data
There seems to be a problem with the None values loaded from the checkpoint.
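Usage is otherwise identical to the stock PPOTrainer; a minimal sketch (the config and checkpoint path below are placeholders for your own values):

# Use the patched trainer anywhere you would have used PPOTrainer.
trainer = PatchedPPOTrainer(config={"env": "CartPole-v0", "framework": "torch"})
trainer.restore("/path/to/checkpoint/checkpoint-100")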
I have the same problem with DQN. Is there any other solution? Has there been any attempt to fix this bug on the Ray side?
I had a similar problem. Check #27262.