cpnota/autonomous-learning-library

Apparent bug in Atari PPO preset

Closed this issue · 8 comments

When I run the following code:

from all.environments import AtariEnvironment
from all.experiments import SingleEnvExperiment
from all.presets import atari
env = AtariEnvironment("Pong", device="cuda")
preset = atari.ppo.env(env).device("cuda").hyperparameters().build()
experiment = SingleEnvExperiment(preset, env)
experiment.train(frames=2e6)

I get the following error:

Traceback (most recent call last):
  File "independent_atari.py", line 7, in <module>
    experiment.train(frames=2e6)
  File "/home/ben/class_projs/autonomous-learning-library/all/experiments/single_env_experiment.py", line 46, in train
    self._run_training_episode()
  File "/home/ben/class_projs/autonomous-learning-library/all/experiments/single_env_experiment.py", line 73, in _run_training_episode
    action = self._agent.act(state)
  File "/home/ben/class_projs/autonomous-learning-library/all/bodies/_body.py", line 24, in act
    return self.process_action(self.agent.act(self.process_state(state)))
  File "/home/ben/class_projs/autonomous-learning-library/all/bodies/_body.py", line 24, in act
    return self.process_action(self.agent.act(self.process_state(state)))
  File "/home/ben/class_projs/autonomous-learning-library/all/bodies/_body.py", line 24, in act
    return self.process_action(self.agent.act(self.process_state(state)))
  [Previous line repeated 1 more time]
  File "/home/ben/class_projs/autonomous-learning-library/all/agents/ppo.py", line 71, in act
    self._train(states)
  File "/home/ben/class_projs/autonomous-learning-library/all/agents/ppo.py", line 82, in _train
    states, actions, advantages = self._buffer.advantages(next_states)
  File "/home/ben/class_projs/autonomous-learning-library/all/memory/generalized_advantage.py", line 50, in advantages
    actions = torch.cat(self._actions[:self.n_steps], dim=0)
RuntimeError: zero-dimensional tensor (at position 0) cannot be concatenated

It gives the same error on both the develop and master branches.

I tried this with A2C (the same code, just with a2c instead of ppo) and got the following error:

Traceback (most recent call last):
  File "independent_atari.py", line 7, in <module>
    experiment.train(frames=2e6)
  File "/home/ben/class_projs/autonomous-learning-library/all/experiments/single_env_experiment.py", line 46, in train
    self._run_training_episode()
  File "/home/ben/class_projs/autonomous-learning-library/all/experiments/single_env_experiment.py", line 73, in _run_training_episode
    action = self._agent.act(state)
  File "/home/ben/class_projs/autonomous-learning-library/all/bodies/_body.py", line 24, in act
    return self.process_action(self.agent.act(self.process_state(state)))
  File "/home/ben/class_projs/autonomous-learning-library/all/bodies/_body.py", line 24, in act
    return self.process_action(self.agent.act(self.process_state(state)))
  File "/home/ben/class_projs/autonomous-learning-library/all/bodies/_body.py", line 24, in act
    return self.process_action(self.agent.act(self.process_state(state)))
  [Previous line repeated 1 more time]
  File "/home/ben/class_projs/autonomous-learning-library/all/agents/a2c.py", line 61, in act
    self._train(states)
  File "/home/ben/class_projs/autonomous-learning-library/all/agents/a2c.py", line 69, in _train
    states, actions, advantages = self._buffer.advantages(next_states)
  File "/home/ben/class_projs/autonomous-learning-library/all/memory/advantage.py", line 38, in advantages
    rewards, lengths = self._compute_returns()
  File "/home/ben/class_projs/autonomous-learning-library/all/memory/advantage.py", line 52, in _compute_returns
    device=self._rewards[0].device
AttributeError: 'float' object has no attribute 'device'

The PPO implementation is a ParallelAgent/ParallelPreset, so it is not compatible with SingleEnvExperiment. Try using a ParallelEnvExperiment and setting ppo.hyperparameters(n_envs=1).
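
In code, the suggested workaround would look something like the sketch below, assuming the same builder API as the snippet at the top of this issue (ParallelEnvExperiment imported from all.experiments, n_envs set through hyperparameters as suggested above):

from all.environments import AtariEnvironment
from all.experiments import ParallelEnvExperiment
from all.presets import atari

env = AtariEnvironment("Pong", device="cuda")
# Build the parallel preset but run only a single environment instance.
preset = atari.ppo.env(env).device("cuda").hyperparameters(n_envs=1).build()
experiment = ParallelEnvExperiment(preset, env)
experiment.train(frames=2e6)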

I don't think this is a bug, but it would probably be useful for the experiment types to enforce the agent type and throw a helpful error message instead of surfacing opaque runtime errors, so I'm classifying this as "style."
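
For example, a rough sketch of the kind of guard the experiment constructor could run (ParallelPreset is named above; the import path and where exactly the check would live are assumptions on my part):

from all.presets import ParallelPreset  # assumed import path, for illustration only

def check_single_env_preset(preset):
    # Fail fast with a readable message instead of an opaque tensor error mid-training.
    if isinstance(preset, ParallelPreset):
        raise TypeError(
            f"SingleEnvExperiment received a parallel preset ({type(preset).__name__}). "
            "Use ParallelEnvExperiment instead, e.g. with hyperparameters(n_envs=1)."
        )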

Merged #241 to develop for now. It should allow n_envs=1 to work.

Ah, I see. Yes, it is very hard to get that from the error message.

When we previously tried to use ALL for our primary work with PettingZoo, the #1 issue we had with the library, and the reason we ultimately turned away from it, was that the error messages were too difficult to understand. Every small mistake we made took an hour to track down.

Not sure what can be done about that, but explicit type checking would be a good start. For the policies/approximations, shape checking would also be super helpful: I got a ton of weird error messages when I was trying to build custom neural networks and used incorrect shapes for the input and output layers.
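
As an illustration of the kind of shape checking I mean, here is a generic wrapper around a user-supplied network, not anything the library provides today; the expected shapes are placeholders:

from torch import nn

class ShapeChecked(nn.Module):
    """Wrap a user-defined model and validate input/output shapes eagerly."""

    def __init__(self, model, expected_in, expected_out):
        super().__init__()
        self.model = model
        self.expected_in = tuple(expected_in)    # e.g. (4, 84, 84) for stacked Atari frames
        self.expected_out = tuple(expected_out)  # e.g. (n_actions,)

    def forward(self, x):
        if tuple(x.shape[1:]) != self.expected_in:
            raise ValueError(
                f"expected input of shape (N, *{self.expected_in}), got {tuple(x.shape)}"
            )
        out = self.model(x)
        if tuple(out.shape[1:]) != self.expected_out:
            raise ValueError(
                f"expected output of shape (N, *{self.expected_out}), got {tuple(out.shape)}"
            )
        return out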

For more context on this particular issue: the problem came up when someone wanted to use PPO to train one agent and DQN to train another. This is a very unusual use case that is probably not a good idea, but it brought up the fact that PPO isn't really supported at all for multiagent.

I made a small Preset wrapper and an Agent wrapper to handle this issue.
https://gist.github.com/weepingwillowben/400b42d54b6e57034da1e5293166aa80

Not sure if this should be officially supported or not.

I think this is fine for single agent now. #288 will handle the multiagent case.