pranz24/pytorch-soft-actor-critic

Normalized Actions has bugs

Closed this issue · 3 comments

One should be careful when uncommenting the `NormalizedActions` wrapper: you have to make sure `_reverse_action()` is actually called, and `_max_episode_steps` has a typo (it is defined as a method, but it should be an attribute). Otherwise the following line in main.py does not work: `mask = 1 if episode_steps == env._max_episode_steps else float(not done)`
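To illustrate the fix, here is a minimal sketch of an action-normalizing wrapper with `_max_episode_steps` as a plain attribute. The `Box` and `DummyEnv` classes below are illustrative stand-ins for `gym.spaces.Box` and a real gym environment, so the example is self-contained:

```python
class Box:
    # Stand-in for gym.spaces.Box (illustrative only)
    def __init__(self, low, high):
        self.low, self.high = low, high

class DummyEnv:
    # Minimal stand-in for a gym env with a continuous action space
    def __init__(self):
        self.action_space = Box(low=-2.0, high=2.0)
        self._max_episode_steps = 1000  # attribute, NOT a method

class NormalizedActions:
    # Sketch: the agent acts in [-1, 1]; the wrapper rescales to
    # [low, high] before stepping the env, and back for replayed actions.
    def __init__(self, env):
        self.env = env
        self.action_space = env.action_space
        # Forward as an attribute so `env._max_episode_steps` (no call) works
        self._max_episode_steps = env._max_episode_steps

    def action(self, action):
        # Map [-1, 1] -> [low, high]
        low, high = self.action_space.low, self.action_space.high
        return low + (action + 1.0) * 0.5 * (high - low)

    def reverse_action(self, action):
        # Inverse map: [low, high] -> [-1, 1]
        low, high = self.action_space.low, self.action_space.high
        return 2.0 * (action - low) / (high - low) - 1.0

env = NormalizedActions(DummyEnv())
print(env.action(1.0))          # -> 2.0 (upper bound of the env's space)
print(env.reverse_action(2.0))  # -> 1.0
print(env._max_episode_steps)   # attribute access, as main.py expects
```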

This small bug caused a lot of headaches but the repo is super nice otherwise!

True

The easiest way to use normalized actions would be to scale the actions directly by a factor of `env.action_space.high[0]`,
as is done in these two repos:
https://github.com/sfujim/TD3
https://github.com/openai/spinningup/tree/master/spinup/algos/sac
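A hedged sketch of that direct-scaling approach: the policy emits a tanh-squashed value in [-1, 1], and the bound is multiplied in at action-selection time instead of wrapping the env. The function name `select_action` and the hard-coded bound are illustrative (in a real env the bound would be `env.action_space.high[0]`):

```python
import math

def select_action(policy_output, max_action):
    # TD3/Spinning-Up style: squash the raw policy output into [-1, 1]
    # with tanh, then scale by the action bound.
    return max_action * math.tanh(policy_output)

max_action = 2.0  # stands in for env.action_space.high[0]
print(select_action(0.0, max_action))   # -> 0.0 (tanh(0) = 0)
print(select_action(100.0, max_action)) # saturates near +2.0
```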

And yes, `_max_episode_steps` is not part of `gym.ActionWrapper` (I don't understand why I used it there).
You can check how `_max_episode_steps` works here:
https://github.com/openai/gym/blob/85a5372a19c0f35db2410e586cc9a32c4d94bf1a/gym/wrappers/time_limit.py
https://github.com/openai/gym/blob/239aaf14ce804c9ce5068bfb69590110ea8ef1be/gym/envs/registration.py
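For reference, gym's `TimeLimit` wrapper stores the step budget (taken from the registration spec) as the attribute `_max_episode_steps` and truncates episodes that reach it. A minimal, self-contained sketch of that mechanism (the `DummyEnv` is a stand-in for a real env):

```python
class DummyEnv:
    # Trivial stand-in env that never terminates on its own
    def reset(self):
        return 0
    def step(self, action):
        return 0, 0.0, False, {}

class TimeLimit:
    # Sketch of gym's TimeLimit: counts steps and forces done=True
    # once _max_episode_steps is reached.
    def __init__(self, env, max_episode_steps):
        self.env = env
        self._max_episode_steps = max_episode_steps  # attribute, not a method
        self._elapsed_steps = 0

    def reset(self):
        self._elapsed_steps = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._elapsed_steps += 1
        if self._elapsed_steps >= self._max_episode_steps:
            info["TimeLimit.truncated"] = not done
            done = True
        return obs, reward, done, info

env = TimeLimit(DummyEnv(), max_episode_steps=3)
env.reset()
for episode_steps in range(1, 4):
    _, _, done, _ = env.step(0)
# The mask line from main.py then distinguishes timeout from true termination:
mask = 1 if episode_steps == env._max_episode_steps else float(not done)
print(done, mask)  # -> True 1
```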

Thanks a lot! :-)