Support for continuous action spaces

Question

oribarel opened this issue 6 years ago · 5 comments

Hi Lucas,

In the README you note that your implementation supports continuous action spaces, but after reading your code I haven't found any special handling of a continuous action space. Does your code currently support continuous action spaces? If not, could you please add this important feature?

Thanks,
Ori

Answer 1 · 2018-11-13T18:49:18.000Z

Hi,

Thank you for your issue.
What exactly is your action space (dimensions, etc.)?

And to use continuous actions, you just need to change the distribution that is returned (https://github.com/lcswillems/torch-rl/blob/daead93401c102c22a289ee79dcca244b619a780/model.py#L101).
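To illustrate why changing the returned distribution covers most of the work, here is a minimal sketch with torch.distributions. It assumes, without having checked the linked model.py, that the rest of the training code only calls sample(), log_prob() and entropy() on the distribution; the batch size and action sizes are made up. Wrapping Normal in Independent(..., 1) makes log_prob and entropy return one value per sample, as Categorical does, instead of one value per action dimension.

```python
import torch
from torch.distributions import Categorical, Independent, Normal

batch, n_actions, action_dim = 4, 3, 50  # made-up sizes for illustration

# Discrete case: one categorical choice among n_actions per sample.
logits = torch.randn(batch, n_actions)
discrete = Categorical(logits=logits)

# Continuous case: one Gaussian per action dimension. Independent(..., 1)
# treats the last dimension as a single event, so log_prob and entropy
# are summed over the action vector.
mean = torch.randn(batch, action_dim)
std = torch.ones(batch, action_dim)
continuous = Independent(Normal(loc=mean, scale=std), 1)

for name, dist in (("discrete", discrete), ("continuous", continuous)):
    action = dist.sample()
    log_prob = dist.log_prob(action)
    entropy = dist.entropy()
    print(name, tuple(action.shape), tuple(log_prob.shape), tuple(entropy.shape))
# discrete (4,) (4,) (4,)
# continuous (4, 50) (4,) (4,)
```

Only the action shape differs between the two cases, which is why code that consumes log_prob and entropy per sample can stay largely unchanged.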

Thanks,
Lucas

Answer 2 · 2018-11-15T19:53:11.000Z

Thanks for your response! My action space is continuous: a vector of about 50 entries, each with its own bounded range. For continuous actions, most implementations sample from a Gaussian whose mean and variance are estimated by the network, so much more code has to be added. Additionally, with a bounded action space, the sampled value has to be clipped to the required range.
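A minimal sketch of the extra model code the Gaussian approach needs, assuming a shared body with separate heads. The class name GaussianActorCritic, the layer sizes and the observation size are hypothetical, not taken from torch-rl. The network predicts the standard deviation (the square root of the variance), since torch.distributions.Normal is parameterized by scale, and a softplus keeps it positive.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Independent, Normal


class GaussianActorCritic(nn.Module):
    """Hypothetical actor-critic for a continuous action vector.

    Outputs a mean and a standard deviation for every action dimension,
    plus a state-value estimate for the critic.
    """

    def __init__(self, obs_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.mean_head = nn.Linear(hidden, action_dim)
        self.std_head = nn.Linear(hidden, action_dim)
        self.value_head = nn.Linear(hidden, 1)

    def forward(self, obs: torch.Tensor):
        h = self.body(obs)
        mean = self.mean_head(h)
        # softplus keeps the scale positive; the small constant stops it
        # from collapsing to exactly zero.
        std = F.softplus(self.std_head(h)) + 1e-5
        dist = Independent(Normal(loc=mean, scale=std), 1)
        value = self.value_head(h).squeeze(-1)
        return dist, value


torch.manual_seed(0)
model = GaussianActorCritic(obs_dim=8, action_dim=50)
obs = torch.randn(4, 8)
dist, value = model(obs)
action = dist.sample()
print(action.shape, dist.log_prob(action).shape, value.shape)
# torch.Size([4, 50]) torch.Size([4]) torch.Size([4])
```

Some implementations instead learn a standard deviation that does not depend on the observation (a single nn.Parameter); this sketch predicts it from the observation, as the post describes.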

Answer 3 · 2018-11-16T00:09:45.000Z

Okay, I see, so in your case, you just need to replace the line I pointed out by:

dist = Normal(loc=x1, scale=x2)

where x1 and x2 could be the outputs of linear layers (see this PyTorch page). Then you can call dist.sample() and clip the values with this.
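A sketch of the replacement described above, followed by the clipping step. x1 and x2 come from two linear layers, and x2 is passed through a softplus because Normal requires a positive scale. The per-dimension bounds low and high, the feature size, and the three-dimensional action (instead of about 50) are hypothetical values chosen for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Normal

torch.manual_seed(0)
batch, hidden, action_dim = 2, 16, 3  # hypothetical sizes

# Hypothetical per-dimension bounds: each action entry has its own range.
low = torch.tensor([-1.0, 0.0, -5.0])
high = torch.tensor([1.0, 10.0, 5.0])

# x1 and x2 are the outputs of two linear layers, as suggested in the thread.
loc_layer = nn.Linear(hidden, action_dim)
scale_layer = nn.Linear(hidden, action_dim)

features = torch.randn(batch, hidden)  # stand-in for the model's embedding
x1 = loc_layer(features)
x2 = F.softplus(scale_layer(features)) + 1e-5  # Normal needs scale > 0

dist = Normal(loc=x1, scale=x2)
raw_action = dist.sample()

# Clip every dimension into its own [low, high]; the (action_dim,) bounds
# broadcast against the (batch, action_dim) sample.
env_action = torch.max(torch.min(raw_action, high), low)

# Log-probability of the unclipped sample, summed over action dimensions.
log_prob = dist.log_prob(raw_action).sum(dim=-1)

print(env_action)
print(log_prob.shape)  # torch.Size([2])
```

This sketch computes the log-probability on the raw sample rather than the clipped one, a common choice: the clipped value is only what gets sent to the environment, while the policy-gradient loss uses the probability of what the Gaussian actually produced.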

I haven't used continuous action spaces yet, so I don't have a precise use case in mind. But if you manage to modify my code to make it work for your use case and describe to me what you changed, I will update it.

However, I can't write code that works for every kind of environment, observation space, action space (bounded or not), etc. But I try to make it easy to modify for everybody's use case.

Don't hesitate to ask if you need help with your case.

Answer 4 · 2018-11-16T09:12:20.000Z

Thanks! I'll keep you updated.

Answer 5 · 2018-11-26T12:32:11.000Z

I'm closing it for now. Don't hesitate to reopen it if you have news.