simoninithomas/Deep_reinforcement_learning_Course

Continuous output space scenario

Thebaide opened this issue · 2 comments

Hi,

About CartPole, from what I understand, the agent selects an action based on the action probabilities output by the neural network.

But let's imagine that the action space is infinite.
For example, instead of outputting left or right, the agent outputs a speed, which can be negative if rolling to the left, or positive if rolling to the right.
How can I implement such a system? Does it seem feasible? What would I need to modify in the code?
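To make the scenario concrete, here is a minimal sketch (names and the speed bound are illustrative assumptions, not from the course code) of how a single unbounded network output can be squashed into a bounded continuous speed, negative meaning left and positive meaning right:

```python
import numpy as np

MAX_SPEED = 2.0  # assumed bound; negative = roll left, positive = roll right

def raw_output_to_speed(raw: float) -> float:
    """Map an unbounded network output to a speed in [-MAX_SPEED, MAX_SPEED].

    tanh squashes the raw value into (-1, 1); scaling by MAX_SPEED gives a
    bounded continuous action instead of a discrete left/right choice.
    """
    return MAX_SPEED * float(np.tanh(raw))

print(raw_output_to_speed(0.0))    # stand still
print(raw_output_to_speed(-10.0))  # near full speed to the left
```

In Gym-style environments this corresponds to a `Box` action space rather than a `Discrete` one.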

Hi,
Sorry for the delayed response,
It's simple: in a continuous action space, you use policy-based methods or actor-critic methods.
DQN outputs a Q-value for each discrete action, so it can't cover an infinite action space.
Policy gradients, on the other hand, output a probability distribution over actions (for a continuous space, typically a Gaussian whose mean and standard deviation the network learns), which makes them a natural fit. So yes, it is feasible, and you can check Denny Britz's repository.
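A minimal sketch of such a Gaussian policy, in plain numpy (real code would use a deep-learning framework; the class and parameter names here are illustrative assumptions): a linear "network" maps the state to the mean of a Gaussian, the log standard deviation is a learned free parameter, and the log-probability of the sampled action is what a policy-gradient update would use.

```python
import numpy as np

rng = np.random.default_rng(0)

class GaussianPolicy:
    """Illustrative continuous-action policy: action ~ N(w @ state, std^2)."""

    def __init__(self, state_dim: int):
        self.w = np.zeros(state_dim)  # weights mapping state -> action mean
        self.log_std = 0.0            # learned log standard deviation

    def sample(self, state: np.ndarray):
        mean = float(self.w @ state)
        std = float(np.exp(self.log_std))
        action = float(rng.normal(mean, std))  # a continuous action, e.g. a speed
        # Log-density of the sampled action under N(mean, std^2); this is the
        # quantity differentiated in a REINFORCE-style policy-gradient update.
        log_prob = (
            -0.5 * ((action - mean) / std) ** 2
            - np.log(std)
            - 0.5 * np.log(2 * np.pi)
        )
        return action, log_prob

policy = GaussianPolicy(state_dim=4)
state = np.array([0.1, -0.2, 0.05, 0.0])
action, log_prob = policy.sample(state)
```

The key difference from the discrete CartPole setup: instead of a softmax over two actions, the network parameterizes a distribution you can sample any real-valued action from, and the gradient of `log_prob` with respect to the policy parameters drives learning.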

Ok thank you for the link.