Anjum48/rl-examples

why 128 for discrete, 8192 for continuous?

Closed this issue · 4 comments

Why is the batch size 128 for discrete action spaces but 8192 for continuous ones?

You need larger batch sizes for continuous action spaces because they have a larger variance, so more samples are needed to get a stable gradient estimate.
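As a rough standalone illustration (not code from this repo): the spread of a batch-mean estimate shrinks roughly as 1/sqrt(batch_size), so a noisier, higher-variance signal needs a much larger batch to get a comparably stable estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

def batch_mean_spread(sigma, batch_size, n_trials=2000):
    # Draw n_trials batches, average each one, and measure how much the
    # batch means fluctuate around the true mean of 0.
    samples = rng.normal(0.0, sigma, size=(n_trials, batch_size))
    return samples.mean(axis=1).std()

print(batch_mean_spread(sigma=1.0, batch_size=128))   # ~0.088
print(batch_mean_spread(sigma=1.0, batch_size=8192))  # ~0.011
```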

Thank you for your reply and for your wonderful work on this repo, I really appreciate it.
Sorry for nagging, but I am a beginner in RL. I am applying this algorithm to a hopper robot with a continuous action and state space in Gazebo, but the results after a long training run look like the attached GIF (the robot changes its pose quickly, too fast for me to capture properly in the GIF). The modifications I made are:
1. Deleted the convolutional layers and used only the MLP layers in both networks.
2. Replaced the env step and reset functions with my own.
3. Changed Modal to Mean.
4. Deleted the if conditions for discrete action spaces.
5. Set ENTROPY_BETA = 0.0, following the "# 0.01 for discrete, 0.0 for continuous" comment. If this is 0.0, how can we handle premature convergence to a suboptimal policy? (A simplified sketch of what I mean follows this list.)
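Here is a simplified, hypothetical sketch (not the actual repo code, and not exactly my modified code either) of the kind of Gaussian policy head and PPO loss I mean; keeping ENTROPY_BETA slightly above 0 adds an entropy bonus, which is one common way to discourage premature convergence:

```python
import torch
import torch.nn as nn

# Hypothetical sketch, not the actual code from this repo: a Gaussian policy
# head for a continuous action space. At evaluation time you can act with the
# mean (the continuous analogue of taking the mode of a discrete policy).
class GaussianPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.mean = nn.Linear(hidden, act_dim)
        self.log_std = nn.Parameter(torch.zeros(act_dim))  # state-independent std

    def forward(self, obs):
        mu = self.mean(self.body(obs))
        return torch.distributions.Normal(mu, self.log_std.exp())

ENTROPY_BETA = 0.01  # example value; 0.0 disables the entropy bonus entirely

def ppo_loss(dist, actions, advantages, old_log_probs, clip_eps=0.2):
    # Clipped surrogate objective plus an entropy bonus; with ENTROPY_BETA = 0.0
    # the policy's std can collapse early, which is the premature-convergence worry.
    log_probs = dist.log_prob(actions).sum(-1)
    ratio = (log_probs - old_log_probs).exp()
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    surrogate = torch.min(ratio * advantages, clipped).mean()
    entropy = dist.entropy().sum(-1).mean()
    return -(surrogate + ENTROPY_BETA * entropy)
```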

I don't know why this happens, and I also couldn't find any PPO implementations for Gazebo and ROS.
Any recommendations or ideas?
Thank you.
[GIF: ezgif-2-eae89c9f4d09]

Sorry, I'm not sure why. I'm not familiar with that environment, but each environment usually needs its own unique set of hyperparameters (batch size, entropy, etc.). I'm in the middle of refactoring the code to make it easier to test and log different parameters (initially in PyTorch, but I might come back to TF one day), so check back soon.
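For example, something along the lines of a per-environment settings dictionary (illustrative values and environment names only, not final):

```python
# Illustrative only; the environment names and values here are just examples.
HYPERPARAMS = {
    "CartPole-v1": {"batch_size": 128, "entropy_beta": 0.01},   # discrete
    "Pendulum-v0": {"batch_size": 8192, "entropy_beta": 0.0},   # continuous
    "GazeboHopper": {"batch_size": 8192, "entropy_beta": 0.0},  # hypothetical custom env
}

def get_hyperparams(env_name):
    # Fall back to the continuous defaults for unknown environments.
    return HYPERPARAMS.get(env_name, {"batch_size": 8192, "entropy_beta": 0.0})
```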

Okay, thank you.