Anjum48/rl-examples

why 128 for discrete, 8192 for continuous?

Closed this issue · 4 comments

Why is the batch size 128 for discrete action spaces but 8192 for continuous ones?

You need larger batch sizes for continuous action spaces because they have a larger variance, so more samples are needed to get a stable gradient estimate.
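As a rough standalone illustration (not code from this repo): the spread of a batch-mean estimate shrinks roughly as 1/sqrt(batch_size), so a noisier, higher-variance signal needs a much larger batch to get a comparably stable estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

def batch_mean_spread(sigma, batch_size, n_trials=2000):
    # Draw n_trials batches, average each one, and measure how much the
    # batch means fluctuate around the true mean of 0.
    samples = rng.normal(0.0, sigma, size=(n_trials, batch_size))
    return samples.mean(axis=1).std()

print(batch_mean_spread(sigma=1.0, batch_size=128))   # ~0.088
print(batch_mean_spread(sigma=1.0, batch_size=8192))  # ~0.011
```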

Thank you for your reply and for your wonderful work on this repo, I really appreciate it.
Sorry for nagging, but I am a beginner in RL. I am applying this algorithm to a hopper robot with a continuous action and state space in Gazebo, but the results after a long training run look like the attached GIF (the robot changes its pose quickly, too fast for me to capture properly in the GIF). The modifications I made are:
1. Deleted the convolutional layers and used only the MLP layers in both networks.
2. Replaced the env step and reset functions with my own.
3. Changed Modal to Mean.
4. Deleted the if conditions for discrete action spaces.
5. Set ENTROPY_BETA = 0.0, following the "# 0.01 for discrete, 0.0 for continuous" comment. If this is 0.0, how can we handle premature convergence to a suboptimal policy? (A simplified sketch of what I mean follows this list.)
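Here is a simplified, hypothetical sketch (not the actual repo code, and not exactly my modified code either) of the kind of Gaussian policy head and PPO loss I mean; keeping ENTROPY_BETA slightly above 0 adds an entropy bonus, which is one common way to discourage premature convergence:

```python
import torch
import torch.nn as nn

# Hypothetical sketch, not the actual code from this repo: a Gaussian policy
# head for a continuous action space. At evaluation time you can act with the
# mean (the continuous analogue of taking the mode of a discrete policy).
class GaussianPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.mean = nn.Linear(hidden, act_dim)
        self.log_std = nn.Parameter(torch.zeros(act_dim))  # state-independent std

    def forward(self, obs):
        mu = self.mean(self.body(obs))
        return torch.distributions.Normal(mu, self.log_std.exp())

ENTROPY_BETA = 0.01  # example value; 0.0 disables the entropy bonus entirely

def ppo_loss(dist, actions, advantages, old_log_probs, clip_eps=0.2):
    # Clipped surrogate objective plus an entropy bonus; with ENTROPY_BETA = 0.0
    # the policy's std can collapse early, which is the premature-convergence worry.
    log_probs = dist.log_prob(actions).sum(-1)
    ratio = (log_probs - old_log_probs).exp()
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    surrogate = torch.min(ratio * advantages, clipped).mean()
    entropy = dist.entropy().sum(-1).mean()
    return -(surrogate + ENTROPY_BETA * entropy)
```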

I don't know why this happens, and I also couldn't find any PPO implementations for Gazebo and ROS.
Any recommendations or ideas?
Thank you.
[GIF: ezgif-2-eae89c9f4d09]

Sorry, I'm not sure why. I'm not familiar with that environment, but each environment usually needs its own unique set of hyperparameters (batch size, entropy, etc.). I'm in the middle of refactoring the code to make it easier to test and log different parameters (initially in PyTorch, but I might come back to TF one day), so check back soon.
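For example, something along the lines of a per-environment settings dictionary (illustrative values and environment names only, not final):

```python
# Illustrative only; the environment names and values here are just examples.
HYPERPARAMS = {
    "CartPole-v1": {"batch_size": 128, "entropy_beta": 0.01},   # discrete
    "Pendulum-v0": {"batch_size": 8192, "entropy_beta": 0.0},   # continuous
    "GazeboHopper": {"batch_size": 8192, "entropy_beta": 0.0},  # hypothetical custom env
}

def get_hyperparams(env_name):
    # Fall back to the continuous defaults for unknown environments.
    return HYPERPARAMS.get(env_name, {"batch_size": 8192, "entropy_beta": 0.0})
```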

Okay, thank you.