qgallouedec/panda-gym

Can the SAC algorithm converge without HER in some of the environments in this project?

Thank you so much for your outstanding work! I am currently experimenting with the SAC algorithm without HER, but I have found it very difficult to get it to converge, even in the simplest environment, PandaReach. Is it possible to train a good policy with SAC without HER? If so, I would be very interested to know in which environments SAC might be effective without HER, and what the corresponding hyperparameters might be. Any guidance would be greatly appreciated!