two problems (HAC and AntPush)
Hotwaterman opened this issue · 3 comments
I used h-baselines to reproduce HIRO and HAC, but I ran into two problems:
- HAC performance is poor, noticeably worse than the results reported in the HAC paper. Is this caused by the code, or by something else?
- For the AntPush experiment I ran `python experiments/run_hrl.py "AntPush" --use_huber --evaluate --eval_interval 50000 --nb_eval_episodes 50 --total_steps 3000000 --relative_goals --off_policy_corrections`. Are these settings correct? With this command, HIRO's success rate stays at 0 (see the sketch of the success check below).
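For reference, my understanding is that HIRO-style evaluations count an episode as a success when the agent finishes within a fixed L2 distance of the target (5 units in the original HIRO paper). Below is a minimal sketch of that check; the `policy.get_action` call and the `env.current_context` attribute are assumptions for illustration, not necessarily the actual h-baselines API.

```python
import numpy as np

SUCCESS_THRESHOLD = 5.0  # L2 distance used as the success cutoff in the HIRO paper

def evaluate_success_rate(env, policy, n_episodes=50):
    """Rough sketch of a success-rate evaluation.

    Assumes the goal position is exposed via `env.current_context` and
    that the first two state dimensions are the agent's (x, y) position;
    both are assumptions and may differ from the h-baselines wrappers.
    """
    successes = 0
    for _ in range(n_episodes):
        state = env.reset()
        done = False
        while not done:
            action = policy.get_action(state)  # hypothetical policy API
            state, reward, done, info = env.step(action)
        goal_xy = np.asarray(env.current_context[:2])  # assumed attribute
        agent_xy = np.asarray(state[:2])
        if np.linalg.norm(agent_xy - goal_xy) < SUCCESS_THRESHOLD:
            successes += 1
    return successes / n_episodes
```

If the reported success rate stays at exactly 0 over 3M steps, it is worth confirming that the evaluation threshold and goal coordinates in the code follow this convention.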
I have a similar problem. Were you able to get any success with HIRO running on AntPush?
I succeeded, but the variance between runs is large: only 3 or 4 out of ten random seeds may succeed. I think it is a problem with the reward setting. The current distance-based reward misleads the agent's exploration, and AntPush tolerates very little exploration error. As for HAC, its poor performance comes from the high-level action hindsight, which hurts exploration.
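To make the reward issue concrete: with `--relative_goals`, HIRO's lower-level intrinsic reward is the negative L2 distance between the state the manager requested and the state actually reached, r = -||s_t + g_t - s_{t+1}||_2 (Nachum et al., 2018), while the manager is trained on the environment reward, which for these tasks is, to my understanding, the negative distance to the target. A minimal sketch, with the dimension handling simplified:

```python
import numpy as np

def intrinsic_reward(state, goal, next_state):
    """Lower-level (worker) reward in HIRO with relative goals:
    r = -||s_t + g_t - s_{t+1}||_2 (Nachum et al., 2018).
    In practice only the goal dimensions (e.g. the ant's (x, y)
    position) are compared; the full state is used here for brevity.
    """
    return -np.linalg.norm(state + goal - next_state)

def task_reward(agent_xy, target_xy):
    """Higher-level (manager) reward: negative straight-line distance
    to the target. On AntPush the direct path is blocked by the
    movable block, so greedily following this gradient approaches the
    block from the wrong side; this is the misleading signal above.
    """
    return -np.linalg.norm(np.asarray(agent_xy) - np.asarray(target_xy))
```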
Thank you for your quick response. Did you use the same parameters mentioned in the experiments README?