two problems (HAC and AntPush)
Hotwaterman opened this issue · 3 comments
I used h-baselines to reproduce HIRO and HAC, but I ran into two problems:
- HAC performance is poor, noticeably worse than the results reported in the HAC paper. Is this caused by the code, or by something else?
- For the AntPush experiment I ran `python experiments/run_hrl.py "AntPush" --use_huber --evaluate --eval_interval 50000 --nb_eval_episodes 50 --total_steps 3000000 --relative_goals --off_policy_corrections`. Are these settings correct? With this command, HIRO's success rate stays at 0 (see the sketch of the success check below).
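For reference, my understanding is that HIRO-style evaluations count an episode as a success when the agent finishes within a fixed L2 distance of the target (5 units in the original HIRO paper). Below is a minimal sketch of that check; the `policy.get_action` call and the `env.current_context` attribute are assumptions for illustration, not necessarily the actual h-baselines API.

```python
import numpy as np

SUCCESS_THRESHOLD = 5.0  # L2 distance used as the success cutoff in the HIRO paper

def evaluate_success_rate(env, policy, n_episodes=50):
    """Rough sketch of a success-rate evaluation.

    Assumes the goal position is exposed via `env.current_context` and
    that the first two state dimensions are the agent's (x, y) position;
    both are assumptions and may differ from the h-baselines wrappers.
    """
    successes = 0
    for _ in range(n_episodes):
        state = env.reset()
        done = False
        while not done:
            action = policy.get_action(state)  # hypothetical policy API
            state, reward, done, info = env.step(action)
        goal_xy = np.asarray(env.current_context[:2])  # assumed attribute
        agent_xy = np.asarray(state[:2])
        if np.linalg.norm(agent_xy - goal_xy) < SUCCESS_THRESHOLD:
            successes += 1
    return successes / n_episodes
```

If the reported success rate stays at exactly 0 over 3M steps, it is worth confirming that the evaluation threshold and goal coordinates in the code follow this convention.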
I have a similar problem. Were you able to get any success with HIRO running on AntPush?
I succeeded, but the variance between runs is large: only 3 or 4 out of ten random seeds may succeed. I think it is a problem with the reward setting. The current distance-based reward misleads the agent's exploration, and AntPush tolerates very little exploration error. As for HAC, its poor performance comes from the high-level action hindsight, which hurts exploration.
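To make the reward issue concrete: with `--relative_goals`, HIRO's lower-level intrinsic reward is the negative L2 distance between the state the manager requested and the state actually reached, r = -||s_t + g_t - s_{t+1}||_2 (Nachum et al., 2018), while the manager is trained on the environment reward, which for these tasks is, to my understanding, the negative distance to the target. A minimal sketch, with the dimension handling simplified:

```python
import numpy as np

def intrinsic_reward(state, goal, next_state):
    """Lower-level (worker) reward in HIRO with relative goals:
    r = -||s_t + g_t - s_{t+1}||_2 (Nachum et al., 2018).
    In practice only the goal dimensions (e.g. the ant's (x, y)
    position) are compared; the full state is used here for brevity.
    """
    return -np.linalg.norm(state + goal - next_state)

def task_reward(agent_xy, target_xy):
    """Higher-level (manager) reward: negative straight-line distance
    to the target. On AntPush the direct path is blocked by the
    movable block, so greedily following this gradient approaches the
    block from the wrong side; this is the misleading signal above.
    """
    return -np.linalg.norm(np.asarray(agent_xy) - np.asarray(target_xy))
```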
Thank you for your quick response. Did you use the same parameters mentioned in the experiments README?