RyanNavillus/reward-surfaces

Reproducing the results of the paper

Closed this issue · 6 comments

Hi,

I tried reproducing the results of the paper on HalfCheetah-v2. I ran the following commands.

python3 scripts/train_agent.py "./runs/halfcheetah_checkpoints" SB3_ON HalfCheetah-v2 cuda '{"ALGO": "PPO"}' --save_freq=10000
python3 scripts/generate_plane_jobs.py --grid-size=31 --magnitude=1.0 --num-steps=200000 "runs/halfcheetah_checkpoints/0200000" "runs/halfcheetah_surface"
python3 scripts/run_jobs_multiproc.py --num-cpus=14 "runs/halfcheetah_surface/jobs.sh"
python3 scripts/job_results_to_csv.py "runs/halfcheetah_surface"
python3 scripts/plot_plane.py "runs/halfcheetah_surface/results.csv" --outname="runs/halfcheetah" --env_name="HalfCheetah-v2"

These are the commands from the README, except that I loaded the checkpoint at 200,000 steps, which was also done in the paper according to Table 2. I ran the experiment twice, and the results of both runs look quite different from those reported in Figure 11.

[Plot: halfcheetah_episoderewards_3dsurface_1]
[Plot: halfcheetah_episoderewards_3dsurface_2]

So I was wondering whether there are any hyperparameters that I missed or whether I need to do anything else different to obtain results similar to those in the paper.

Hi, I believe our results look different because we used the checkpoint trained for 1 million timesteps, not 200,000. I assume the agents you show here have not yet converged to the same policy that we found in our paper. Table 2 in our paper refers to the number of evaluation steps used to evaluate each point in the plot. The correct number of training steps for each environment can be found in the hyperparameters folder. If you don't change any settings, the training script will use the correct number (according to RL Zoo) by default.
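For reference, a sketch of the plane-generation command from above pointed at the later checkpoint instead (the 1000000 directory name is an assumption here, following the 0200000 naming pattern used earlier; everything else is unchanged):

python3 scripts/generate_plane_jobs.py --grid-size=31 --magnitude=1.0 --num-steps=200000 "runs/halfcheetah_checkpoints/1000000" "runs/halfcheetah_surface"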

Ok, then I misinterpreted Table 2. Thanks for clarifying.
Just to be sure: if I don't change any settings, the script uses the best-performing checkpoint (runs/halfcheetah_checkpoints/best). Is this how the plots in the paper were generated?

Yes, that is correct. We plotted the best checkpoint for each environment after training with the hyperparameters found in the hyperparameters folder. The settings in Table 2 are for the generate_plane_jobs.py script.
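To be explicit, the remaining steps with the best checkpoint would then look something like the README commands above, only with the checkpoint path swapped (all paths taken from this thread):

python3 scripts/generate_plane_jobs.py --grid-size=31 --magnitude=1.0 --num-steps=200000 "runs/halfcheetah_checkpoints/best" "runs/halfcheetah_surface"
python3 scripts/run_jobs_multiproc.py --num-cpus=14 "runs/halfcheetah_surface/jobs.sh"
python3 scripts/job_results_to_csv.py "runs/halfcheetah_surface"
python3 scripts/plot_plane.py "runs/halfcheetah_surface/results.csv" --outname="runs/halfcheetah" --env_name="HalfCheetah-v2"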

I ran the code with runs/halfcheetah_checkpoints/best, but the results still look different from those in the paper. Below are the results of two runs in this configuration.
[Plot: halfcheetah_best_00_episoderewards_3dsurface]
[Plot: halfcheetah_best_01_episoderewards_3dsurface]

I believe those are almost correct; for the paper I just chose to plot these on a linear scale instead of the logarithmic scale (which is the default for reward scales this large). You can disable the logarithmic scale by passing the flag --logscale off to the plot_plane.py script.
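For example, only the plotting step should need to be re-run with the flag appended (a sketch based on the plot command from above, assuming the existing results.csv can be reused):

python3 scripts/plot_plane.py "runs/halfcheetah_surface/results.csv" --outname="runs/halfcheetah" --env_name="HalfCheetah-v2" --logscale off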

Ah, I missed that. With the --logscale off argument, the plots look similar to those in the paper.

[Plot: halfcheetah_best_00_linear_episoderewards_3dsurface]
[Plot: halfcheetah_best_01_linear_episoderewards_3dsurface]

Thanks for your help!