print_statistics() output of ContinuousA2CBase() might be wrong due to frame update implementation?

Question

Opened this issue 4 months ago · 0 comments

Hello!

Thank you for the excellent library. I may have found a bug in how frame is tracked during training. It comes from where the frame = self.frame // self.num_agents update is placed, which differs between ContinuousA2CBase.train() and DiscreteA2CBase.train().

In ContinuousA2CBase.train(), the update is placed before self.frame += curr_frames, which I believe is wrong: frame is computed from the counter before the current epoch's frames are added, so the printed count lags one epoch behind. In DiscreteA2CBase.train(), the update is placed after self.frame += curr_frames, which I believe is correct.
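To make the difference concrete, here is a minimal standalone sketch of the two orderings. It is not the library's code: frames_reported, total_frames, and the values curr_frames=100 and num_agents=1 are hypothetical stand-ins chosen only for illustration.

```python
def frames_reported(update_before_increment, curr_frames, num_agents, epochs):
    """Return the frame value that would be reported after each epoch."""
    total_frames = 0  # hypothetical stand-in for self.frame
    reported = []
    for _ in range(epochs):
        if update_before_increment:
            # Ordering described for ContinuousA2CBase.train()
            frame = total_frames // num_agents
            total_frames += curr_frames
        else:
            # Ordering described for DiscreteA2CBase.train()
            total_frames += curr_frames
            frame = total_frames // num_agents
        reported.append(frame)
    return reported


continuous_style = frames_reported(True, curr_frames=100, num_agents=1, epochs=3)
discrete_style = frames_reported(False, curr_frames=100, num_agents=1, epochs=3)
print(continuous_style)  # [0, 100, 200]
print(discrete_style)    # [100, 200, 300]
```

With the update placed first, every reported value is one epoch behind the counter, which matches the frames: 0 I see after the first epoch.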

After one iteration of PPO training using num_envs=512 and horizon_length=16, ContinuousA2CBase.train() prints:

fps step: 6744 fps step and policy inference: 6571 fps total: 6360 epoch: 1/500 frames: 0

After moving the update so that it matches DiscreteA2CBase.train(), the printout is:

fps step: 6744 fps step and policy inference: 6571 fps total: 6360 epoch: 1/500 frames: 8192
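As a sanity check on the expected value, the short calculation below reproduces the 8192 figure. It assumes one epoch collects num_envs * horizon_length frames and that num_agents is 1 (a single-agent, single-GPU run); both are my assumptions about this setup, not something taken from the library's code.

```python
num_envs = 512
horizon_length = 16
num_agents = 1  # assumption: single-agent environment

# Assumption: frames collected in one epoch = environments * rollout length
frames_per_epoch = num_envs * horizon_length
expected_after_first_epoch = frames_per_epoch // num_agents
print(expected_after_first_epoch)  # 8192
```

So 8192 is what I would expect to see after epoch 1, not 0.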