print_statistics() output of ContinuousA2CBase() might be wrong due to frame update implementation?
Hello!
Thank you for the excellent library. I may have found a bug in how frame is tracked during training. It comes from where the frame = self.frame // self.num_agents update is placed, which differs between ContinuousA2CBase.train() and DiscreteA2CBase.train().
In ContinuousA2CBase.train(), the update is placed before self.frame += curr_frames, which I believe is wrong, because the reported frame count then excludes the frames collected in the current epoch. In DiscreteA2CBase.train(), the update is placed after self.frame += curr_frames, which I believe is correct.
After one iteration of PPO training with num_envs=512 and horizon_length=16, ContinuousA2CBase.train() prints:
fps step: 6744 fps step and policy inference: 6571 fps total: 6360 epoch: 1/500 frames: 0
After moving the update so that it matches DiscreteA2CBase.train(), the output shows the expected 512 × 16 = 8192 frames:
fps step: 6744 fps step and policy inference: 6571 fps total: 6360 epoch: 1/500 frames: 8192
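To make the ordering issue concrete, here is a minimal toy model of the frame counter. It is not rl_games code: the function reported_frames and the variable total_frames are hypothetical, and it assumes num_agents = 1 and that each epoch collects num_envs * horizon_length frames. Under those assumptions it reproduces the two frame values printed above.

```python
# Toy model of the frame bookkeeping; hypothetical names, not the rl_games implementation.
NUM_ENVS = 512
HORIZON_LENGTH = 16
NUM_AGENTS = 1  # assumption: one agent per environment


def reported_frames(update_before_increment, epochs=1):
    """Return the `frame` value that would be printed after each epoch."""
    total_frames = 0  # stands in for self.frame
    printed = []
    for _ in range(epochs):
        # Assumption: frames collected per epoch = num_envs * horizon_length.
        curr_frames = NUM_ENVS * HORIZON_LENGTH
        if update_before_increment:
            # Ordering described for ContinuousA2CBase.train():
            # frame is read before the current epoch's frames are added.
            frame = total_frames // NUM_AGENTS
            total_frames += curr_frames
        else:
            # Ordering described for DiscreteA2CBase.train():
            # frame is read after the current epoch's frames are added.
            total_frames += curr_frames
            frame = total_frames // NUM_AGENTS
        printed.append(frame)
    return printed


print(reported_frames(update_before_increment=True))   # [0]
print(reported_frames(update_before_increment=False))  # [8192]
```

With more epochs, the first ordering stays exactly one epoch behind the second in this model, for example [0, 8192, 16384] versus [8192, 16384, 24576] after three epochs.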