
[Questions] Problems reproducing the results reported in the paper


whc688 commented


Thank you for sharing such excellent work.
I'm currently working on replicating the results from Table 2 in your paper. I collected 50,000 trajectories with retry on the Drawer task and used the successful ones to train the distilled policy for 10 epochs. I trained the policy with a batch size of 256 on 4 GPUs, removing the evaluation callback as described here, and left all other settings and hyperparameters at their defaults. After training, I evaluated every saved checkpoint for 200 episodes.
However, the best success rate I achieved was 22%, well below the reported 55.8%. Here is the config file from my training. Could you offer any guidance on achieving the results reported in the paper?
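For a sense of how large this gap is, note that a 200-episode evaluation has limited precision. The sketch below is a minimal illustration, not part of the repository. It assumes episodes are independent Bernoulli trials and uses the normal approximation to the binomial distribution. The helper `success_rate_ci` is hypothetical.

```python
import math


def success_rate_ci(successes: int, episodes: int, z: float = 1.96):
    """Hypothetical helper: success rate, its standard error, and an
    approximate 95% interval (normal approximation to the binomial)."""
    p = successes / episodes
    se = math.sqrt(p * (1.0 - p) / episodes)
    # Clip the interval to the valid probability range [0, 1].
    return p, se, max(0.0, p - z * se), min(1.0, p + z * se)


# 22% of 200 episodes corresponds to 44 successful episodes.
rate, se, lo, hi = success_rate_ci(44, 200)
print(f"rate={rate:.3f} se={se:.4f} 95% CI=[{lo:.3f}, {hi:.3f}]")
```

Under these assumptions, the interval for 22% over 200 episodes spans roughly 16% to 28%. The gap to 55.8% is therefore far larger than evaluation noise alone would explain. Taking the maximum over many checkpoints also biases the best reported number upward, not downward.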

Thanks in advance!


I have encountered the same problem. Could we discuss it?