[Questions] Problems reproducing the results in the paper
Closed · 1 comment
Hi @huy-ha,
Thank you for sharing such excellent work.
I'm currently working on replicating the results from Table 2 in your paper. I collected 50,000 trajectories with retries on the Drawer task and used the successful ones to train the distilled policy for 10 epochs. I trained the policy with a batch size of 256 on 4 GPUs, removing the evaluation callback as described here, and left all other settings and hyperparameters at their defaults. After training, I evaluated every saved checkpoint for 200 episodes.
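In case it helps pinpoint the issue, this is roughly how I filtered the collected dataset down to the successful trajectories before training. The file layout and the `success` flag here are placeholders for however the repo actually stores rollout results, not its real schema:

```python
from pathlib import Path
import pickle

def filter_successful(dataset_dir: str, output_dir: str) -> int:
    """Copy only the successful trajectories into `output_dir`."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    kept = 0
    for traj_path in sorted(Path(dataset_dir).glob("*.pkl")):
        with open(traj_path, "rb") as f:
            traj = pickle.load(f)
        # hypothetical schema: a dict with a boolean "success" entry
        if isinstance(traj, dict) and traj.get("success"):
            (out / traj_path.name).write_bytes(traj_path.read_bytes())
            kept += 1
    return kept

if __name__ == "__main__":
    n = filter_successful("drawer_trajectories", "drawer_successes")
    print(f"kept {n} successful trajectories")
```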
However, I only achieved a maximum success rate of 22%, which is considerably lower than the reported 55.8%. Here is my training config: config.zip. Could you offer any guidance on reproducing the results from the paper?
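For completeness, my checkpoint evaluation loop looks roughly like the sketch below. `make_drawer_env`, `load_policy`, the `policy.act(obs)` interface, and the `success` entry in `info` are all hypothetical stand-ins for the repo's own utilities; the env is assumed to expose a Gymnasium-style API:

```python
from pathlib import Path

def success_rate(policy, env, episodes: int = 200) -> float:
    """Roll out `policy` for `episodes` episodes; return the fraction solved."""
    successes = 0
    for _ in range(episodes):
        obs, info = env.reset()
        terminated = truncated = False
        while not (terminated or truncated):
            action = policy.act(obs)  # hypothetical policy interface
            obs, reward, terminated, truncated, info = env.step(action)
        successes += bool(info.get("success", False))  # hypothetical flag
    return successes / episodes

if __name__ == "__main__":
    env = make_drawer_env()  # hypothetical env constructor
    results = {
        ckpt.name: success_rate(load_policy(ckpt), env)  # hypothetical loader
        for ckpt in sorted(Path("checkpoints").glob("*.ckpt"))
    }
    best = max(results, key=results.get)
    print(f"best checkpoint: {best} ({results[best]:.1%} success)")
```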
Thanks in advance!
I have encountered the same problem. Could we get in touch to discuss it?