real-stanford/scalingup

[Questions] Problems when trying to reproduce the results in the paper

Closed this issue · 1 comment

whc688 commented

Hi @huy-ha,

Thank you for sharing such excellent work.
I'm currently working on replicating the results from Table 2 in your paper. I collected 50,000 trajectories with retry on the Drawer task and used the successful ones to train the distilled policy for 10 epochs. I trained the policy with a batch size of 256 on 4 GPUs, removing the evaluation callback as suggested here, and left all other settings and hyperparameters at their defaults. After training, I evaluated all the saved checkpoints for 200 episodes each.
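For concreteness, here is a minimal sketch of my trainer setup (PyTorch Lightning; the real run is launched through the repo's Hydra config, and the exact checkpointing arguments below are my assumption, not the repo's code):

```python
# Minimal sketch of my training setup (PyTorch Lightning). Only the Trainer
# construction is shown; the evaluation callback is dropped so no rollouts
# run during training. Batch size 256 is set on the dataloader (not shown).
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

trainer = pl.Trainer(
    max_epochs=10,      # train the distilled policy for 10 epochs
    accelerator="gpu",
    devices=4,          # 4 GPUs with DDP
    strategy="ddp",
    callbacks=[
        # keep a checkpoint from every epoch so all of them can be evaluated later
        ModelCheckpoint(save_top_k=-1, every_n_epochs=1),
    ],
)
```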
However, I only achieved a maximum success rate of 22%, which is considerably lower than the reported 55.8%. I've attached my training config file: config.zip. Could you offer any guidance on reaching the results reported in the paper?
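For reference, this is roughly how I score each checkpoint (`load_policy` and `rollout_once` are placeholders for the repo's actual evaluation entry points, which may differ):

```python
# Hypothetical sketch of my checkpoint evaluation loop: 200 episodes per
# checkpoint, success rate = fraction of successful rollouts.
from pathlib import Path

NUM_EPISODES = 200

def load_policy(ckpt_path: Path):
    """Placeholder: restore the distilled policy from a checkpoint."""
    raise NotImplementedError

def rollout_once(policy, seed: int) -> bool:
    """Placeholder: run one Drawer episode and return whether it succeeded."""
    raise NotImplementedError

def success_rate(ckpt_path: Path) -> float:
    policy = load_policy(ckpt_path)
    successes = sum(rollout_once(policy, seed=i) for i in range(NUM_EPISODES))
    return successes / NUM_EPISODES

for ckpt in sorted(Path("checkpoints").glob("*.ckpt")):
    print(f"{ckpt.name}: {success_rate(ckpt):.1%}")
```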

Thanks in advance!


I've encountered the same problem. Could we discuss it?