Explore Improvements to DT Training Procedure
jbloomAus opened this issue · 6 comments
Just wanted to have a meta card to track progress on these things, with links:
- LayerNorm (I'll probably only try layernorm pre) (#52)
- AdamW optimizer
- Add a warmup stage with a LambdaLR scheduler or cosine annealing (see the optimizer/scheduler sketch below)
- Implement gated MLPs (https://arxiv.org/pdf/2002.05202.pdf). Might need to be done in TransformerLens. See the gated MLP sketch below.
- Make it possible to use GeLU instead of ReLU (and try that out as well).
- Better state encoding (#61)
- Look into the current init ranges for all the model components and consider proper init ranges (see the init sketch below)
- Look into where all the parameters are and consider how we can make a sparser model
- Implement wandb sweeps for DT training (there's likely already a card for this, so I should find it)
- Implement masking rather than just having different tokens during padding (see the masking sketch below). This might be important.
If we've implemented all of those and still have no success with the memory env training, we can try much longer training runs or more variable sampling methods, or ask for advice (or go bug hunting).
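A minimal sketch of the optimizer/scheduler items, assuming a PyTorch DT `model` already exists; the hyperparameter values here (`lr`, `weight_decay`, `warmup_steps`, `total_steps`) are placeholders that would come from the training config.

```python
import math
import torch

# Placeholder hyperparameters; real values come from the training config.
lr, weight_decay = 3e-4, 1e-2
warmup_steps, total_steps = 1_000, 100_000

# `model` is the decision transformer being trained (assumed to exist).
optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)

def warmup_cosine(step: int) -> float:
    """Linear warmup to 1.0, then cosine decay towards 0."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup_cosine)

# In the training loop:
#   loss.backward(); optimizer.step(); scheduler.step(); optimizer.zero_grad()
```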
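For the gated MLP and GeLU items, a rough sketch of a GEGLU-style block in plain PyTorch; `d_model`/`d_mlp` are placeholder names, and in practice this would probably live in TransformerLens' MLP component rather than here.

```python
import torch
import torch.nn as nn

class GatedMLP(nn.Module):
    """Gated MLP per https://arxiv.org/abs/2002.05202.

    act="gelu" gives GEGLU, act="relu" gives ReGLU, act="silu" gives SwiGLU.
    """

    def __init__(self, d_model: int, d_mlp: int, act: str = "gelu"):
        super().__init__()
        self.W_in = nn.Linear(d_model, d_mlp)    # value branch
        self.W_gate = nn.Linear(d_model, d_mlp)  # gating branch
        self.W_out = nn.Linear(d_mlp, d_model)
        self.act = {"gelu": nn.GELU(), "relu": nn.ReLU(), "silu": nn.SiLU()}[act]

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.W_out(self.act(self.W_gate(x)) * self.W_in(x))
```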
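For the init-range item, a sketch of a GPT-2-style initialisation pass; the std of 0.02 is an assumption, and the right ranges for each DT component would still need checking.

```python
import torch.nn as nn

def init_weights(module: nn.Module, std: float = 0.02) -> None:
    """GPT-2-style init: small normal for weights, zeros for biases."""
    if isinstance(module, (nn.Linear, nn.Embedding)):
        nn.init.normal_(module.weight, mean=0.0, std=std)
    if isinstance(module, nn.Linear) and module.bias is not None:
        nn.init.zeros_(module.bias)

# Applied once after constructing the model:
# model.apply(init_weights)
```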
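For the masking item, a sketch of building a key-padding mask from per-trajectory lengths instead of relying on a dedicated padding token; it assumes right-padded batches, and how the mask is fed into attention depends on the model implementation.

```python
import torch

def padding_mask(lengths: torch.Tensor, seq_len: int) -> torch.Tensor:
    """True for real tokens, False for padding positions.

    lengths: [batch] number of real (non-pad) timesteps per trajectory,
    assuming right-padding. Returns a [batch, seq_len] boolean mask.
    """
    positions = torch.arange(seq_len, device=lengths.device)
    return positions.unsqueeze(0) < lengths.unsqueeze(1)

# The mask can then be combined with the causal mask (or passed as a
# key_padding_mask) so attention scores at padded positions are set to
# -inf before the softmax.
```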
LN and Adam done. No clear benefit on the smaller model. I think I'll get everything implemented, then set off some sweeps tomorrow/the next day with the memory env.
Done the LR scheduling stuff: https://github.com/users/jbloomAus/projects/1/views/1?pane=issue&itemId=27012682
I'm going to add a task here for setting up wandb sweeps. I think given the stuff I've added, it's important to just get a better sense of the right hyperparameters I need.
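A rough sketch of what the sweep setup could look like with the wandb Python API, assuming a `train()` entry point that reads hyperparameters from `wandb.config`; the project name, metric name, parameter names, and ranges here are all placeholders.

```python
import wandb

sweep_config = {
    "method": "bayes",
    "metric": {"name": "eval/return", "goal": "maximize"},  # placeholder metric
    "parameters": {
        "lr": {"min": 1e-5, "max": 1e-3, "distribution": "log_uniform_values"},
        "weight_decay": {"values": [0.0, 0.01, 0.1]},
        "warmup_steps": {"values": [100, 1000, 5000]},
        "d_model": {"values": [64, 128, 256]},
    },
}

def train():
    wandb.init()
    cfg = wandb.config  # hyperparameters sampled by the sweep controller
    # ... build the DT from cfg, train it, and log metrics with wandb.log(...)

sweep_id = wandb.sweep(sweep_config, project="decision-transformer-interp")  # placeholder project
wandb.agent(sweep_id, function=train, count=20)
```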
I just had a lightbulb moment relating to #61 so I'm going to do that really quick before I attempt wandb sweeps.
Converting "Implement masking rather than just having different tokens during padding" to its own card.
Closing this. Got working agents!