bigscience-workshop/t-zero

Question about the number of multi-task learning steps for T0?

StevenTang1998 opened this issue · 2 comments

Hi, I also want to conduct multi-task learning on T5 using several datasets, and I noticed you use a batch size of 1024 to fine-tune T5. How many steps did you fine-tune for during multi-task learning?

Hi @StevenTang1998 ,
All the results we reported are for checkpoints fine-tuned for 12,000 steps from the T5+LM checkpoints.
We initially fine-tuned for 25,000 steps and performed checkpoint selection based on the training sets.
12,000 was the value we obtained for T0, and we applied the same value directly to the other models.
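
In case it helps anyone landing here, below is a minimal sketch of how those hyperparameters (effective batch size 1024, 12,000 training steps from a T5+LM checkpoint) could map onto a Hugging Face Transformers fine-tuning run. This is not the exact codebase or configuration used to train T0; `multitask_train_dataset` is a hypothetical placeholder for your own tokenized, prompted multi-task mixture, and the learning rate is an assumption not stated in this thread.

```python
# Sketch only: multi-task fine-tuning of a T5+LM checkpoint with the
# hyperparameters mentioned above (effective batch size 1024, 12,000 steps).
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
)

model_name = "google/t5-xxl-lm-adapt"  # a T5+LM checkpoint; pick the size you need
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Effective batch size = per_device_train_batch_size * gradient_accumulation_steps * n_devices.
# Adjust the two factors below to reach 1024 on your hardware.
training_args = Seq2SeqTrainingArguments(
    output_dir="t0-multitask",
    max_steps=12_000,                 # reported checkpoints: 12,000 fine-tuning steps
    per_device_train_batch_size=8,
    gradient_accumulation_steps=128,  # 8 * 128 = 1024 on a single device
    learning_rate=1e-3,               # assumption; not stated in this thread
    save_steps=1_000,
    logging_steps=100,
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=multitask_train_dataset,  # hypothetical: your tokenized, prompted mixture
    tokenizer=tokenizer,
)
trainer.train()
```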

Hi @VictorSanh, thanks for your answer!