princeton-nlp/DataMUX

How long to pretrain for the retrieval task?

bangawayoo opened this issue · 1 comment

Hello authors,
thank you for the very interesting work!

I was trying to do something similar (i.e., feeding multiple inputs into the model), but I had some trouble training it. The pretraining warmup seems to solve the issue here, so I would like to take a closer look.
However, I am having trouble finding the training configuration for the pre-training step.

How many epochs are enough for the pre-training phase? Is it 3, the default in transformers.TrainingArguments?

The number of steps for retrieval depends on the number of instances (N); higher N requires more iterations. I would suggest training for around 20K iterations for N < 20.
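
For reference, here is a minimal sketch of configuring a step-based (rather than epoch-based) run with `transformers.TrainingArguments`, where setting `max_steps` overrides the epoch-count default. The batch size, learning rate, and warmup values below are illustrative assumptions, not the repo's actual settings:

```python
from transformers import TrainingArguments

# Sketch only: hyperparameters here are assumed, not the DataMUX config.
args = TrainingArguments(
    output_dir="retrieval_pretraining",
    max_steps=20_000,                 # ~20K iterations suggested for N < 20;
                                      # when > 0, this overrides num_train_epochs
    per_device_train_batch_size=32,   # assumed batch size
    learning_rate=5e-5,               # assumed learning rate
    warmup_steps=1_000,               # assumed warmup schedule
    logging_steps=500,
    save_steps=5_000,
)
```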