kipgparker/soft-prompt-tuning

Some question about the "LM Adaptation"

qcwthu opened this issue · 1 comment

Hello!

Sorry to bother you. After reading this great work, I have a question about the "LM Adaptation" setting in the paper. As I understand it, this adaptation is meant for decoder-only architectures. How can it be used with an encoder-decoder model? Did you use the same max sequence length (512) and batch size (128) as the original T5 paper? Also, since sentences in C4 are usually shorter than the max sequence length, do you concatenate several sentences into one sequence of the max length and then split it into input and target?

Hope that you can give me some advice. Thanks in advance for any help you can give.

Hi!

Thanks for the interest! Doing an LM objective for encoder-decoder models is a lot like the prefix-LM objective you can do with a decoder-only model. Part of the input is fed into the encoder, and the decoder completes the sequence.
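As a rough illustration of that split (a minimal sketch, not the actual T5/SeqIO preprocessing): the function name `prefix_lm_split` and the uniformly random split point are assumptions made for the example.

```python
import random
from typing import List, Sequence, Tuple


def prefix_lm_split(
    tokens: Sequence[int], rng: random.Random
) -> Tuple[List[int], List[int]]:
    """Split one token sequence into (encoder inputs, decoder targets).

    Hypothetical helper: the split point is drawn uniformly at random,
    which is an assumption and may differ from the real preprocessing.
    """
    if len(tokens) < 2:
        raise ValueError("need at least two tokens for a non-empty prefix and target")
    # randint is inclusive on both ends, so both halves are non-empty.
    split = rng.randint(1, len(tokens) - 1)
    inputs = list(tokens[:split])    # fed to the encoder
    targets = list(tokens[split:])   # the decoder learns to continue the prefix
    return inputs, targets


tokens = list(range(10))  # stand-in for token IDs from a tokenizer
inputs, targets = prefix_lm_split(tokens, random.Random(0))
```

Concatenating `inputs` and `targets` gives back the original sequence, so the decoder is trained on an ordinary next-token objective conditioned on the encoded prefix.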

We used the same max sequence lengths as in T5. The T5 code for this is open source, and you can find the SeqIO task definition we used here.

The checkpoints we trained with the LM adaptation are available here (or here in the original Mesh TensorFlow implementation).