General question about padding in the setting of soft-prompt tuning
krishnakanthnakkav2 opened this issue · 2 comments
Hello Authors,
I have a general question about padding in the soft-prompt tuning setting.
In a batch, when sequences have different lengths, we typically left-pad the shorter sequences, like:
# first example is left padded, batch size is set to two.
input_tokens = [
[<pad>, <pad>, "my", "name"],
["where", "are", "you", "from"]
]
Do you then prepend the soft-prompt embeddings to the embeddings of the padded tokens, like below?
# se1, se2 are soft-prompt embeddings that need to be tuned
input_embedding = [
[ se1, se2, Embedding(<pad>), Embedding(<pad>), Embedding("my"), Embedding("name"), ],
[ se1, se2, Embedding("where"), Embedding("are"), Embedding("you"), Embedding("from"), ],
]
Is my understanding correct? Section 2.1 in the paper mentions that X is padded to the maximum sequence length.
Another concern I have is: shouldn't we prepend the soft-prompt embeddings to the unpadded input embeddings first, and then pad the shorter sequences? Like,
# please note the change in the first example
input_embedding = [
[ Embedding(<pad>), Embedding(<pad>), se1, se2, Embedding("my"), Embedding("name"), ],
[ se1, se2, Embedding("where"), Embedding("are"), Embedding("you"), Embedding("from"), ],
]
Could you please share your insights on how padding is handled in soft-prompt tuning in general? Thank you for the clarification.
Hi,
Thanks for your question. Yes, you are correct: we add the soft prompt in front of the padded sequence. For the T5 model, the padding is on the right side. For decoder models, we do the same as what you described above.
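For concreteness, here is a minimal PyTorch sketch of that decoder-style layout (my own illustration, not the paper's code; the token ids, tensor sizes, and pad id of 0 are assumptions):
import torch
import torch.nn as nn

# illustrative sizes (assumptions, not from the paper)
vocab_size, embed_dim, prompt_len, batch_size, max_seq_len = 32000, 768, 2, 2, 4

embedding = nn.Embedding(vocab_size, embed_dim)                 # frozen model embedding table
soft_prompt = nn.Parameter(torch.randn(prompt_len, embed_dim))  # trainable [se1, se2]

# already left-padded token ids, shape (batch_size, max_seq_len); 0 is the pad id here
input_ids = torch.tensor([
    [0, 0, 101, 102],      # [<pad>, <pad>, "my", "name"]
    [201, 202, 203, 204],  # ["where", "are", "you", "from"]
])

token_embeds = embedding(input_ids)                             # (batch, max_seq_len, dim)
prompt_embeds = soft_prompt.unsqueeze(0).expand(batch_size, -1, -1)

# prepend the soft prompt in front of the padded sequence, as in the first layout above
input_embeds = torch.cat([prompt_embeds, token_embeds], dim=1)  # (batch, prompt_len + max_seq_len, dim)

# the attention mask is extended so the prompt positions are always attended to
attention_mask = torch.cat(
    [torch.ones(batch_size, prompt_len, dtype=torch.long), (input_ids != 0).long()],
    dim=1,
)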
Regarding whether we should prepend the soft-prompt embeddings to the unpadded input embeddings and then pad the shorter sequences: it is very interesting to try and might help improve model performance. However, it could make it difficult to apply our proposed method, where we need to add the product of the low-rank matrices to the word representations.
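To illustrate that last point with a rough sketch (my own assumption of a LoRA-style decomposition with factors A and B of rank r; the names, shapes, and the omitted scaling are not from the paper): the low-rank product is added position-wise to the padded word representations, so the token positions must stay aligned with the padded layout, which is what re-ordering the padding would complicate.
import torch

max_seq_len, embed_dim, rank, batch_size = 4, 768, 8, 2

# trainable low-rank factors (names and shapes are assumptions)
A = torch.nn.Parameter(torch.randn(max_seq_len, rank))
B = torch.nn.Parameter(torch.zeros(rank, embed_dim))

# padded word representations, shape (batch, max_seq_len, embed_dim)
token_embeds = torch.randn(batch_size, max_seq_len, embed_dim)

# A @ B has shape (max_seq_len, embed_dim) and is added position-wise,
# so each row of the update is tied to a fixed position in the padded sequence
updated_embeds = token_embeds + (A @ B).unsqueeze(0)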
Thank you!