kohjingyu/fromage

How does generate work?

zhaoshitian opened this issue · 2 comments

In the generate method of the model, I noticed that you remove some tokens according to the top_p value. I don't understand why you remove the tokens with high logits. Could you point me to some material about this? I'd appreciate it so much!

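Hi, we are actually removing the tokens whose cumulative probability exceeds top_p, with tokens sorted from most to least likely. The removed tokens are therefore the low-probability tail, not the high-logit ones. This means we only keep the smallest set of top tokens whose probabilities sum to top_p (for example, if we set top_p = 0.9 and the 10 most likely tokens sum to 0.9, we discard every token after the 10th and sample only from those first 10).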

This is standard nucleus sampling. Hope that makes sense!
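To make the filtering step concrete, here is a minimal pure-Python sketch of nucleus (top-p) sampling. It is not the fromage `generate` code itself; the function names `top_p_filter` and `sample_top_p` are hypothetical, and it assumes the common convention of keeping the token whose probability pushes the cumulative sum past top_p.

```python
import math
import random


def top_p_filter(logits, top_p=0.9):
    """Hypothetical helper: return (token_ids, probs) for the nucleus, i.e. the
    smallest set of most likely tokens whose cumulative probability reaches top_p."""
    # Softmax over the raw logits (subtract the max for numerical stability).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Sort token ids from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        # Assumed convention: keep the token that crosses top_p, then stop.
        # Every lower-probability token after this point is discarded.
        if cumulative >= top_p:
            break

    # Renormalize so the kept probabilities sum to 1.
    mass = sum(probs[i] for i in kept)
    return kept, [probs[i] / mass for i in kept]


def sample_top_p(logits, top_p=0.9, rng=random):
    """Hypothetical helper: draw one token id from the renormalized nucleus."""
    ids, weights = top_p_filter(logits, top_p)
    return rng.choices(ids, weights=weights, k=1)[0]
```

For example, with four tokens whose probabilities are 0.5, 0.3, 0.15 and 0.05 and top_p = 0.9, the cumulative sums are 0.5, 0.8 and 0.95, so the first three tokens are kept and only the 0.05 token is dropped. The highest-logit tokens are always the ones that survive.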

I understand! Thank you so much!!