yandex/YaLM-100B

[NL] token

TatianaShavrina opened this issue · 2 comments

What's the [NL] token appearing in generation?
Is it an artifact or a special token?

It's the newline token. You can replace it with \n.

Yes, it's the newline token in our tokenizer. To make this clearer, we have just added a step (deb045d) that maps it back to \n after detokenization (tokenization already handled it). Thank you for noticing!
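
For output produced before that change, the replacement suggested above can be done by hand. This is a minimal sketch, not the repository's code: `restore_newlines` and `NL_TOKEN` are hypothetical names, and it assumes the marker appears in the generated text as the literal string [NL].

```python
# Hypothetical helper, not part of the YaLM-100B repository.
# Assumption: the newline token shows up in generated text as the literal "[NL]".
NL_TOKEN = "[NL]"


def restore_newlines(text: str, nl_token: str = NL_TOKEN) -> str:
    """Replace every occurrence of the newline marker with a real newline."""
    return text.replace(nl_token, "\n")


if __name__ == "__main__":
    generated = "First line[NL]Second line[NL][NL]New paragraph"
    print(restore_newlines(generated))
```

The sketch does not touch any whitespace around the marker; if the output has spaces next to [NL], they would need to be stripped separately.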