THUDM/icetk

What's the meaning of token 20005?

xu-song opened this issue · 0 comments

from icetk import icetk

tokens = icetk.encode('你好世界!这里是 icetk。')
for token in tokens:
    # Subtract 20000 to index into the text tokenizer's piece table
    print(token, icetk.text_tokenizer.proto.pieces[token - 20000].piece)
20005 ▁
94874 你好
84097 世界
20035 !
94947 这里是
22881 ▁ice
35955 tk
83823 。

What is "▁" used for?
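
One way to probe this is to join the printed pieces back together. The sketch below is not icetk code. It assumes the text tokenizer follows the SentencePiece convention, in which "▁" (U+2581) stands in for a space, and that the leading "▁" (token 20005) is a dummy prefix added at encode time. The `detokenize` helper is hypothetical, and the piece list is copied from the output above, so icetk does not need to be installed.

```python
# Assumption: "▁" (U+2581) is the SentencePiece-style whitespace marker.
MARKER = "\u2581"

# Pieces copied from the printed output in this issue.
pieces = [MARKER, "你好", "世界", "!", "这里是", MARKER + "ice", "tk", "。"]


def detokenize(pieces):
    """Hypothetical helper: join pieces and turn markers back into spaces."""
    text = "".join(pieces).replace(MARKER, " ")
    # Assumption: the first marker is a dummy prefix, so drop one leading space.
    return text[1:] if text.startswith(" ") else text


print(detokenize(pieces))
```

If those assumptions hold, the joined string matches the original input, including the space before "icetk", which would explain both the standalone "▁" and the "▁ice" piece.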