How to write a custom WordPiece class?
xinyinan9527 opened this issue · 0 comments
xinyinan9527 commented
My goal is to get a "tokenizer.json" for the rwkv5 model, but its tokenizer is implemented as a slow tokenizer (class PreTrainedTokenizer).
I want to convert the slow tokenizer to a fast tokenizer, which requires "tokenizer = Tokenizer(WordPiece())", but rwkv5 has its own Wordpiece file.
So I want to create a custom Wordpiece class.
Here is the code:
from tokenizers.models import Model

class MyWordpiece(Model):
    def __init__(self, vocab, unk_token):
        self.vocab = vocab
        self.unk_token = unk_token

test = MyWordpiece('./vocab.txt', "<s>")
Traceback (most recent call last):
  File "test.py", line 78, in <module>
    test = MyWordpiece('./vocab.txt',"<s>")
TypeError: Model.__new__() takes 0 positional arguments but 2 were given
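
The traceback points at `__new__`, not `__init__`: Python passes the constructor arguments to `__new__` first, and the error message says the base class's `__new__` accepts none. The sketch below reproduces that mechanism in pure Python. `FakeModel`, `BrokenWordpiece`, and `PatchedWordpiece` are hypothetical names used only for illustration; `FakeModel` is a stand-in, not the real `tokenizers.models.Model`. Whether the same `__new__` override works on the real class, and whether `Tokenizer` would ever call tokenization logic written in a Python subclass, are assumptions that are not verified here.

```python
# Pure-Python stand-in for a base class whose constructor accepts no arguments.
# FakeModel is hypothetical; it only imitates the behaviour seen in the traceback.
class FakeModel:
    def __new__(cls):
        return super().__new__(cls)


class BrokenWordpiece(FakeModel):
    # Only __init__ is overridden, so FakeModel.__new__ still receives
    # the two constructor arguments and rejects them.
    def __init__(self, vocab, unk_token):
        self.vocab = vocab
        self.unk_token = unk_token


class PatchedWordpiece(FakeModel):
    # Overriding __new__ consumes the extra arguments before the base
    # class sees them; __init__ then runs as usual.
    def __new__(cls, vocab, unk_token):
        return super().__new__(cls)

    def __init__(self, vocab, unk_token):
        self.vocab = vocab
        self.unk_token = unk_token


try:
    BrokenWordpiece("./vocab.txt", "<s>")
    broken_error = None
except TypeError as exc:
    # Same kind of failure as in the issue: __new__ got unexpected arguments.
    broken_error = exc

patched = PatchedWordpiece("./vocab.txt", "<s>")
print(type(broken_error).__name__)
print(patched.vocab, patched.unk_token)
```

This only explains where the TypeError comes from. It does not by itself give a model that a fast tokenizer can use to tokenize text.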