problem with VocabAugmentor
janekzimoch opened this issue · 3 comments
There seems to be a problem with the VocabAugmentor class.
When I run:
new_tokens = augmentor.get_new_tokens(ft_corpus_train)
I get this error:
TypeError: Can’t convert <tokenizers.trainers.PreTrainedTokenizerFast object at 0x7f8641325570> to Sequence
Solution: I resolved this by swapping the order of the arguments on line 91 of vocab_augmentor.py. (This fix for the above error came from a quick Google search.)
From:
self.rust_tokenizer.train(self.trainer, train_files)
to:
self.rust_tokenizer.train(train_files, self.trainer)
After this change, the program runs as expected.
I didn't look into this further, since the quick fix worked, but you may want to check whether there are other bugs related to it.
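For context, newer versions of the Hugging Face `tokenizers` library expect `Tokenizer.train(files, trainer)`, with the file list first, which is why the swapped call works. A minimal standalone sketch of the corrected argument order (the corpus file and trainer settings here are illustrative, not from VocabAugmentor):

```python
# Sketch of the fixed call order in tokenizers: files first, trainer second.
import tempfile

from tokenizers import Tokenizer
from tokenizers.models import WordPiece
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import WordPieceTrainer

# A tiny throwaway training file standing in for ft_corpus_train.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("hello world hello tokenizer vocab augmentation\n")
    train_files = [f.name]

tokenizer = Tokenizer(WordPiece(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
trainer = WordPieceTrainer(vocab_size=100, special_tokens=["[UNK]"])

# Correct order: train(files, trainer) -- passing the trainer first
# raises the "Can't convert ... to Sequence" TypeError from the issue.
tokenizer.train(train_files, trainer)
print(len(tokenizer.get_vocab()))
```

Passing the trainer as the first positional argument makes the library try to interpret it as the sequence of file paths, which produces the `TypeError` reported above.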
I had a similar issue, it fixed my problem :)
My issue was: TypeError: Can't convert <tokenizers.trainers.WordPieceTrainer object at 0x17ceef250> to Sequence
You will need to download the `class VocabAugmentor(BaseEstimator):` from the SBERT website and change
From:
self.rust_tokenizer.train(self.trainer, train_files)
to:
self.rust_tokenizer.train(train_files, self.trainer)