Support for loading a custom vocab.txt?

Question

Closed this issue 2 years ago · 3 comments

I'm interested in being able to tokenize text using a custom-loaded vocab.txt file (à la Hugging Face).

Is this possible with the current tokenizers? If not, is it something you would consider adding?

Answer 1 · 2022-09-09T12:26:56.000Z

It is a nice idea.
I will add the classes BertCasedCustom and BertUncasedCustom, which will in essence expose CasedTokenizer and UncasedTokenizer respectively.
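
For readers unfamiliar with the format, here is a rough Python sketch of what tokenizing against a custom vocab.txt usually involves. It is not this library's API or implementation. It assumes the common BERT-style layout (one token per line, line number used as the token id, `##` marking word continuations) and uses greedy longest-match WordPiece splitting on whitespace-separated words. The names `load_vocab`, `wordpiece`, and `tokenize` are hypothetical helpers made up for illustration.

```python
def load_vocab(path):
    """Read a vocab.txt file: one token per line, zero-based line index = token id."""
    vocab = {}
    with open(path, encoding="utf-8") as f:
        for index, line in enumerate(f):
            token = line.rstrip("\r\n")
            if token:  # skip blank lines but keep ids aligned with line numbers
                vocab[token] = index
    return vocab


def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into subword tokens."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is found in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # assumed continuation prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No prefix of the remainder is known: the whole word is unknown.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


def tokenize(text, vocab, lowercase=True):
    """Split on whitespace, optionally lowercase (uncased), then apply WordPiece."""
    if lowercase:
        text = text.lower()
    tokens = []
    for word in text.split():
        tokens.extend(wordpiece(word, vocab))
    return tokens
```

Passing `lowercase=False` would correspond roughly to a cased tokenizer. A real tokenizer would also handle punctuation splitting and special tokens such as `[CLS]` and `[SEP]`, which this sketch leaves out.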

Answer 2 · 2022-09-12T15:46:31.000Z

Awesome! 👍🏻

Answer 3 · 2022-09-13T09:48:24.000Z

Two new classes are available in the new version. Check them out and let me know whether they work well.
I will close this issue, and we can open a new one if problems arise.
Thanks for the suggestion once again!