NMZivkovic/BertTokenizers

Support for loading a custom vocab.txt?


I'm interested in being able to tokenize text using a custom vocab.txt file loaded at runtime (à la Hugging Face).

Is this possible with the current tokenizers? If not, is it something you would consider adding?

It is a nice idea.
I will add the classes BertCasedCustom and BertUncasedCustom, which will, in essence, expose CasedTokenizer and UncasedTokenizer, respectively.
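For context on what a custom-vocabulary tokenizer has to do: a Hugging Face-style vocab.txt (as used by BERT) lists one WordPiece token per line, and each token's id is its zero-based line number. Below is a minimal Python sketch, under that assumption, of loading such a file and applying greedy longest-match-first WordPiece lookup in cased or uncased mode. It is not this library's API; the names `load_vocab`, `load_vocab_lines`, `wordpiece`, `tokenize` and `convert_tokens_to_ids` are hypothetical. It also simplifies pre-tokenization to a whitespace split, and "uncased" here only lowercases, whereas BERT's full basic tokenizer also splits punctuation and strips accents.

```python
from typing import Dict, Iterable, List


def load_vocab_lines(lines: Iterable[str]) -> Dict[str, int]:
    """Map each token to its zero-based line index (vocab.txt layout)."""
    vocab: Dict[str, int] = {}
    for index, line in enumerate(lines):
        vocab[line.rstrip("\n")] = index
    return vocab


def load_vocab(path: str) -> Dict[str, int]:
    """Load a vocab.txt file from disk (hypothetical helper)."""
    with open(path, encoding="utf-8") as handle:
        return load_vocab_lines(handle)


def wordpiece(word: str, vocab: Dict[str, int], unk_token: str = "[UNK]") -> List[str]:
    """Greedy longest-match-first WordPiece split of a single word."""
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, shrinking from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry "##"
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No prefix of the remainder is in the vocab: the whole word is unknown.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


def tokenize(text: str, vocab: Dict[str, int], lowercase: bool = True,
             unk_token: str = "[UNK]") -> List[str]:
    """Whitespace split, optional lowercasing (uncased mode), then WordPiece."""
    if lowercase:
        text = text.lower()
    tokens: List[str] = []
    for word in text.split():
        tokens.extend(wordpiece(word, vocab, unk_token))
    return tokens


def convert_tokens_to_ids(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Look up each token's id (its line number in vocab.txt)."""
    return [vocab[token] for token in tokens]
```

With a real file, usage would look like `vocab = load_vocab("vocab.txt")` followed by `convert_tokens_to_ids(tokenize("Some text", vocab), vocab)`. Passing `lowercase=False` keeps the input's casing, which only makes sense with a vocabulary trained on cased text.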

Awesome! 👍🏻

The two new classes are available in the latest version. Check it out and let me know whether it works well.
I will close this issue; we can open a new one if any problems arise.
Thanks again for the suggestion!