Helsinki-NLP/Tatoeba-Challenge

About MarianTokenizer


I'm sorry to open an issue for this, but I could not find the tokenizer's model details anywhere, nor a contact email for this repo. Which model type is used in the SentencePiece model: unigram (the default, according to the sentencepiece repo) or BPE?
Thanks.
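
For anyone with the same question, one way to check locally might be to read the `model_type` field stored in the SentencePiece model file itself. The following is only a minimal sketch using the standard library. It assumes the field numbers from `sentencepiece_model.proto` (`ModelProto.trainer_spec = 2`, `TrainerSpec.model_type = 3`, with 1 = unigram, 2 = bpe, 3 = word, 4 = char), which should be verified against the sentencepiece version in use. The function name `sentencepiece_model_type` is made up for illustration.

```python
# Sketch: read trainer_spec.model_type from serialized SentencePiece model bytes
# with a tiny protobuf wire-format reader (standard library only).
# ASSUMPTION: field numbers and enum values below follow sentencepiece_model.proto.
MODEL_TYPES = {1: "unigram", 2: "bpe", 3: "word", 4: "char"}


def _read_varint(buf, pos):
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def _fields(buf):
    """Yield (field_number, wire_type, value) for each field of one message."""
    pos = 0
    while pos < len(buf):
        key, pos = _read_varint(buf, pos)
        field, wire = key >> 3, key & 7
        if wire == 0:      # varint
            value, pos = _read_varint(buf, pos)
        elif wire == 1:    # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire == 2:    # length-delimited (strings, nested messages)
            length, pos = _read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire == 5:    # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire}")
        yield field, wire, value


def sentencepiece_model_type(model_bytes):
    """Return the model type name (hypothetical helper, not a library API)."""
    model_type = 1  # assumed proto default (unigram) when the field is absent
    for field, wire, value in _fields(model_bytes):
        if field == 2 and wire == 2:          # ModelProto.trainer_spec
            for sub_field, sub_wire, sub_value in _fields(value):
                if sub_field == 3 and sub_wire == 0:  # TrainerSpec.model_type
                    model_type = sub_value
    return MODEL_TYPES.get(model_type, f"unknown({model_type})")
```

To use it, pass the raw bytes of the model file, for example `sentencepiece_model_type(open("source.spm", "rb").read())`, where `source.spm` is an assumed path to the tokenizer's SentencePiece file.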