facebookresearch/XLM

Vocab size does not match model input size

moment-of-peace opened this issue · 1 comment

Why don't the vocab and model checkpoint provided in "II. Cross-lingual language model pretraining (XLM)" of the README match? For example, the vocab size for "tokenize + lowercase + no accent + BPE" should be 95k (the embedding size of the model), but after downloading, the vocab file actually has more than 120k lines.
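
For reference, here is a minimal sketch of how I'm comparing the two sizes, assuming the checkpoint is a standard PyTorch `.pth` dict and that the embedding weight key contains `embeddings.weight` (the file names and the key name are assumptions, not confirmed against the repo):

```python
import torch

# Hypothetical file names; substitute the actual downloads.
VOCAB_PATH = "vocab_xnli_15"
MODEL_PATH = "mlm_tlm_xnli15_1024.pth"

# Count entries in the downloaded vocab file (one token per line).
with open(VOCAB_PATH, encoding="utf-8") as f:
    n_vocab = sum(1 for _ in f)

# Load the checkpoint on CPU and inspect the embedding matrix shape.
ckpt = torch.load(MODEL_PATH, map_location="cpu")
state = ckpt.get("model", ckpt)  # some checkpoints nest weights under 'model'
emb_key = next(k for k in state if "embeddings.weight" in k)  # assumed key name
n_emb = state[emb_key].shape[0]

print("vocab file entries:", n_vocab)   # > 120k in my case
print("model embedding rows:", n_emb)   # 95k
```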

Similar issue here with the XLM-R 100-language model vocab file: it should have a 200k vocab, but the downloaded file has 239,776 entries.