Denis2054/Transformers-for-NLP-2nd-Edition

Special Tokens not Provided

mediadepp opened this issue · 2 comments

Hi, I think the following code in the notebook provided on GitHub has a problem:

tokenizer.train(files=paths, vocab_size=52_000, min_frequency=2, special_tokens=[
    "<s>",
    "<pad>",
    "</s>",
    "<unk>",
    "<mask>",
])

The special tokens are not provided to the tokenizer. You can find this code in Chapter 4, Section 3.
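For context, here is a minimal, self-contained sketch of how this call is typically set up with the Hugging Face tokenizers library's ByteLevelBPETokenizer; the corpus file name below is hypothetical, and in the notebook the paths list is built from its own downloaded text:

from tokenizers import ByteLevelBPETokenizer

# Hypothetical corpus list; the notebook points `paths` at its own text file(s).
paths = ["kant.txt"]

tokenizer = ByteLevelBPETokenizer()

# Registering the special tokens at training time gives them fixed, low IDs
# in the resulting vocabulary. Note the closing parenthesis, which the
# snippet quoted above is missing.
tokenizer.train(files=paths, vocab_size=52_000, min_frequency=2, special_tokens=[
    "<s>",
    "<pad>",
    "</s>",
    "<unk>",
    "<mask>",
])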

lxt3 commented

I believe this was solved in the April 2023 update of the notebook, by pip installing --upgrade accelerate and installing the latest transformers module.
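For reference, the fix would amount to install cells at the top of the notebook along these lines (a sketch; the exact cells and version pins in the April 2023 update may differ):

# Notebook install cells reflecting the fix described above (sketch).
!pip install --upgrade accelerate
!pip install --upgrade transformers

Recent versions of the transformers Trainer depend on the accelerate package, so upgrading both together avoids the related import error.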

Yes, thank you. Hugging Face now requires the accelerate library, and the notebook was updated in April 2023. Thank you for explaining this.