nekoumei/DocumentClassificationUsingBERT-Japanese

Model name issue


I tried to run the notebook and hit the following error.
It looks like the pretrained model name has changed (it now needs the `cl-tohoku/` namespace on the Hugging Face hub).
Please take a look at this issue and fix it.


OSError Traceback (most recent call last)
in ()
----> 1 model = clf.DocumentClassifier(num_labels=9, num_epochs=100)
2 print(model)
3 model.fit(train_df, val_df, early_stopping_rounds=10)
4 y_proba = model.predict(val_df)

1 frames
/usr/local/lib/python3.6/dist-packages/transformers/tokenization_utils_base.py in from_pretrained(cls, pretrained_model_name_or_path, *init_inputs, **kwargs)
1589 ", ".join(s3_models),
1590 pretrained_model_name_or_path,
-> 1591 list(cls.vocab_files_names.values()),
1592 )
1593 )

OSError: Model name 'bert-base-japanese-whole-word-masking' was not found in tokenizers model name list (cl-tohoku/bert-base-japanese, cl-tohoku/bert-base-japanese-whole-word-masking, cl-tohoku/bert-base-japanese-char, cl-tohoku/bert-base-japanese-char-whole-word-masking). We assumed 'bert-base-japanese-whole-word-masking' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.txt'] but couldn't find such vocabulary files at this path or url.
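The error message itself lists the valid identifiers: the bare name `bert-base-japanese-whole-word-masking` must be prefixed with `cl-tohoku/`. A minimal sketch of a workaround, assuming the notebook lets you control the checkpoint name before it reaches `from_pretrained` (the `resolve_model_name` helper below is hypothetical, not part of this repo):

```python
# Checkpoints listed in the OSError above; bare names need the
# "cl-tohoku/" hub namespace in newer transformers releases.
CL_TOHOKU_MODELS = {
    "bert-base-japanese",
    "bert-base-japanese-whole-word-masking",
    "bert-base-japanese-char",
    "bert-base-japanese-char-whole-word-masking",
}

def resolve_model_name(name: str) -> str:
    """Prepend 'cl-tohoku/' when given a bare Japanese BERT checkpoint name."""
    if name in CL_TOHOKU_MODELS:
        return "cl-tohoku/" + name
    return name

# Usage (loading the tokenizer requires network access and the
# transformers + fugashi packages, so it is commented out here):
# from transformers import AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained(
#     resolve_model_name("bert-base-japanese-whole-word-masking"))
```

Alternatively, just replace the hard-coded string in the notebook with `"cl-tohoku/bert-base-japanese-whole-word-masking"`.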