ohmeow/blurr

Error due to tok_kwargs setting for Hindi Language


I followed the article below to fine-tune an mBART model for Hindi summarization: https://ohmeow.github.io/blurr/text.modeling.seq2seq.summarization.html

To do this, I changed the language code from "en_XX" to "hi_IN" in the following code:

if hf_arch == "mbart":
    text_gen_kwargs["decoder_start_token_id"] = hf_tokenizer.get_vocab()["hi_IN"]

tok_kwargs = {}
if hf_arch == "mbart":
    tok_kwargs["src_lang"], tok_kwargs["tgt_lang"] = "hi_IN", "hi_IN"

But I get the following error when I run dls = dblock.dataloaders(df_train, bs=2):

TypeError: PreTrainedTokenizerFast._batch_encode_plus() got an unexpected keyword argument 'src_lang'
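
As far as I can tell, this kind of TypeError appears when extra keywords are forwarded to a method that does not accept them. Below is a minimal sketch of that mechanism only. StubFastTokenizer is a hypothetical stand-in written for illustration, not the real transformers class, and I have not checked that blurr forwards tok_kwargs this way. The second half sets the language codes as attributes on the object instead of passing them at call time; whether that is the right approach for the real tokenizer is an assumption.

```python
# Hypothetical stand-in for a fast tokenizer; NOT the real transformers class.
class StubFastTokenizer:
    def __init__(self):
        # Language codes live on the object, not in the call arguments.
        self.src_lang = "en_XX"
        self.tgt_lang = "en_XX"

    def _batch_encode_plus(self, texts, padding=False):
        # Only `texts` and `padding` are accepted here.
        return [f"{self.src_lang} {t}" for t in texts]

    def __call__(self, texts, **kwargs):
        # Every extra keyword is forwarded unchanged.
        return self._batch_encode_plus(texts, **kwargs)


tok = StubFastTokenizer()
tok_kwargs = {"src_lang": "hi_IN", "tgt_lang": "hi_IN"}

# Passing the language codes as call-time keywords fails.
try:
    tok(["नमस्ते"], **tok_kwargs)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Setting them as attributes and calling without the extra keywords works
# for this stub (assumed, not verified, for the real tokenizer).
for name, value in tok_kwargs.items():
    setattr(tok, name, value)
encoded = tok(["नमस्ते"])
print(encoded)
```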

I am a beginner. Please suggest a solution to this problem.