yeyupiaoling/Whisper-Finetune

Fine-tuning fails as soon as WhisperProcessor.from_pretrained is called

lichq5 opened this issue · 5 comments

I'm training on a single GPU, and it fails right at startup:
Traceback (most recent call last):
File "/workspace/Whisper-Finetune-master/finetune.py", line 47, in
processor = WhisperProcessor.from_pretrained(args.base_model,
File "/opt/conda/lib/python3.10/site-packages/transformers/processing_utils.py", line 228, in from_pretrained
args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/transformers/processing_utils.py", line 272, in _get_arguments_from_pretrained
args.append(attribute_class.from_pretrained(pretrained_model_name_or_path, **kwargs))
File "/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2024, in from_pretrained
return cls._from_pretrained(
File "/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2249, in _from_pretrained
init_kwargs[key] = added_tokens_map.get(init_kwargs[key], init_kwargs[key])
TypeError: unhashable type: 'dict'
What is going on here? Did I get something wrong?

Your downloaded model files may be incomplete, or they may be the wrong files.

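I downloaded all four model files from openai/whisper-small/ ([flax_model.msgpack], [model.safetensors], [pytorch_model.bin], [tf_model.h5]), and none of them work. Why is that? There is no MD5 to check them against, so I can't verify whether they differ from the originals, but the downloads all finished without errors.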

@lichq5 Those aren't the only files you need; the model repository contains many more.
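
To make "many more files" concrete, here is a minimal sketch that checks a local model directory for the non-weight files a Whisper checkpoint usually ships with. The file list is an assumption based on the openai/whisper-small repository listing on the Hugging Face Hub and may differ for other checkpoints or versions; `missing_files` and `EXPECTED_FILES` are hypothetical names, not part of this project.

```python
from pathlib import Path

# Assumption: names taken from the openai/whisper-small repository listing on
# the Hugging Face Hub. Other checkpoints or versions may ship different files.
EXPECTED_FILES = [
    "config.json",
    "preprocessor_config.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "vocab.json",
    "merges.txt",
    "added_tokens.json",
    "normalizer.json",
]


def missing_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]
```

Calling `missing_files` on the directory passed as `base_model` should return an empty list when everything is in place. A non-empty result names the files that still have to be downloaded alongside the weights.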

Now I get this error during training:
raise ValueError(
"Asking to pad but the tokenizer does not have a padding token. "
"Please select a token to use as pad_token (tokenizer.pad_token = tokenizer.eos_token e.g.) "
"or add a new pad token via tokenizer.add_special_tokens({'pad_token': '[PAD]'})."
)
If I edit the source by hand and add self.pad_token="[PAD]", will that affect the training results?

That probably won't work. You still need to download the complete set of files so that the tokens can be read from them.
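
As a rough way to confirm that the downloaded files actually define a padding token, the sketch below reads the tokenizer's JSON config files directly. It assumes the special tokens are stored in special_tokens_map.json or tokenizer_config.json, which is where Hugging Face tokenizers normally keep them; `find_pad_token` is a hypothetical helper, not part of transformers or this project.

```python
import json
from pathlib import Path


def find_pad_token(model_dir):
    """Look for a pad_token entry in the tokenizer's JSON config files.

    Returns the token string, or None if neither file defines one.
    """
    root = Path(model_dir)
    # Assumption: these two files hold the special-token definitions.
    for name in ("special_tokens_map.json", "tokenizer_config.json"):
        path = root / name
        if not path.is_file():
            continue
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
        token = data.get("pad_token")
        # An entry may be a plain string or a dict with a "content" key.
        if isinstance(token, dict):
            token = token.get("content")
        if token:
            return token
    return None
```

If this returns None for your model directory, the tokenizer files are likely missing or incomplete, which would be consistent with the "does not have a padding token" error above. Re-downloading the full repository is safer than patching a pad token into the source.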