Serialization error when tokenizer_config key matches function name in PreTrainedTokenizerBase
avnermay opened this issue · 2 comments
transformers/src/transformers/tokenization_utils_base.py, lines 2449 to 2451 in 37bba2a
When one of the keys in self.init_kwargs matches the name of a method on PreTrainedTokenizerBase (e.g., add_special_tokens), this for loop replaces the value for that key in tokenizer_config with the bound method object, which is not JSON serializable, causing an error during save_pretrained.
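A minimal, self-contained sketch of the failure mode (the class and method names here are illustrative stand-ins, not the actual transformers code):

```python
import json

class FakeTokenizer:
    """Hypothetical stand-in for PreTrainedTokenizerBase, for illustration only."""

    def __init__(self, **kwargs):
        self.init_kwargs = kwargs

    def add_special_tokens(self, tokens):
        """Method whose name can collide with an init_kwargs key."""

    def build_config(self):
        # Mirrors the shape of the problematic loop: prefer the live
        # attribute over the stored kwarg. getattr() resolves to the
        # *method* add_special_tokens, so the config value becomes a
        # bound method object instead of the original boolean.
        config = dict(self.init_kwargs)
        for key in config:
            if hasattr(self, key):
                config[key] = getattr(self, key)
        return config

tok = FakeTokenizer(add_special_tokens=True)
config = tok.build_config()
try:
    json.dumps(config)
except TypeError as err:
    # json cannot serialize a method object, which is what
    # save_pretrained trips over in the real class
    print(f"TypeError: {err}")
```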
To solve this issue, one option is to add a check in the __init__ function that throws an error if one of the keys matches an existing attribute/function on PreTrainedTokenizerBase.
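A rough sketch of what such a guard could look like. This is a hypothetical illustration, not a proposed patch: it narrows the check to callable attributes, since those are what getattr() later picks up in place of the stored value.

```python
class SafeTokenizer:
    """Hypothetical tokenizer sketch showing the proposed __init__ guard."""

    def add_special_tokens(self, tokens):
        """Method that an init kwarg must not shadow."""

    def __init__(self, **kwargs):
        # Reject kwargs that would shadow an existing method, because the
        # save path resolves keys with getattr() and would serialize the
        # method object instead of the stored value.
        for key in kwargs:
            if callable(getattr(self, key, None)):
                raise ValueError(
                    f"init kwarg {key!r} collides with an existing method "
                    f"on {type(self).__name__} and would break serialization"
                )
        self.init_kwargs = kwargs

SafeTokenizer(model_max_length=512)        # fine: no collision
try:
    SafeTokenizer(add_special_tokens=True)  # collides with the method
except ValueError as err:
    print(err)
```

The check is a single getattr per kwarg, so the cost in __init__ should be negligible for typical tokenizer configs.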
This error was also raised in the Stack Overflow issue below:
https://stackoverflow.com/questions/78062739/huggingface-transformers-error-when-saving-model-typeerror-object-of-type-meth
Yep, this is known. I remember saying that I'd rather have a failure than duplicate attributes/functions.
Do you want to open a PR to add some kind of check?
I am fine with doing this in the init as long as it does not slow it down too much