huggingface/transformers

Serialization error when tokenizer_config key matches function name in PreTrainedTokenizerBase

avnermay opened this issue · 2 comments

In `PreTrainedTokenizerBase.save_pretrained`, the following loop fills in the `tokenizer_config` that gets written to disk:

```python
for k in target_keys:
    if hasattr(self, k):
        tokenizer_config[k] = getattr(self, k)
```

When one of the keys in `self.init_kwargs` matches the name of a method on `PreTrainedTokenizerBase` (e.g., `add_special_tokens`), this loop replaces the value for that key in `tokenizer_config` with the bound method object, which is not JSON serializable and therefore causes an error during `save_pretrained`.
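
For reference, here is a minimal sketch of how the failure can be triggered (the checkpoint name and output directory are just examples; any extra kwarg whose name collides with a method, such as `add_special_tokens`, should hit the same path):

```python
from transformers import AutoTokenizer

# An extra kwarg whose name collides with a method on PreTrainedTokenizerBase
# ends up in self.init_kwargs.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", add_special_tokens=False)

# During save_pretrained, the loop above evaluates getattr(self, "add_special_tokens"),
# which returns the bound method instead of the stored value, so JSON serialization
# of tokenizer_config fails with "Object of type method is not JSON serializable".
tokenizer.save_pretrained("./saved-tokenizer")
```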

To solve this issue, one option is to add a check in the `__init__` function that raises an error if one of the keys matches an existing attribute/function on `PreTrainedTokenizerBase`, e.g. right before this line:

```python
self.init_kwargs = copy.deepcopy(kwargs)
```
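
A possible sketch of such a check (hypothetical; the exact condition and error type are up for discussion, and it is restricted to callables since those are what break serialization):

```python
import copy

# Hypothetical guard in PreTrainedTokenizerBase.__init__, placed just before
# the kwargs are stored: reject any kwarg whose name shadows a method, since
# save_pretrained later re-reads it via getattr(self, key).
for key in kwargs:
    if callable(getattr(self, key, None)):
        raise AttributeError(
            f"Tokenizer init kwarg `{key}` collides with an existing method of "
            f"{type(self).__name__}; this would make the saved tokenizer_config "
            "unserializable."
        )

self.init_kwargs = copy.deepcopy(kwargs)
```

The per-key `getattr`/`callable` check is O(len(kwargs)) and should be cheap relative to the rest of `__init__`, which seems to address the performance concern below.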

This error was also reported in the Stack Overflow question below:
https://stackoverflow.com/questions/78062739/huggingface-transformers-error-when-saving-model-typeerror-object-of-type-meth

Yep, this is known. I remember saying that I'd rather have a failure than duplicate attributes / functions.
Do you want to open a PR to add some kind of check?
I am fine with doing this in the init as long as it does not slow it down too much.