CQCL/lambeq

Fix transformers-related warning


Dear lambeq developers,

I was playing around with the package, testing the parsing example in the Bobcat tutorial by simply running

from lambeq import BobcatParser

parser = BobcatParser()
diagram = parser.sentence2diagram('A sentence here.')
diagram.draw()

and realised I was getting the following warning, coming from the transformers library:

\lib\site-packages\transformers\tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884

Following the suggestion from the Hugging Face developers in the linked issue, the warning is resolved by adding the parameter clean_up_tokenization_spaces=True to the tokenizer definition here:

tokenizer = AutoTokenizer.from_pretrained(model_dir)
I can confirm this worked for me locally.
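For anyone curious why the explicit argument makes the warning go away, here is a minimal, self-contained sketch of the pattern transformers uses (this mimics, but is not, the actual tokenization_utils_base code; load_tokenizer is a hypothetical stand-in): a FutureWarning fires only when the parameter is left unset, so passing any explicit value keeps the call silent.

```python
import warnings

# Hypothetical stand-in for the tokenizer loader: warns when
# `clean_up_tokenization_spaces` is not set explicitly, stays quiet otherwise.
def load_tokenizer(clean_up_tokenization_spaces=None):
    if clean_up_tokenization_spaces is None:
        warnings.warn(
            "`clean_up_tokenization_spaces` was not set. "
            "It will be set to `True` by default.",
            FutureWarning,
        )
        clean_up_tokenization_spaces = True
    return {"clean_up_tokenization_spaces": clean_up_tokenization_spaces}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    load_tokenizer()                                   # unset -> warns
    load_tokenizer(clean_up_tokenization_spaces=True)  # explicit -> silent

print(len(caught))  # only the first call produced a warning
```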
Many thanks!

@arashmath Thank you very much for the suggestion, we'll add it in a future release.