LUMIA-Group/rasat

Possible error in adding new tokens to t5_tokenizer

Closed this issue · 5 comments

Is the extra space supposed to be here for the <= and < tokens for the t5_tokenizer?

t5_tokenizer.add_tokens([AddedToken(" <="), AddedToken(" <")])

We add these two tokens to be consistent with the tokenizer in run_seq2seq.py:

tokenizer.add_tokens([AddedToken(" <="), AddedToken(" <")])
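For anyone wanting to check this themselves, here is a minimal sketch of what the leading space does. It assumes a stock T5 checkpoint (`t5-small` here, the repo may use a different one); after `add_tokens`, the string `" <="` (with the space) is a single known token, while the exact tokenization of surrounding text is whatever SentencePiece produces:

```python
from transformers import AutoTokenizer, AddedToken

# Load a T5 tokenizer (t5-small is just an example checkpoint)
tok = AutoTokenizer.from_pretrained("t5-small")

# Register the tokens exactly as in run_seq2seq.py, with the leading space
tok.add_tokens([AddedToken(" <="), AddedToken(" <")])

# The space-prefixed form now maps to a real vocabulary id, not <unk>
print(tok.convert_tokens_to_ids(" <="))

# Inspect how a SQL-like string is split once the tokens are registered
print(tok.tokenize("SELECT age WHERE age <= 5"))
```

Because T5's SentencePiece vocabulary marks word boundaries with a leading-space convention, adding the token with the space lets it match `<=` as it actually appears after a word in raw text; this is a sketch for inspection, not the repo's own test code.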

I see. In either case, I ran evaluation on the Spider dataset both with and without the extra spacing, and it doesn't seem to make a difference in accuracy.

Only a few examples in this dataset contain "<" or "<=", which I guess is why the accuracy was not affected.

That makes sense. I guess I was originally curious as to why the extra space is there, i.e., why the token is " <=" instead of "<=".