NielsRogge/Transformers-Tutorials

UDOP - How to change UdopTokenizer to another tokenizer that supports CJK languages?

pascona opened this issue · 1 comment

Hi @NielsRogge

We are following the Fine_tune_UDOP_on_a_custom_dataset_(toy_RVL_CDIP_dataset).ipynb notebook example.
Our OCR text and coordinates come from CJK (Chinese, Japanese, Korean) documents.
However, UdopTokenizer does not seem to support CJK characters.
Could you provide a guide or notebook code showing how to switch from UdopTokenizer to LayoutXLMTokenizer?

+1, same problem here.