UDOP - How to change UdopTokenizer to another tokenizer that supports CJK languages?

Question

UDOP - How to change UdopTokenizer to another tokenizer that supports CJK languages?

pascona opened this issue 2 months ago · 1 comment

We are following the Fine_tune_UDOP_on_a_custom_dataset_(toy_RVL_CDIP_dataset).ipynb notebook example.
Our OCR text and coordinates come from CJK (Chinese, Japanese, Korean) documents.
However, UdopTokenizer does not seem to support CJK text.
Could you provide a guide or notebook code showing how to use LayoutXLMTokenizer in place of UdopTokenizer?
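
For reference, one way to quantify the coverage problem is to count how many OCR words contain at least one token that maps to the tokenizer's unknown id. The following is only a minimal sketch: it assumes a Hugging Face-style tokenizer exposing `tokenize`, `convert_tokens_to_ids`, and `unk_token_id`. The helper `unk_ratio` and the `ToyTokenizer` stand-in are hypothetical names made up for illustration, and the sample words are not from our dataset. It does not load UdopTokenizer itself, so how UdopTokenizer scores on real data is not shown here.

```python
from typing import Iterable, List


def unk_ratio(tokenizer, words: Iterable[str]) -> float:
    """Fraction of words with at least one token mapped to the unknown id.

    `tokenizer` is assumed to expose the Hugging Face-style methods
    `tokenize` and `convert_tokens_to_ids`, plus `unk_token_id`.
    """
    words = list(words)
    if not words:
        return 0.0
    hits = 0
    for word in words:
        ids = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(word))
        if tokenizer.unk_token_id in ids:
            hits += 1
    return hits / len(words)


class ToyTokenizer:
    """Hypothetical stand-in whose vocabulary only covers ASCII characters."""

    unk_token_id = 0

    def tokenize(self, text: str) -> List[str]:
        # One token per character, purely for illustration.
        return list(text)

    def convert_tokens_to_ids(self, tokens: List[str]) -> List[int]:
        # ASCII characters get a positive id; everything else is unknown.
        return [ord(t) + 1 if t.isascii() else self.unk_token_id for t in tokens]


# Illustrative OCR words (not taken from any real dataset).
ocr_words = ["invoice", "請求書", "합계", "total"]
print(unk_ratio(ToyTokenizer(), ocr_words))  # 0.5
```

Passing a real tokenizer object instead of `ToyTokenizer()` would give a rough comparison between tokenizers on the same word list, though a low ratio alone does not mean a tokenizer can simply be swapped into a pretrained model whose embeddings were trained with a different vocabulary.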

Answer 1 · 2024-04-17T07:55:59.000Z

+1, same problem here.