UDOP - How to change UdopTokenizer to another tokenizer that supports CJK languages?
pascona opened this issue · 1 comment
pascona commented
Hi @NielsRogge
We are following the Fine_tune_UDOP_on_a_custom_dataset_(toy_RVL_CDIP_dataset).ipynb notebook example.
Our OCR text and bounding-box coordinates come from CJK (Chinese, Japanese, Korean) documents.
However, it seems that UdopTokenizer does not support CJK text.
Could you provide a guide or notebook code showing how to use LayoutXLMTokenizer instead of UdopTokenizer?
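For reference, here is a rough sketch of how the lack of CJK coverage could be measured. It is not taken from the notebook. It assumes the tokenizer exposes the usual `tokenize`, `convert_tokens_to_ids`, and `unk_token_id` members of Hugging Face tokenizers. `unk_fraction` and `ToyTokenizer` are hypothetical names made up for this illustration, and the toy class only stands in for a real tokenizer so the snippet runs without downloading anything.

```python
def unk_fraction(tokenizer, text: str) -> float:
    """Return the share of tokens in `text` that map to the unknown-token id."""
    tokens = tokenizer.tokenize(text)
    if not tokens:
        return 0.0
    ids = tokenizer.convert_tokens_to_ids(tokens)
    return sum(i == tokenizer.unk_token_id for i in ids) / len(ids)


class ToyTokenizer:
    """Hypothetical stand-in with a Latin-only vocabulary (illustration only)."""

    unk_token_id = 0

    def __init__(self, vocab):
        # Reserve id 0 for the unknown token; real vocabularies are far larger.
        self.vocab = {tok: i + 1 for i, tok in enumerate(vocab)}

    def tokenize(self, text):
        # One token per non-space character; real tokenizers use subword pieces.
        return [ch for ch in text if not ch.isspace()]

    def convert_tokens_to_ids(self, tokens):
        return [self.vocab.get(t, self.unk_token_id) for t in tokens]


toy = ToyTokenizer("abcdefghijklmnopqrstuvwxyz")
print(unk_fraction(toy, "invoice"))  # 0.0: every character is in the toy vocab
print(unk_fraction(toy, "請求書"))    # 1.0: every character falls back to unknown
```

With the real tokenizers, the same helper could presumably be called on objects loaded with `AutoTokenizer.from_pretrained(...)`, for example from the `microsoft/udop-large` and `microsoft/layoutxlm-base` checkpoints (those checkpoint names are our assumption). A fraction close to 1.0 on our OCR text would confirm the problem.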
NguyenHongSon1103 commented
+1, I have the same problem.