doc-analysis/XFUND

Format of the zh and ja datasets

bakhbyergyen opened this issue · 0 comments

Hi, I wanted to ask why the zh and ja datasets are split character by character rather than word by word.
When building a dataset, couldn't the sentences be split into words instead of characters?
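
To make the question concrete, here is a small Python sketch of the two splitting schemes I mean. The sentence is a made-up example, not taken from XFUND, and the word-level list is segmented by hand, so it is only my assumption of what a word split could look like.

```python
# Hypothetical sample sentence, not taken from the XFUND data.
sentence = "我喜欢自然语言处理"

# Character-level split: every character becomes its own token.
char_tokens = list(sentence)

# Whitespace split: Chinese text has no spaces between words,
# so splitting on whitespace returns the whole sentence as one token.
space_tokens = sentence.split()

# Word-level split, segmented by hand for illustration only (an assumption).
word_tokens = ["我", "喜欢", "自然语言", "处理"]

print(char_tokens)
print(space_tokens)
print(word_tokens)
```

As the whitespace split shows, a word-level split would need a separate segmentation step, which may be part of the answer to my question.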
Thank you.