Why doesn't this support Chinese? What's the difficult part?

Question

Why doesn't this support Chinese? What's the difficult part?

Pana opened this issue 9 years ago · 1 comments

Pana commented 9 years ago

FYI

Answer 1 · 2015-10-13T03:09:28.000Z

Hi @Pana,

The basic algorithm for finding good text is:

1. Split the text into words.
2. Count the total number of words.
3. Count the number of words that are "stop words" (filler words like "the", "and", "or", etc., that occur in real writing).
4. If the ratio of stop words to total words is good, this is probably useful text, so keep it. Otherwise, discard it.
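
As a rough illustration of the steps above, here is a minimal Python sketch. It is not this project's actual code: the stop word list, the `MIN_STOP_WORD_RATIO` threshold, and both function names are assumptions made up for the example.

```python
import string

# Hypothetical stop word list and threshold, chosen only for illustration.
STOP_WORDS = {"the", "and", "or", "a", "of", "to", "in", "is"}
MIN_STOP_WORD_RATIO = 0.2


def stop_word_ratio(text: str) -> float:
    """Return the fraction of whitespace-separated words that are stop words."""
    # Step 1: split the text into words wherever there is whitespace.
    words = text.lower().split()
    # Step 2: count the total number of words.
    total = len(words)
    if total == 0:
        return 0.0
    # Step 3: count stop words, ignoring punctuation attached to a word.
    stop_count = sum(
        1 for word in words if word.strip(string.punctuation) in STOP_WORDS
    )
    return stop_count / total


def looks_like_real_text(text: str) -> bool:
    """Step 4: keep the text only if the stop word ratio is high enough."""
    return stop_word_ratio(text) >= MIN_STOP_WORD_RATIO
```

With these assumed values, a sentence such as "The cat sat in the hat and slept" is kept, while a navigation fragment such as "Home Login Register Share" contains no stop words and is discarded.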

Step 1 is implemented very simply: it just splits the text wherever there are spaces between words. For Chinese, that doesn't work at all, since there are no spaces between words. Someone would have to implement a way to split Chinese text into words for this to work.
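
To make the problem concrete, here is a small Python sketch, again not this project's code. It shows whitespace splitting returning a Chinese sentence as a single "word", followed by one possible approach, forward maximum matching against a dictionary. `TOY_DICTIONARY` and `segment_max_match` are hypothetical names invented for the example, and a real segmenter would need a far larger dictionary or a statistical model.

```python
# Whitespace splitting works for English but not for Chinese.
english = "the cat sat on the mat"
chinese = "我喜欢阅读有趣的文章"  # "I like reading interesting articles"

english_words = english.split()  # six separate words
chinese_words = chinese.split()  # the whole sentence comes back as one item

# Hypothetical toy dictionary; only enough entries for this one sentence.
TOY_DICTIONARY = {"我", "喜欢", "阅读", "有趣", "的", "文章"}


def segment_max_match(text: str, dictionary: set[str] = TOY_DICTIONARY) -> list[str]:
    """Greedy forward maximum matching: at each position, take the longest
    dictionary word that matches, falling back to a single character."""
    max_len = max((len(word) for word in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


segmented = segment_max_match(chinese)
```

Here `chinese_words` has length 1, so a stop word ratio computed from it would be meaningless, while `segmented` contains six entries that could be checked against a Chinese stop word list. Greedy matching is only one option and can pick the wrong split for ambiguous text.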