nltk/nltk_data

Chinese simplified stopwords

MangoPomelo opened this issue · 4 comments

What is the source of these?

https://github.com/goto456/stopwords/blob/master/%E7%99%BE%E5%BA%A6%E5%81%9C%E7%94%A8%E8%AF%8D%E8%A1%A8.txt

He says this stopword list comes from Baidu, the largest simplified-Chinese search engine.
I have checked it and removed the entries that contain non-Chinese characters.
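The thread doesn't say how the non-Chinese entries were removed. Below is a minimal sketch of that cleanup step. It assumes "Chinese character" means a code point in the CJK Unified Ideographs block (U+4E00 to U+9FFF) or Extension A (U+3400 to U+4DBF). The helper names `is_chinese_char` and `filter_chinese_stopwords` are hypothetical, and the sample words are illustrative, not taken from the actual Baidu list.

```python
# Hypothetical cleanup helper; the actual method used in the thread is not stated.
# Assumption: a "Chinese character" is a code point in CJK Unified Ideographs
# (U+4E00-U+9FFF) or CJK Unified Ideographs Extension A (U+3400-U+4DBF).
CJK_RANGES = ((0x4E00, 0x9FFF), (0x3400, 0x4DBF))


def is_chinese_char(ch: str) -> bool:
    """Return True if the single character ch lies in one of the assumed CJK ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)


def filter_chinese_stopwords(words):
    """Keep only non-empty entries (after stripping whitespace) made entirely of Chinese characters."""
    kept = []
    for word in words:
        w = word.strip()
        if w and all(is_chinese_char(ch) for ch in w):
            kept.append(w)
    return kept


if __name__ == "__main__":
    # Illustrative entries only: Latin letters, digits, and punctuation get dropped.
    sample = ["的", "一些", "about", "123", "啊!", "  我们 ", ""]
    print(filter_chinese_stopwords(sample))  # ['的', '一些', '我们']
```

Under this assumption, entries that mix Han characters with punctuation (such as `啊!`) are dropped entirely rather than trimmed. A looser rule, for example also allowing CJK punctuation, would keep more of the original list.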

Alqua commented

It would be great to have Chinese in NLTK.

Resolved in aa54613
Sorry for the long delay