/Sogou-corpus-processing

The Sogou corpus appears to be encoded in GB18030, and its XML tags need to be removed.
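
The two steps described above (decode as GB18030, then drop the XML tags) could be sketched roughly as follows. This is only an illustration, not the repository's actual script: the function name `strip_tags`, the regex-based tag removal, and the sample tag names are all assumptions made for the example.

```python
import re

# Assumption: tags are simple <name> / </name> markers with no '>' inside them,
# so a regex is sufficient and a full XML parser is not needed.
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(raw: bytes, encoding: str = "gb18030") -> str:
    """Decode raw corpus bytes and return the text with XML tags removed."""
    # errors="replace" keeps going if a few bytes are not valid GB18030.
    text = raw.decode(encoding, errors="replace")
    # Remove tags line by line and trim surrounding whitespace.
    cleaned = (TAG_RE.sub("", line).strip() for line in text.splitlines())
    # Drop lines that contained nothing but tags.
    return "\n".join(line for line in cleaned if line)


if __name__ == "__main__":
    # Hypothetical sample record; the tag names here are illustrative only.
    sample = (
        "<doc>\n"
        "<contenttitle>标题</contenttitle>\n"
        "<content>正文内容</content>\n"
        "</doc>"
    ).encode("gb18030")
    print(strip_tags(sample))
```

A regex is used here because corpus dumps of this kind are often not well-formed XML (for example, unescaped `&` characters), which would make a strict parser fail. If the files turn out to be well-formed, a streaming XML parser may be a safer choice.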

Primary language: Python
