joe32140/flatten_tokenize_convert_chinese_gigaword
Dumps the text of the Chinese Gigaword dataset into separate headline and paragraph files, applying Chinese word tokenization and simplified-to-traditional Chinese conversion.
Python, MIT license
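The flatten, tokenize, and convert steps named in the description can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code. It assumes Gigaword-style SGML input in which each `<DOC>` contains one `<HEADLINE>` and several `<P>` blocks, and it assumes one output line per document in each file. The names `flatten`, `tokenize_chars`, `convert_s2t`, and `write_lines` are hypothetical. The tokenizer and the three-character conversion table are toy stand-ins; a real pipeline would plug in a proper word segmenter and a full simplified-to-traditional converter.

```python
import re
from pathlib import Path
from typing import Callable, List, Tuple

# Assumed Gigaword-style SGML layout: <DOC> ... <HEADLINE> ... <P> ... </DOC>
DOC_RE = re.compile(r"<DOC\b[^>]*>(.*?)</DOC>", re.S)
HEADLINE_RE = re.compile(r"<HEADLINE>(.*?)</HEADLINE>", re.S)
PARA_RE = re.compile(r"<P>(.*?)</P>", re.S)

# Toy simplified-to-traditional table (illustration only, far from complete).
S2T_TOY = str.maketrans({"汉": "漢", "语": "語", "书": "書"})


def convert_s2t(text: str) -> str:
    """Stand-in converter: maps only the characters listed in S2T_TOY."""
    return text.translate(S2T_TOY)


def tokenize_chars(text: str) -> List[str]:
    """Stand-in tokenizer: one token per non-whitespace character."""
    return [ch for ch in text if not ch.isspace()]


def flatten(
    sgml: str,
    tokenize: Callable[[str], List[str]] = tokenize_chars,
    convert: Callable[[str], str] = convert_s2t,
) -> Tuple[List[str], List[str]]:
    """Return line-aligned (headlines, paragraphs), one entry per document."""
    headlines: List[str] = []
    paragraphs: List[str] = []
    for doc in DOC_RE.findall(sgml):
        match = HEADLINE_RE.search(doc)
        if match is None:
            continue  # skip documents without a headline
        head = " ".join(tokenize(convert(match.group(1))))
        body = " ".join(
            " ".join(tokenize(convert(p))) for p in PARA_RE.findall(doc)
        )
        headlines.append(head)
        paragraphs.append(body)
    return headlines, paragraphs


def write_lines(path: str, lines: List[str]) -> None:
    """Write one entry per line as UTF-8."""
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Because `tokenize` and `convert` are plain callables, the stand-ins can be replaced without touching the flattening logic. Keeping the two output lists the same length means line N of the headline file corresponds to line N of the paragraph file.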