joe32140/flatten_tokenize_convert_chinese_gigaword

Dumps the text of the Chinese Gigaword dataset into headline and paragraph files, with Chinese word tokenization and simplified-to-traditional Chinese conversion.
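
The three steps named in the description (flatten, tokenize, convert) could be sketched roughly as follows. This is not the repository's code. It assumes Gigaword-style markup with `<DOC>`, `<HEADLINE>`, and `<P>` elements. The names `flatten`, `char_tokenize`, and `toy_convert` are hypothetical, and the tokenizer and the four-character conversion table are placeholders standing in for a real word segmenter and a full simplified-to-traditional converter.

```python
import re
from typing import Callable, Iterator, List, Tuple

# Assumed Gigaword-style markup: <DOC> blocks holding <HEADLINE> and <P> elements.
DOC_RE = re.compile(r"<DOC\b[^>]*>(.*?)</DOC>", re.DOTALL)
HEADLINE_RE = re.compile(r"<HEADLINE>(.*?)</HEADLINE>", re.DOTALL)
PARA_RE = re.compile(r"<P>(.*?)</P>", re.DOTALL)

# Toy simplified-to-traditional table, for illustration only.
TOY_S2T = str.maketrans({"国": "國", "经": "經", "济": "濟", "发": "發"})


def toy_convert(text: str) -> str:
    """Placeholder converter: maps the few characters in TOY_S2T."""
    return text.translate(TOY_S2T)


def char_tokenize(text: str) -> List[str]:
    """Placeholder tokenizer: one token per non-whitespace character."""
    return [ch for ch in text if not ch.isspace()]


def flatten(
    raw: str,
    tokenize: Callable[[str], List[str]] = char_tokenize,
    convert: Callable[[str], str] = toy_convert,
) -> Iterator[Tuple[str, List[str]]]:
    """Yield (headline, paragraphs) per document, converted then tokenized."""
    for doc in DOC_RE.finditer(raw):
        body = doc.group(1)
        head = HEADLINE_RE.search(body)
        if head is None:
            continue  # skip documents without a headline
        headline = " ".join(tokenize(convert(head.group(1))))
        paragraphs = [
            " ".join(tokenize(convert(p))) for p in PARA_RE.findall(body)
        ]
        yield headline, paragraphs


if __name__ == "__main__":
    sample = (
        '<DOC id="X" type="story">\n<HEADLINE>\n中国经济\n</HEADLINE>\n'
        "<TEXT>\n<P>\n经济发展\n</P>\n</TEXT>\n</DOC>"
    )
    for headline, paragraphs in flatten(sample):
        print(headline)
        for p in paragraphs:
            print(p)
```

Passing `tokenize` and `convert` as callables keeps the flattening logic independent of any particular segmentation or conversion library, so real implementations can be swapped in without touching the parser.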

Python · MIT