
Mandarin text corpus from CCTV news, 2007 - present


CCTV NEWS DATA

Mandarin text corpus of CCTV short news items, each with a title and a passage, covering 2007 to the present.

Dependency

sentencepiece (available from PyPI: pip install sentencepiece)

Example usage:

python3 get_cctv.py --history --location ../data 
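
For readers who want to see how the two flags in the command above could be handled, here is a minimal sketch using argparse. Only the flag names --history and --location and the script name get_cctv.py come from the usage line; the function name build_parser, the help texts, the default path, and the interpretation of each flag are assumptions for illustration and may not match the actual script.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two flags shown in the example usage.

    Hypothetical sketch: the flag names are taken from the README,
    everything else (semantics, defaults, help text) is assumed.
    """
    parser = argparse.ArgumentParser(prog="get_cctv.py")
    # Assumed meaning: a boolean switch that requests the historical archive.
    parser.add_argument(
        "--history",
        action="store_true",
        help="(assumed) download past news items instead of only recent ones",
    )
    # Assumed meaning: the directory that receives the downloaded text.
    parser.add_argument(
        "--location",
        type=Path,
        default=Path("../data"),
        help="(assumed) output directory for the corpus files",
    )
    return parser


# Parse the same arguments as the example command line.
args = build_parser().parse_args(["--history", "--location", "../data"])
print(f"history={args.history}, location={args.location}")
```

Passing the argument list explicitly, as in the last lines, makes it possible to check the parsing without running the downloader itself.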

License

GNU General Public License v2.0