
A Free Chinese Speech Corpus Released by CSLT@Tsinghua University

License: Apache License 2.0 (Apache-2.0)

THCHS-30

THCHS-30 is an open Chinese speech corpus published by the Center for Speech and Language Technology (CSLT) at Tsinghua University. You can cite the data using the following BibTeX entry:

@misc{THCHS30_2015,
  title={THCHS-30 : A Free Chinese Speech Corpus},
  author={Dong Wang and Xuewei Zhang and Zhiyong Zhang},
  year={2015},
  url={http://arxiv.org/abs/1512.01882}
}

The data was obtained from http://www.openslr.org/18/. The original .wav files were converted to .mp3 at 22 kHz, and only the data/ directory is kept. The train/, dev/ and test/ directories, which contained symlinks into data/, are not included in this repository.
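Because the train/, dev/ and test/ directories are gone, users working with this copy will typically index the files in data/ themselves. The sketch below is a minimal illustration under stated assumptions, not part of the dataset: it assumes the converted files keep the original THCHS-30 naming pattern `<speaker>_<index>` (for example `A2_0.mp3`), and that any transcript file follows the original three-line layout (words, tonal pinyin, phonemes). This README does not say whether transcripts are kept or how they are named, so check data/ before relying on either assumption. `Transcript`, `parse_trn`, `speaker_id` and `group_by_speaker` are hypothetical helper names.

```python
from __future__ import annotations

from pathlib import Path
from typing import NamedTuple


class Transcript(NamedTuple):
    """One utterance transcript, ASSUMING the original 3-line THCHS-30 layout."""
    words: list[str]
    pinyin: list[str]
    phones: list[str]


def parse_trn(text: str) -> Transcript:
    """Split an (assumed) 3-line transcript into whitespace-separated tokens.

    Line 1: words, line 2: tonal pinyin, line 3: phonemes (assumption).
    Blank lines are ignored; extra lines after the third are ignored too.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) < 3:
        raise ValueError(f"expected 3 non-empty lines, got {len(lines)}")
    words, pinyin, phones = (line.split() for line in lines[:3])
    return Transcript(words, pinyin, phones)


def speaker_id(audio_path: str | Path) -> str:
    """Return the speaker prefix, ASSUMING names like '<speaker>_<index>.mp3'."""
    return Path(audio_path).name.split("_", 1)[0]


def group_by_speaker(data_dir: str | Path) -> dict[str, list[Path]]:
    """Group every .mp3 directly under data_dir by its (assumed) speaker prefix."""
    groups: dict[str, list[Path]] = {}
    for path in sorted(Path(data_dir).glob("*.mp3")):
        groups.setdefault(speaker_id(path), []).append(path)
    return groups
```

For example, `group_by_speaker("data")` would map a prefix such as `"A2"` to that speaker's sorted list of .mp3 paths, which can then be used to build speaker-disjoint splits of your own in place of the removed train/, dev/ and test/ directories.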