Chinese
name | duration/h | address | remark |
---|---|---|---|
THCHS-30 | 30 | https://openslr.org/18/ | |
Aishell | 150 | https://openslr.org/33/ | |
ST-CMDS | 110 | https://openslr.org/38/ | |
Primewords | 99 | https://openslr.org/47/ | |
aidatatang | 200 | https://openslr.org/62/ | |
MagicData | 755 | https://openslr.org/68/ | |
ASR&SD | 160 | http://ncmmsc2021.org/competition2.html | if available |
Aishell2 | 1000 | http://www.aishelltech.com/aishell_2 | if available |
TAL ASR | 100 | https://ai.100tal.com/dataset | |
Common Voice | 63 | https://commonvoice.mozilla.org/zh-CN/datasets | Common Voice Corpus 7.0 |
ASRU2019 ASR | 500 | https://www.datatang.com/competition | if available |
2021 SLT CSRC | 398 | https://www.data-baker.com/csrc_challenge.html | if available |
aidatatang_1505zh | 1505 | https://datatang.com/opensource | if available |
WenetSpeech | 10000 | https://github.com/wenet-e2e/WenetSpeech | if available |
English
name | duration/h | address | remark |
---|---|---|---|
Common Voice | 2015 | https://commonvoice.mozilla.org/zh-CN/datasets | Common Voice Corpus 7.0 |
LibriSpeech | 960 | https://openslr.org/12/ | |
ST-AEDS-20180100 | 4.7 | http://www.openslr.org/45/ | |
TED-LIUM Release 3 | 430 | https://openslr.org/51/ | |
Multilingual LibriSpeech | 44659 | https://openslr.org/94/ | limited supervision |
SPGISpeech | 5000 | https://datasets.kensho.com/datasets/scribe | if available |
Speech Commands | 10 | https://www.kaggle.com/c/tensorflow-speech-recognition-challenge/data | |
2020AESRC | 160 | https://datatang.com/INTERSPEECH2020 | if available |
GigaSpeech | 10000 | https://github.com/SpeechColab/GigaSpeech | if available |
The People’s Speech | 31400 | https://openreview.net/pdf?id=R8CwidgJ0yT | |
Earnings-21 | 39 | https://arxiv.org/abs/2104.11348 | |
VoxPopuli | 24100+543 | https://arxiv.org/pdf/2101.00390.pdf | 24100 (unlabeled), 543 (transcribed) |
Chinese-English
name | duration/h | address | remark |
---|---|---|---|
TAL CSASR | 587 | https://ai.100tal.com/dataset | |
ASRU2019 CSASR | 200 | https://www.datatang.com/competition | if available |
Japanese (ja-JP)
name | duration/h | address | remark |
---|---|---|---|
Common Voice | 26 | https://commonvoice.mozilla.org/zh-CN/datasets | Common Voice Corpus 7.0 |
Japanese_Scripted_Speech_Corpus_Daily_Use_Sentence | 18 | https://magichub.io/cn/datasets/japanese-scripted-speech-corpus-daily-use-sentence/ | |
LaboroTVSpeech | 2000 | https://arxiv.org/pdf/2103.14736.pdf | |
CSJ | 650 | https://github.com/kaldi-asr/kaldi/tree/master/egs/csj | |
Korean (ko-KR)
name | duration/h | address | remark |
---|---|---|---|
korean-scripted-speech-corpus-daily-use-sentence | 4.3 | https://magichub.io/cn/datasets/korean-scripted-speech-corpus-daily-use-sentence/ | |
korean-conversational-speech-corpus | 5.22 | https://magichub.io/cn/datasets/korean-conversational-speech-corpus/ | |
Russian (ru-RU)
name | duration/h | address | remark |
---|---|---|---|
Common Voice | 148 | https://commonvoice.mozilla.org/zh-CN/datasets | Common Voice Corpus 7.0 |
OpenSTT | 20000 | https://arxiv.org/pdf/2006.08274.pdf | limited supervision |
Noise & nonspeech
name | duration/h | address | remark |
---|---|---|---|
MUSAN | - | https://openslr.org/17/ | |
Room Impulse Response and Noise Database | - | https://openslr.org/28/ | |
AudioSet | - | https://ieeexplore.ieee.org/document/7952261 | |
Chinese
name | duration/h | address | remark |
---|---|---|---|
Aishell3 | 85 | https://openslr.org/93/ | |
English
name | duration/h | address | remark |
---|---|---|---|
Hi-Fi Multi-Speaker English TTS Dataset | 291.6 | https://openslr.org/109/ | |
LibriTTS corpus | 585 | https://openslr.org/60/ | |
Chinese
name | duration/h | address | remark |
---|---|---|---|
Aishell4 | 120 | https://openslr.org/111/ | 8-channel, conference scenarios |
ASR&SD | 160 | http://ncmmsc2021.org/competition2.html | if available |
English
name | duration/h | address | remark |
---|---|---|---|
CHiME-6 | - | https://chimechallenge.github.io/chime6/download.html | if available |
Chinese
name | duration/h | address | remark |
---|---|---|---|
CN-Celeb | - | https://openslr.org/82/ | |
English
name | duration/h | address | remark |
---|---|---|---|
VoxCeleb Data | - | http://www.robots.ox.ac.uk/~vgg/data/voxceleb/ | |