speech_dataset

The dataset of Speech Recognition

Apache License 2.0

The Dataset of Speech Recognition

Chinese

| Name | Duration (h) | URL | Remark | Application |
| --- | --- | --- | --- | --- |
| THCHS-30 | 30 | https://openslr.org/18/ | | |
| Aishell | 150 | https://openslr.org/33/ | | |
| ST-CMDS | 110 | https://openslr.org/38/ | | |
| Primewords | 99 | https://openslr.org/47/ | | |
| aidatatang | 200 | https://openslr.org/62/ | | |
| MagicData | 755 | https://openslr.org/68/ | | |
| ASR&SD | 160 | http://ncmmsc2021.org/competition2.html | if available | |
| Aishell2 | 1000 | http://www.aishelltech.com/aishell_2 | if available | |
| TAL ASR | 100 | https://ai.100tal.com/dataset | | |
| Common Voice | 63 | https://commonvoice.mozilla.org/zh-CN/datasets | Common Voice Corpus 7.0 | |
| ASRU2019 ASR | 500 | https://www.datatang.com/competition | if available | |
| 2021 SLT CSRC | 398 | https://www.data-baker.com/csrc_challenge.html | if available | |
| aidatatang_1505zh | 1505 | https://datatang.com/opensource | if available | |
| WenetSpeech | 10000 | https://github.com/wenet-e2e/WenetSpeech | | |
| KeSpeech | 1542 | https://openreview.net/forum?id=b3Zoeq2sCLq | | speech recognition, speaker verification, subdialect identification, voice conversion |
| MagicData-RAMC | 180 | https://arxiv.org/pdf/2203.16844.pdf | conversational speech data recorded from native speakers of Mandarin Chinese | |
| Mandarin Heavy Accent Conversational Speech Corpus | 58.78 | https://magichub.com/datasets/mandarin-heavy-accent-conversational-speech-corpus/ | | |
| Free ST Chinese Mandarin Corpus | - | https://openslr.org/38/ | | |

English

| Name | Duration (h) | URL | Remark |
| --- | --- | --- | --- |
| Common Voice | 2015 | https://commonvoice.mozilla.org/zh-CN/datasets | Common Voice Corpus 7.0 |
| LibriSpeech | 960 | https://openslr.org/12/ | |
| ST-AEDS-20180100 | 4.7 | http://www.openslr.org/45/ | |
| TED-LIUM Release 3 | 430 | https://openslr.org/51/ | |
| Multilingual LibriSpeech | 44659 | https://openslr.org/94/ | limited supervision |
| SPGISpeech | 5000 | https://datasets.kensho.com/datasets/scribe | if available |
| Speech Commands | 10 | https://www.kaggle.com/c/tensorflow-speech-recognition-challenge/data | |
| 2020AESRC | 160 | https://datatang.com/INTERSPEECH2020 | if available |
| GigaSpeech | 10000 | https://github.com/SpeechColab/GigaSpeech | |
| The People’s Speech | 31400 | https://openreview.net/pdf?id=R8CwidgJ0yT | |
| Earnings-21 | 39 | https://arxiv.org/abs/2104.11348 | |
| VoxPopuli | 24100+543 | https://arxiv.org/pdf/2101.00390.pdf | 24100 (unlabeled), 543 (transcribed) |
| CMU Wilderness Multilingual Speech Dataset | 13 | http://festvox.org/cmu_wilderness/ | Multilingual |
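
The duration column in these tables mixes plain numbers, sums such as `24100+543`, and `-` where no duration is listed. As a rough illustration of how such cells could be turned into hours when tallying corpus sizes, here is a minimal Python sketch. The helper name `parse_duration_hours` and the hand-copied `rows` list are assumptions made for this example; they are not part of this repository.

```python
from typing import Optional


def parse_duration_hours(cell: str) -> Optional[float]:
    """Parse one duration cell into total hours.

    Handles plain numbers ("960", "4.7") and sums ("24100+543").
    A "-" or empty cell means the duration is not listed and yields None.
    """
    cell = cell.strip()
    if cell in ("", "-"):
        return None
    return sum(float(part) for part in cell.split("+"))


# A few (name, duration cell) pairs copied by hand from the English table.
rows = [
    ("LibriSpeech", "960"),
    ("ST-AEDS-20180100", "4.7"),
    ("VoxPopuli", "24100+543"),
]

# Skip entries without a listed duration when summing.
hours = [parse_duration_hours(cell) for _, cell in rows]
total = sum(h for h in hours if h is not None)
print(f"{total:.1f} hours")
```

Returning `None` for unlisted durations, rather than 0, keeps "unknown" distinct from "zero hours" so that callers decide explicitly whether to skip those entries.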

Chinese-English

| Name | Duration (h) | URL | Remark |
| --- | --- | --- | --- |
| SEAME | 30 | https://www.isca-speech.org/archive_v0/archive_papers/interspeech_2010/i10_1986.pdf | |
| TAL CSASR | 587 | https://ai.100tal.com/dataset | |
| ASRU2019 CSASR | 200 | https://www.datatang.com/competition | if available |
| ASCEND | 10.62 | https://arxiv.org/pdf/2112.06223.pdf | |

Japanese (ja-JP)

| Name | Duration (h) | URL | Remark |
| --- | --- | --- | --- |
| Common Voice | 26 | https://commonvoice.mozilla.org/zh-CN/datasets | Common Voice Corpus 7.0 |
| Japanese_Scripted_Speech_Corpus_Daily_Use_Sentence | 18 | https://magichub.io/cn/datasets/japanese-scripted-speech-corpus-daily-use-sentence/ | |
| LaboroTVSpeech | 2000 | https://arxiv.org/pdf/2103.14736.pdf | |
| CSJ | 650 | https://github.com/kaldi-asr/kaldi/tree/master/egs/csj | |
| JTubeSpeech | 1300 | https://arxiv.org/pdf/2112.09323.pdf | |

Korean (ko-KR)

| Name | Duration (h) | URL | Remark |
| --- | --- | --- | --- |
| korean-scripted-speech-corpus-daily-use-sentence | 4.3 | https://magichub.io/cn/datasets/korean-scripted-speech-corpus-daily-use-sentence/ | |
| korean-conversational-speech-corpus | 5.22 | https://magichub.io/cn/datasets/korean-conversational-speech-corpus/ | |

Russian (ru-RU)

| Name | Duration (h) | URL | Remark |
| --- | --- | --- | --- |
| Common Voice | 148 | https://commonvoice.mozilla.org/zh-CN/datasets | Common Voice Corpus 7.0 |
| OpenSTT | 20000 | https://arxiv.org/pdf/2006.08274.pdf | limited supervision |

French (fr-FR)

| Name | Duration (h) | URL | Remark |
| --- | --- | --- | --- |
| MediaSpeech | 10 | https://arxiv.org/pdf/2103.16193.pdf | ASR system evaluation dataset |

Spanish (es-ES)

| Name | Duration (h) | URL | Remark |
| --- | --- | --- | --- |
| MediaSpeech | 10 | https://arxiv.org/pdf/2103.16193.pdf | ASR system evaluation dataset |

Turkish (tr-TR)

| Name | Duration (h) | URL | Remark |
| --- | --- | --- | --- |
| MediaSpeech | 10 | https://arxiv.org/pdf/2103.16193.pdf | ASR system evaluation dataset |

Arabic (ar)

| Name | Duration (h) | URL | Remark |
| --- | --- | --- | --- |
| MediaSpeech | 10 | https://arxiv.org/pdf/2103.16193.pdf | ASR system evaluation dataset |

Noise & Nonspeech

| Name | Duration (h) | URL | Remark |
| --- | --- | --- | --- |
| MUSAN | - | https://openslr.org/17/ | |
| Room Impulse Response and Noise Database | - | https://openslr.org/28/ | |
| AudioSet | - | https://ieeexplore.ieee.org/document/7952261 | |


The Dataset of Speech Synthesis

Chinese

| Name | Duration (h) | URL | Remark |
| --- | --- | --- | --- |
| Aishell3 | 85 | https://openslr.org/93/ | |
| Opencpop | - | https://wenet.org.cn/opencpop/download/ | Singing Voice Synthesis |

English

| Name | Duration (h) | URL | Remark |
| --- | --- | --- | --- |
| Hi-Fi Multi-Speaker English TTS Dataset | 291.6 | https://openslr.org/109/ | |
| LibriTTS corpus | 585 | https://openslr.org/60/ | |
| Speechocean762 | - | https://www.openslr.org/101/ | |
| RyanSpeech | 10 | http://mohammadmahoor.com/ryanspeech/ | |


The Dataset of Speech Recognition & Speaker Diarization

Chinese

| Name | Duration (h) | URL | Remark | Application |
| --- | --- | --- | --- | --- |
| Aishell4 | 120 | https://openslr.org/111/ | 8-channel, conference scenarios | speech recognition, speaker diarization |
| ASR&SD | 160 | http://ncmmsc2021.org/competition2.html | if available | speech recognition, speaker diarization |
| zhijiangcup | - | https://zhijiangcup.zhejianglab.com/zhijiang/match/details/id/6.html | if available | speech recognition, speaker diarization |
| M2MET | 120 | https://arxiv.org/pdf/2110.07393.pdf | 8-channel, conference scenarios | speech recognition, speaker diarization |

English

| Name | Duration (h) | URL | Remark | Application |
| --- | --- | --- | --- | --- |
| CHiME-6 | - | https://chimechallenge.github.io/chime6/download.html | if available | speech recognition, speaker diarization |


The Dataset of Speaker Recognition

Chinese

| Name | Duration (h) | URL | Remark | Application |
| --- | --- | --- | --- | --- |
| CN-Celeb | - | https://openslr.org/82/ | | |
| KeSpeech | 1542 | https://openreview.net/forum?id=b3Zoeq2sCLq | | speech recognition, speaker verification, subdialect identification, voice conversion |
| MTASS | 55.6 | https://github.com/Windstudent/Complex-MTASSNet | | |
| THCHS-30 | 30 | http://www.openslr.org/18/ | | |

English

| Name | Duration (h) | URL | Remark |
| --- | --- | --- | --- |
| VoxCeleb Data | - | http://www.robots.ox.ac.uk/~vgg/data/voxceleb/ | |

The Dataset of Voice Activity Detection

French

| Name | Duration (h) | URL | Remark | Application |
| --- | --- | --- | --- | --- |
| InaGVAD | 5 | https://github.com/ina-foss/InaGVAD | 10 radio and 18 TV channels | Voice Activity Detection, Speaker Gender Segmentation, Gender Monitoring |