
A list of publicly available audio data that anyone can download for ASR or other speech activities.

License: Apache License 2.0 (Apache-2.0)

Audio Data Links

A list of common publicly (and privately) available audio data that you can download for ASR or other speech activities. All your WERs are belong to us. Inspired by wer are we, who stole someone else's joke.

1. FREE

| Source | Name & Direct Link | Type | Size (Hours) |
|---|---|---|---|
| OpenSLR | LibriSpeech - Train: 100, 360, 500; Test: Clean, Other; Dev: Clean, Other | Read | 960 |
| OpenSLR | TED-LIUM Release 2 | Read | 118 |
| OpenSLR | TED-LIUM Release 3 | Read | 452 |
| Voxforge | Voxforge English | Read | 130 |
| Mozilla | Common Voice v1 | Read | 500 |
| Mozilla | Common Voice en_1087h_2019-06-12 | Read | 1087 |
| Tatoeba | Tatoeba Audio Eng | Read | ~200 |
| Valentini | Noisy Speech Database (All Files, DOI) | Read | TBC |
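
To get a rough sense of how much free ASR audio the table above lists, the sizes can be tallied with a few lines of Python. This is a minimal sketch: the `FREE_ASR` list, `parse_hours`, and `total_hours` are hypothetical names introduced here for illustration, the rows are transcribed by hand from the table, and the split of each row into source and name is an assumption. Approximate sizes such as `~200` are counted as written, and entries with no number (`TBC`) are skipped.

```python
import re
from typing import Optional

# Rows transcribed by hand from the FREE table: (source, name, type, size as written).
FREE_ASR = [
    ("OpenSLR", "LibriSpeech", "Read", "960"),
    ("OpenSLR", "TED-LIUM Release 2", "Read", "118"),
    ("OpenSLR", "TED-LIUM Release 3", "Read", "452"),
    ("Voxforge", "Voxforge English", "Read", "130"),
    ("Mozilla", "Common Voice v1", "Read", "500"),
    ("Mozilla", "Common Voice en_1087h_2019-06-12", "Read", "1087"),
    ("Tatoeba", "Tatoeba Audio Eng", "Read", "~200"),
    ("Valentini", "Noisy Speech Database", "Read", "TBC"),
]


def parse_hours(size: str) -> Optional[float]:
    """Return the number found in a size cell, or None if it has none (e.g. 'TBC')."""
    match = re.search(r"\d+(?:\.\d+)?", size)
    return float(match.group()) if match else None


def total_hours(rows) -> float:
    """Sum the hours of every row that lists a number; unknown sizes are skipped."""
    hours = (parse_hours(row[3]) for row in rows)
    return sum(h for h in hours if h is not None)


if __name__ == "__main__":
    print(f"Listed free ASR audio: about {total_hours(FREE_ASR):.0f} hours")
```

Treat the result as an upper bound rather than a count of unique audio, since different releases of the same corpus may share recordings.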

2. PAID

| Source | Name | Type | Size (Hours) | Code |
|---|---|---|---|---|
| LDC | Fisher | Conversational | 2000 | Speech: LDC2004S13, LDC2005S13; Transcripts: LDC2004T19, LDC2005T19 |
| LDC | Switchboard Hub 500 | Conversational | 240 | LDC2002S09 |
| LDC | Switchboard Release 2 | Conversational | 300 | LDC97S62 |
| LDC | TIMIT | Read | 5 | LDC93S1 |
| LDC | Wall Street Journal (WSJ) | Read | 80 | LDC93S6A or LDC93S6B |

TTS

1. FREE

| Source | Name & Direct Link | Type | Size (Hours) |
|---|---|---|---|
| Edinburgh CSTR | CSTR VCTK Corpus | Read | 44 |
| LJ Speech | LJ Speech | Read | 24 |