glynpu/ASR_Audio_Data_Links

A list of publicly available audio data that anyone can download for ASR or other speech activities

Apache-2.0

Audio Data Links

A list of common publicly (and privately) available audio data that you can download for ASR or other speech activities. All your WERs are belong to us. Inspired by wer are we, who stole someone else's joke.

1. FREE

Source	Name & Direct Link	Type	Size (Hours)
OpenSLR	LibriSpeech (Train: 100, 360, 500; Test: Clean, Other; Dev: Clean, Other)	Read	960
OpenSLR	TED-LIUM Release 2	Read	118
OpenSLR	TED-LIUM Release 3	Read	452
Voxforge	Voxforge English	Read	130
Mozilla	Common Voice v1	Read	500
Mozilla	Common Voice en_1087h_2019-06-12	Read	1087
Tatoeba	Tatoeba Audio Eng	Read	~200
Valentini	Noisy Speech Database All Files, DOI	Read	TBC
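
As an illustration of fetching one of the free sets above, here is a minimal Python sketch for the LibriSpeech row. It assumes OpenSLR serves LibriSpeech as resource 12 at the base URL shown, with archive names matching the Train/Test/Dev splits in the table; the function names `librispeech_url` and `download_and_extract` are hypothetical helpers, not part of any library.

```python
import tarfile
import urllib.request
from pathlib import Path

# Assumption: OpenSLR hosts LibriSpeech as resource 12 under this base URL.
BASE_URL = "https://www.openslr.org/resources/12"

# Subset names corresponding to the Train/Test/Dev splits in the table row.
SUBSETS = (
    "train-clean-100", "train-clean-360", "train-other-500",
    "test-clean", "test-other", "dev-clean", "dev-other",
)


def librispeech_url(subset: str) -> str:
    """Build the archive URL for one LibriSpeech subset."""
    if subset not in SUBSETS:
        raise ValueError(f"unknown subset: {subset!r}")
    return f"{BASE_URL}/{subset}.tar.gz"


def download_and_extract(subset: str, dest: str = "data") -> Path:
    """Download one subset archive into dest and unpack it there."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / f"{subset}.tar.gz"
    if not archive.exists():  # skip the download if the archive is already present
        urllib.request.urlretrieve(librispeech_url(subset), str(archive))
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest_dir)
    return dest_dir
```

Calling `download_and_extract("test-clean")` would fetch and unpack the clean test split into `data/`. The archives are large, so nothing is downloaded until the function is called explicitly.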

2. PAID

Source	Name	Type	Size (Hours)	Code
LDC	Fisher	Conversational	2000	Speech: LDC2004S13, LDC2005S13; Transcripts: LDC2004T19, LDC2005T19
LDC	Switchboard Hub5'00	Conversational	240	LDC2002S09
LDC	Switchboard-1 Release 2	Conversational	300	LDC97S62
LDC	TIMIT	Read	5	LDC93S1
LDC	Wall Street Journal (WSJ)	Read	80	LDC93S6A or LDC93S6B

TTS

1. FREE

Source	Name & Direct Link	Type	Size (Hours)
Edinburgh CSTR	CSTR VCTK Corpus	Read	44
LJ Speech	LJ Speech	Read	24