A list of open speech corpora for Speech Technology research and development.
This list prefers corpora that are free (i.e. no $ cost) and truly open (i.e. released under some kind of Creative Commons license). Some corpora listed here may not meet both criteria, but all of them are accessible and usable for research and/or commercial use.
Feel free to propose additions to the list!
CORPUS | LANGUAGES | # HOURS | # SPEAKERS | DOWNLOAD | LICENSE |
---|---|---|---|---|---|
Common Voice English | English | 1,118 hours (validated); 1,488 hours (total) | 51,072 speakers (reported: 13% female / 46% male) | https://voice.mozilla.org/en/datasets | CC-0 |
Common Voice German | German | 483 hours (validated); 538 hours (total) | 8,460 speakers (reported: 9% female / 67% male) | https://voice.mozilla.org/en/datasets | CC-0 |
Common Voice French | French | 350 hours (validated); 412 hours (total) | 8,164 speakers (reported: 12% female / 65% male) | https://voice.mozilla.org/en/datasets | CC-0 |
Common Voice Welsh | Welsh | 59 hours (validated); 77 hours (total) | 1,149 speakers (reported: 18% female / 29% male) | https://voice.mozilla.org/en/datasets | CC-0 |
Common Voice Breton | Breton | 5 hours (validated); 12 hours (total) | 133 speakers (reported: 2% female / 55% male) | https://voice.mozilla.org/en/datasets | CC-0 |
Common Voice Chuvash | Chuvash | <1 hour (validated); 2 hours (total) | 38 speakers (reported: 0% female / 47% male) | https://voice.mozilla.org/en/datasets | CC-0 |
Common Voice Turkish | Turkish | 13 hours (validated); 14 hours (total) | 461 speakers (reported: 8% female / 74% male) | https://voice.mozilla.org/en/datasets | CC-0 |
Common Voice Tatar | Tatar | 25 hours (validated); 27 hours (total) | 142 speakers (reported: 2% female / 81% male) | https://voice.mozilla.org/en/datasets | CC-0 |
Common Voice Kyrgyz | Kyrgyz | 11 hours (validated); 21 hours (total) | 119 speakers (reported: 44% female / 45% male) | https://voice.mozilla.org/en/datasets | CC-0 |
Common Voice Irish | Irish | 2 hours (validated); 4 hours (total) | 80 speakers (reported: 16% female / 59% male) | https://voice.mozilla.org/en/datasets | CC-0 |
Common Voice Kabyle | Kabyle | 262 hours (validated); 276 hours (total) | 693 speakers (reported: 22% female / 55% male) | https://voice.mozilla.org/en/datasets | CC-0 |
Common Voice Catalan | Catalan | 245 hours (validated); 295 hours (total) | 3,724 speakers (reported: 35% female / 43% male) | https://voice.mozilla.org/en/datasets | CC-0 |
Common Voice Taiwanese Mandarin | Taiwanese Mandarin | 42 hours (validated); 60 hours (total) | 1,108 speakers (reported: 26% female / 48% male) | https://voice.mozilla.org/en/datasets | CC-0 |
Common Voice Slovenian | Slovenian | 3 hours (validated); 6 hours (total) | 51 speakers (reported: 16% female / 80% male) | https://voice.mozilla.org/en/datasets | CC-0 |
Common Voice Italian | Italian | 85 hours (validated); 122 hours (total) | 4,292 speakers (reported: 18% female / 47% male) | https://voice.mozilla.org/en/datasets | CC-0 |
Common Voice Dutch | Dutch | 24 hours (validated); 33 hours (total) | 701 speakers (reported: 10% female / 66% male) | https://voice.mozilla.org/en/datasets | CC-0 |
Common Voice Hakha Chin | Hakha Chin | 2 hours (validated); 5 hours (total) | 290 speakers (reported: 20% female / 23% male) | https://voice.mozilla.org/en/datasets | CC-0 |
Common Voice Esperanto | Esperanto | 35 hours (validated); 41 hours (total) | 215 speakers (reported: 7% female / 70% male) | https://voice.mozilla.org/en/datasets | CC-0 |
Common Voice Estonian | Estonian | 10 hours (validated); 13 hours (total) | 230 speakers (reported: 38% female / 57% male) | https://voice.mozilla.org/en/datasets | CC-0 |
Common Voice Persian | Persian | 211 hours (validated); 255 hours (total) | 2,763 speakers (reported: 6% female / 78% male) | https://voice.mozilla.org/en/datasets | CC-0 |
Common Voice Basque | Basque | 65 hours (validated); 99 hours (total) | 638 speakers (reported: 23% female / 51% male) | https://voice.mozilla.org/en/datasets | CC-0 |
Common Voice Spanish | Spanish | 167 hours (validated); 221 hours (total) | 8,252 speakers (reported: 10% female / 55% male) | https://voice.mozilla.org/en/datasets | CC-0 |
Common Voice Mandarin | Mandarin (China) | 26 hours (validated); 31 hours (total) | 963 speakers (reported: 10% female / 64% male) | https://voice.mozilla.org/en/datasets | CC-0 |
Common Voice Mongolian | Mongolian | 9 hours (validated); 12 hours (total) | 296 speakers (reported: 25% female / 36% male) | https://voice.mozilla.org/en/datasets | CC-0 |
Common Voice Sakha | Sakha | 3 hours (validated); 6 hours (total) | 37 speakers (reported: 10% female / 54% male) | https://voice.mozilla.org/en/datasets | CC-0 |
Common Voice Dhivehi | Dhivehi | 6 hours (validated); 8 hours (total) | 101 speakers (reported: 64% female / 28% male) | https://voice.mozilla.org/en/datasets | CC-0 |
Common Voice Kinyarwanda | Kinyarwanda | <1 hour (validated); 17 hours (total) | 129 speakers (reported: 8% female / 41% male) | https://voice.mozilla.org/en/datasets | CC-0 |
Common Voice Swedish | Swedish | 5 hours (validated); 6 hours (total) | 99 speakers (reported: 8% female / 74% male) | https://voice.mozilla.org/en/datasets | CC-0 |
Common Voice Russian | Russian | 72 hours (validated); 76 hours (total) | 496 speakers (reported: 23% female / 71% male) | https://voice.mozilla.org/en/datasets | CC-0 |
Common Voice Indonesian | Indonesian | 3 hours (validated); 3 hours (total) | 56 speakers (reported: 4% female / 82% male) | https://voice.mozilla.org/en/datasets | CC-0 |
Common Voice Arabic | Arabic | 7 hours (validated); 12 hours (total) | 228 speakers (reported: 24% female / 48% male) | https://voice.mozilla.org/en/datasets | CC-0 |
Common Voice Tamil | Tamil | 3 hours (validated); 4 hours (total) | 91 speakers (reported: 10% female / 67% male) | https://voice.mozilla.org/en/datasets | CC-0 |
Common Voice Interlingua | Interlingua | 1 hour (validated); 3 hours (total) | 12 speakers (reported: 2% female / 94% male) | https://voice.mozilla.org/en/datasets | CC-0 |
Common Voice Portuguese | Portuguese | 27 hours (validated); 29 hours (total) | 354 speakers (reported: 2% female / 89% male) | https://voice.mozilla.org/en/datasets | CC-0 |
Common Voice Latvian | Latvian | 4 hours (validated); 6 hours (total) | 86 speakers (reported: 17% female / 64% male) | https://voice.mozilla.org/en/datasets | CC-0 |
Common Voice Japanese | Japanese | 3 hours (validated); 3 hours (total) | 52 speakers (reported: 0% female / 81% male) | https://voice.mozilla.org/en/datasets | CC-0 |
Common Voice Votic | Votic | <1 hour (validated); <1 hour (total) | 2 speakers (reported: 0% female / 0% male) | https://voice.mozilla.org/en/datasets | CC-0 |
Common Voice Abkhaz | Abkhaz | <1 hour (validated); <1 hour (total) | 3 speakers (reported: 2% female / 98% male) | https://voice.mozilla.org/en/datasets | CC-0 |
Common Voice Chinese (Hong Kong) | Chinese (Hong Kong) | <1 hour (validated); <1 hour (total) | 15 speakers (reported: 24% female / 37% male) | https://voice.mozilla.org/en/datasets | CC-0 |
Yesno | Hebrew | 6 mins | one male | http://www.openslr.org/1/ | CC-0 |
LJ Speech Corpus | English | ~24 hours | one female | https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2 | CC-0 |
NST Danish ASR Database | Danish | 229,992 utterances | 616 speakers | original: https://www.nb.no/sprakbanken/en/resource-catalogue/oai-nb-no-sbr-19/, reorganized: https://www.nb.no/sprakbanken/en/resource-catalogue/oai-nb-no-sbr-55/ | CC-0 |
NST Danish Dictation | Danish | 34,955 utterances | 151 speakers | https://www.nb.no/sprakbanken/en/resource-catalogue/oai-nb-no-sbr-20/ | CC-0 |
NST Danish Speech Synthesis | Danish | 4,108 utterances | 1 male speaker | https://www.nb.no/sprakbanken/en/resource-catalogue/oai-nb-no-sbr-21/ | CC-0 |
NST Swedish ASR Database | Swedish | 366,000 utterances | 1,000 speakers | original: https://www.nb.no/sprakbanken/en/resource-catalogue/oai-nb-no-sbr-16/, reorganized: https://www.nb.no/sprakbanken/en/resource-catalogue/oai-nb-no-sbr-56/ | CC-0 |
NST Swedish Dictation | Swedish | 45,620 utterances | 195 speakers | https://www.nb.no/sprakbanken/en/resource-catalogue/oai-nb-no-sbr-17/ | CC-0 |
NST Swedish Speech Synthesis | Swedish | 5,279 utterances | 1 male speaker | https://www.nb.no/sprakbanken/en/resource-catalogue/oai-nb-no-sbr-18/ | CC-0 |
NST Norwegian ASR Database | Norwegian | 359,760 utterances | 980 speakers | original: https://www.nb.no/sprakbanken/en/resource-catalogue/oai-nb-no-sbr-13/, reorganized: https://www.nb.no/sprakbanken/en/resource-catalogue/oai-nb-no-sbr-54/ | CC-0 |
NST Norwegian Dictation | Norwegian | 33,360 utterances | 144 speakers | https://www.nb.no/sprakbanken/en/resource-catalogue/oai-nb-no-sbr-14/ | CC-0 |
NST Norwegian Speech Synthesis | Norwegian | 5,363 utterances | 1 male speaker | https://www.nb.no/sprakbanken/en/resource-catalogue/oai-nb-no-sbr-15/ | CC-0 |
NB Tale – Speech Database for Norwegian | Norwegian | 7,600 utterances + ~12 hours | 380 speakers | https://www.nb.no/sprakbanken/en/resource-catalogue/oai-nb-no-sbr-31/ | CC-0 |
Norwegian Parliamentary Speech Corpus (v0.1) | Norwegian | ~59 hours | 203 speakers | https://www.nb.no/sprakbanken/en/resource-catalogue/oai-nb-no-sbr-58/ | CC-0 |
CORPUS | LANGUAGES | # HOURS | # SPEAKERS | DOWNLOAD | LICENSE |
---|---|---|---|---|---|
ARU Speech Corpus | English (UK) | 720 utterances / speaker | 12 speakers (6 female; 6 male) | http://datacat.liverpool.ac.uk/681/1/ARU_Speech_Corpus_v1_0.zip | CC-BY 3.0 |
Althingi Parliamentary Speech Corpus | Icelandic | 542 hours and 25 minutes | 196 speakers | http://www.malfong.is/index.php?dlid=73&lang=en | CC-BY 4.0 |
Alþingisumræður Parliamentary Speech Corpus | Icelandic | ~21 hours | | http://www.malfong.is/index.php?dlid=8&lang=en | CC-BY 3.0 |
Hjal Corpus | Icelandic | ~41,000 recordings | 883 speakers | http://www.malfong.is/index.php?dlid=5&lang=en | CC-BY 3.0 |
The Malromur Corpus | Icelandic | 152 hours | 563 speakers | http://www.malfong.is/index.php?dlid=65&lang=en | CC-BY 4.0 |
Telecooperation German Corpus for Kinect | German | ~35 hours | ~180 speakers | http://www.repository.voxforge1.org/downloads/de/german-speechdata-TUDa-2015.tar.gz | CC-BY 2.0 |
African Speech Technology English-English Speech Corpus | English | ~21 hours | | https://repo.sadilar.org/handle/20.500.12185/283 | CC-BY 2.5 South Africa |
African Speech Technology isiXhosa Speech Corpus | isiXhosa | ~26 hours | | https://repo.sadilar.org/handle/20.500.12185/305 | CC-BY 2.5 South Africa |
NCHLT Afrikaans | Afrikaans | 56 hours | 210 speakers (98 female / 112 male) | https://repo.sadilar.org/handle/20.500.12185/280 | CC-BY 3.0 |
NCHLT English | English | 56 hours | 210 speakers (100 female / 110 male) | https://repo.sadilar.org/handle/20.500.12185/274 | CC-BY 3.0 |
NCHLT isiNdebele | isiNdebele | 56 hours | 148 speakers (78 female / 70 male) | https://repo.sadilar.org/handle/20.500.12185/272 | CC-BY 3.0 |
NCHLT isiXhosa | isiXhosa | 56 hours | 209 speakers (106 female / 103 male) | https://repo.sadilar.org/handle/20.500.12185/279 | CC-BY 3.0 |
NCHLT isiZulu | isiZulu | 56 hours | 210 speakers (98 female / 112 male) | https://repo.sadilar.org/handle/20.500.12185/275 | CC-BY 3.0 |
NCHLT Sepedi | Sepedi | 56 hours | 210 speakers (100 female / 110 male) | https://repo.sadilar.org/handle/20.500.12185/270 | CC-BY 3.0 |
NCHLT Sesotho | Sesotho | 56 hours | 210 speakers (113 female / 97 male) | https://repo.sadilar.org/handle/20.500.12185/278 | CC-BY 3.0 |
NCHLT Setswana | Setswana | 56 hours | 210 speakers (109 female / 101 male) | https://repo.sadilar.org/handle/20.500.12185/281 | CC-BY 3.0 |
NCHLT Siswati | Siswati | 56 hours | 197 speakers (96 female / 101 male) | https://repo.sadilar.org/handle/20.500.12185/271 | CC-BY 3.0 |
NCHLT Tshivenda | Tshivenda | 56 hours | 208 speakers (83 female / 125 male) | https://repo.sadilar.org/handle/20.500.12185/276 | CC-BY 3.0 |
NCHLT Xitsonga | Xitsonga | 56 hours | 198 speakers (95 female / 103 male) | https://repo.sadilar.org/handle/20.500.12185/277 | CC-BY 3.0 |
Lwazi II Cross-lingual Proper Name Corpus | Afrikaans; English; isiZulu; Sesotho | 2 hours 5 mins | 20 speakers | https://repo.sadilar.org/handle/20.500.12185/445 | CC-BY 3.0 |
Lwazi II Proper Name Call Routing Telephone Corpus | English | 2 hours 7 mins | | https://repo.sadilar.org/handle/20.500.12185/448 | CC-BY 3.0 |
Lwazi II Afrikaans Trajectory Tracking Corpus | Afrikaans | 4 hours | one male | https://repo.sadilar.org/handle/20.500.12185/442 | CC-BY 3.0 |
LibriSpeech | English | ~1000 hours | 2484 speakers (1201 female / 1283 male) | http://www.openslr.org/12/ | CC-BY 4.0 |
Zeroth-Korean | Korean | 52.8 hours | 115 speakers | http://www.openslr.org/40/ | CC-BY 4.0 |
Speech Commands | English | 17.8 hours | >1,000 speakers | https://ai.googleblog.com/2017/08/launching-speech-commands-dataset.html | CC-BY 4.0 |
ParlamentParla | Catalan | 320 hours | | https://www.openslr.org/59/ | CC-BY 4.0 |
SIWIS | French | ~10 hours | one female | http://datashare.is.ed.ac.uk/download/DS_10283_2353.zip | CC-BY 4.0 |
VCTK | English | 44 hours | 109 speakers | http://datashare.is.ed.ac.uk/download/DS_10283_3443.zip | CC-BY 4.0 |
LibriTTS | English | 586 hours | 2,456 speakers (1,185 female / 1,271 male) | http://www.openslr.org/60/ | CC-BY 4.0 |
Augmented LibriSpeech | Audio (English); Text (English, French) | 236 hours | | https://persyval-platform.univ-grenoble-alpes.fr/datasets/DS91 | CC-BY 4.0 |
Helsinki Prosody Corpus | English | 262.5 hours | 1,230 speakers | https://github.com/Helsinki-NLP/prosody | CC-BY 4.0 |
Tuva Speech Database | Norwegian | 24 hours | 40 speakers | https://www.nb.no/sprakbanken/show?serial=oai:nb.no:sbr-44&lang= | CC-BY 4.0 |
COERLL Kʼicheʼ corpus | Kʼicheʼ | 34 minutes | ? speakers | https://cl.indiana.edu/~ftyers/resources/utexas-kiche-audio.tar.gz | CC-BY 4.0 |
CORPUS | LANGUAGES | # HOURS | # SPEAKERS | DOWNLOAD | LICENSE |
---|---|---|---|---|---|
Iban | Iban | 8 hours | | http://www.openslr.org/24/ https://github.com/sarahjuan/iban | CC-BY-SA 2.0 |
Vystadial | English; Czech | 41 hours; 15 hours | | http://www.openslr.org/6/ | CC-BY-SA 3.0 US |
Free Spoken Digit Dataset | English | 2,000 isolated digits | 4 speakers | https://github.com/Jakobovski/free-spoken-digit-dataset | CC-BY-SA 4.0 |
Google Javanese | Javanese | 296 hours | 1019 speakers | http://www.openslr.org/35/ | CC-BY-SA 4.0 |
Google Nepali | Nepali | 165 hours | 527 speakers | http://www.openslr.org/54/ | CC-BY-SA 4.0 |
Google Bengali | Bengali | 229 hours | 508 speakers | http://www.openslr.org/53/ | CC-BY-SA 4.0 |
Google Sinhala | Sinhala | 224 hours | 478 speakers | http://www.openslr.org/52/ | CC-BY-SA 4.0 |
Google Sundanese | Sundanese | 333 hours | 542 speakers | http://www.openslr.org/36/ | CC-BY-SA 4.0 |
Spoken Wikipedia Corpus (SWC-2017) | English; German; Dutch | 182 hours; 249 hours; 79 hours | 395 speakers; 339 speakers; 145 speakers | https://nats.gitlab.io/swc/ | CC-BY-SA 4.0 |
Chuvash TTS | Chuvash | 4 hours | 1 speaker | https://github.com/ftyers/Turkic_TTS | CC-BY-SA 4.0 |
Forschergeist | German | 2 hours | 2 speakers (1 female; 1 male) | female speaker: https://goofy.zamia.org/zamia-speech/corpora/forschergeist/annettevogt-20180320-rec.tgz; male speaker: https://goofy.zamia.org/zamia-speech/corpora/forschergeist/timpritlove-20180320-rec.tgz | CC-BY-SA 4.0 |
Malayalam Speech Corpus by SMC | Malayalam | 1:36 hours | 75 speakers (3 female, 12 male, 60 unidentified) | https://releases.smc.org.in/msc-reviewed-speech/ | CC-BY-SA 4.0 |
Google Malayalam | Malayalam | 3.02 hours | 24 speakers | http://www.openslr.org/63/ | CC-BY-SA 4.0 |
CORPUS | LANGUAGES | # HOURS | # SPEAKERS | DOWNLOAD | LICENSE |
---|---|---|---|---|---|
IBM Recorded Debates v1 | English | 5 hours | 10 speakers | https://www.research.ibm.com/haifa/dept/vst/debating_data.shtml#Debate%20Speech%20Analysis | CC-BY-ND |
IBM Recorded Debates v2 | English | ~14 hours | 14 speakers | https://www.research.ibm.com/haifa/dept/vst/debating_data.shtml#Debate%20Speech%20Analysis | CC-BY-ND |
CORPUS | LANGUAGES | # HOURS | # SPEAKERS | DOWNLOAD | LICENSE |
---|---|---|---|---|---|
TV3Parla | Catalan | 240 hours | | http://laklak.eu/share/tv3_0.3.tar.gz | CC-BY-NC 4.0 |
Russian Open STT Corpus | Russian | ~10,000 hours public, ~10,000 more upon request | | https://github.com/snakers4/open_stt/#links | CC-BY-NC 4.0 with some exceptions |
Russian Open TTS Corpus | Russian | 145 hours | 3 males | https://github.com/snakers4/open_tts/#links | CC-BY-NC 4.0 with some exceptions |
CORPUS | LANGUAGES | # HOURS | # SPEAKERS | DOWNLOAD | LICENSE |
---|---|---|---|---|---|
CHiME-Home | English | 6.8 hours | | https://archive.org/details/chime-home | CC-BY-NC-SA 3.0 |
Cameroon Pidgin English Corpus | Cameroon Pidgin English | ~17 hours | | http://ota.ox.ac.uk/text/2563.zip | CC-BY-NC-SA 3.0 |
CORPUS | LANGUAGES | # HOURS | # SPEAKERS | DOWNLOAD | LICENSE |
---|---|---|---|---|---|
Tatoeba-Eng | English | ~250 hours (rough estimate) | 6 speakers | https://voice.mozilla.org/en/datasets | CC BY-NC 4.0 (some audio) / CC BY-NC-ND 3.0 (most audio) / CC BY 2.0 (all text) |
TED-LIUM | English | 118 hours | 685 speakers (36h female / 81h male) | http://www.openslr.org/7/ | CC-BY-NC-ND 3.0 |
TED-LIUM-2 | English | 207 hours | 1242 speakers (66h female / 141h male) | http://www.openslr.org/19/ | CC-BY-NC-ND 3.0 |
TED-LIUM-3 | English | 452 hours | 2028 speakers (134h female / 316h male) | http://www.openslr.org/51/ | CC-BY-NC-ND 3.0 |
Pansori TEDxKR | Korean | 3 hours | 41 speakers | http://www.openslr.org/58/ | CC-BY-NC-ND 4.0 |
Primewords Mandarin | Mandarin | 100 hours | 296 speakers | http://www.openslr.org/47/ | CC-BY-NC-ND 4.0 |
MuST-C v1.0 | Audio (English); Text (Dutch, French, German, Italian, Portuguese, Romanian, Russian, Spanish) | 408, 504, 492, 465, 442, 385, 432, 489 hours per language pair | | https://ict.fbk.eu/must-c-release-v1-0/ | CC-BY-NC-ND 4.0 |
CORPUS | LANGUAGES | # HOURS | # SPEAKERS | DOWNLOAD | LICENSE |
---|---|---|---|---|---|
DiPCo | English | ~5 hours | 32 speakers (13 female; 19 male) | https://s3.amazonaws.com/dipco/DiPCo.tgz | CDLA-Permissive-1.0 |
CORPUS | LANGUAGES | # HOURS | # SPEAKERS | DOWNLOAD | LICENSE |
---|---|---|---|---|---|
VoxForge | English | ~120 hours | ~2966 speakers | http://www.repository.voxforge1.org/downloads/en/Trunk/Audio/Main/16kHz_16bit/ https://voice.mozilla.org/en/datasets | GNU-GPL 3.0 |
VoxForge | Russian | | | http://www.repository.voxforge1.org/downloads/ru/Trunk/Audio/Main/16kHz_16bit/ http://www.repository.voxforge1.org/downloads/Russian/Trunk/Audio/Main/16kHz_16bit/ | GNU-GPL 3.0 |
VoxForge | German | | | http://www.repository.voxforge1.org/downloads/de/Trunk/Audio/Main/16kHz_16bit/ | GNU-GPL 3.0 |
CORPUS | LANGUAGES | # HOURS | # SPEAKERS | DOWNLOAD | LICENSE |
---|---|---|---|---|---|
AISHELL-1 | Mandarin | 170 hours | 400 speakers | http://www.openslr.org/33/ | Apache 2.0 |
Tunisian_MSA | Modern Standard Arabic (Tunisia) | 11.2 hours | 118 speakers | http://www.openslr.org/46/ | Apache 2.0 |
African Accented French | French | 22 hours | 232 speakers | http://www.openslr.org/57/ | Apache 2.0 |
THCHS-30 | Mandarin Chinese | 33.57 hours (13,389 utterances) | 40 speakers (31 female; 9 male) | http://www.openslr.org/18/ | Apache 2.0 |
Living Audio Dataset - Dutch | Dutch | 57:49 min | 1 speaker | https://github.com/Idlak/Living-Audio-Dataset | Apache 2.0 |
Living Audio Dataset - English | English | 50:50 min | 1 speaker | https://github.com/Idlak/Living-Audio-Dataset | Apache 2.0 |
Living Audio Dataset - Irish | Irish | 61:56 min | 1 speaker | https://github.com/Idlak/Living-Audio-Dataset | Apache 2.0 |
Living Audio Dataset - Russian | Russian | 34:58 min | 1 speaker | https://github.com/Idlak/Living-Audio-Dataset | Apache 2.0 |
CORPUS | LANGUAGES | # HOURS | # SPEAKERS | DOWNLOAD | LICENSE |
---|---|---|---|---|---|
ALFFA | Amharic; Hausa (paid); Swahili; Wolof | | | http://www.openslr.org/25/ https://github.com/besacier/ALFFA_PUBLIC | MIT |
CORPUS | LANGUAGES | # HOURS | # SPEAKERS | DOWNLOAD | LICENSE |
---|---|---|---|---|---|
M-AILABS German Corpus | German | 237 hours and 22 minutes | | http://www.caito.de/data/Training/stt_tts/de_DE.tgz | M-AILABS LICENSE (a data-specific BSD 3-Clause License) |
M-AILABS Queen's English Corpus | Queen's English | 45 hours and 35 minutes | | http://www.caito.de/data/Training/stt_tts/en_UK.tgz | M-AILABS LICENSE (a data-specific BSD 3-Clause License) |
M-AILABS US English Corpus | American English | 102 hours and 7 minutes | | http://www.caito.de/data/Training/stt_tts/en_US.tgz | M-AILABS LICENSE (a data-specific BSD 3-Clause License) |
M-AILABS Spanish Corpus | Spanish (Spain) | 108 hours and 34 minutes | | http://www.caito.de/data/Training/stt_tts/es_ES.tgz | M-AILABS LICENSE (a data-specific BSD 3-Clause License) |
M-AILABS Italian Corpus | Italian | 127 hours and 40 minutes | | http://www.caito.de/data/Training/stt_tts/it_IT.tgz | M-AILABS LICENSE (a data-specific BSD 3-Clause License) |
M-AILABS Ukrainian Corpus | Ukrainian | 87 hours and 8 minutes | | http://www.caito.de/data/Training/stt_tts/uk_UK.tgz | M-AILABS LICENSE (a data-specific BSD 3-Clause License) |
M-AILABS Russian Corpus | Russian | 46 hours and 47 minutes | | http://www.caito.de/data/Training/stt_tts/ru_RU.tgz | M-AILABS LICENSE (a data-specific BSD 3-Clause License) |
M-AILABS French-v0.9 Corpus | French | 190 hours and 30 minutes | | http://www.caito.de/data/Training/stt_tts/fr_FR.tgz | M-AILABS LICENSE (a data-specific BSD 3-Clause License) |
M-AILABS Polish Corpus | Polish | 53 hours and 50 minutes | | http://www.caito.de/data/Training/stt_tts/pl_PL.tgz | M-AILABS LICENSE (a data-specific BSD 3-Clause License) |
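
Since the list mixes fully open licenses with non-commercial (NC) and no-derivatives (ND) ones, it can help to screen the LICENSE column before picking a corpus. The following is a minimal sketch, not part of the list itself: the function name `license_flags` and the returned keys are hypothetical, and it only recognises Creative Commons codes written the way they appear in the tables above. Treat it as a rough first filter and read the actual license text before relying on the result.

```python
import re


def license_flags(license_text: str) -> dict:
    """Derive coarse flags from Creative Commons codes found in a LICENSE cell.

    Hypothetical helper: it pattern-matches strings such as "CC-0",
    "CC-BY-SA 4.0" or "CC BY-NC 4.0" and ignores everything else.
    """
    # Treat "CC BY-NC" and "CC-BY-NC" as the same spelling.
    text = license_text.upper().replace("CC BY", "CC-BY")
    # Collect every CC code in the cell (a cell may list several licenses).
    codes = re.findall(r"CC-(?:0|BY(?:-(?:NC|ND|SA))*)", text)
    # Split each code into its elements, e.g. "CC-BY-NC-ND" -> BY, NC, ND.
    parts = {p for code in codes for p in code.split("-")[1:]}
    return {
        "creative_commons": bool(codes),
        "non_commercial": "NC" in parts,
        "no_derivatives": "ND" in parts,
        "share_alike": "SA" in parts,
    }


if __name__ == "__main__":
    for cell in ["CC-0", "CC-BY-SA 4.0", "CC-BY-NC-ND 3.0", "Apache 2.0"]:
        print(cell, license_flags(cell))
```

A cell that lists several licenses, such as the Tatoeba-Eng row, is flagged if any of its licenses carries the restriction, which errs on the cautious side. Non-CC licenses (GNU-GPL, Apache, MIT, CDLA, M-AILABS) simply return `creative_commons: False` and need to be checked by hand.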