/common-voice-stats

A living document for all things Common Voice.

Common Voice

This is a living document on all things related to the Common Voice project.

Feel free to make suggestions!

Licensing

Common Voice is released under a Creative Commons-0 license.

Download

You can download the current release here: Common Voice Download

Current Release

Release Date

December 10, 2019

Language Statistics

LANGUAGE # HOURS # SPEAKERS LANGUAGE FAMILY
English 1,118 hours (validated); 1,488 hours (total) 51,072 speakers (reported: 13% female / 46% male) Indo-European
German 483 hours (validated); 538 hours (total) 8,460 speakers (reported: 9% female / 67% male) Indo-European
French 350 hours (validated); 412 hours (total) 8,164 speakers (reported: 12% female / 65% male) Indo-European
Welsh 59 hours (validated); 77 hours (total) 1,149 speakers (reported: 18% female / 29% male) Indo-European
Breton 5 hours (validated); 12 hours (total) 133 speakers (reported: 2% female / 55% male) Indo-European
Chuvash <1 hour (validated); 2 hours (total) 38 speakers (reported: 0% female / 47% male) Turkic
Turkish 13 hours (validated); 14 hours (total) 461 speakers (reported: 8% female / 74% male) Turkic
Tatar 25 hours (validated); 27 hours (total) 142 speakers (reported: 2% female / 81% male) Turkic
Kyrgyz 11 hours (validated); 21 hours (total) 119 speakers (reported: 44% female / 45% male) Turkic
Irish 2 hour (validated); 4 hour (total) 80 speakers (reported: 16% female / 59% male) Indo-European
Kabyle 262 hours (validated); 276 hours (total) 693 speakers (reported: 22% female / 55% male) Afro-Asiatic
Catalan 245 hours (validated); 295 hours (total) 3,724 speakers (reported: 35% female / 43% male) Indo-European
Taiwanese Mandarin 42 hours (validated); 60 hours (total) 1,108 speakers (reported: 26% female / 48% male) Sino-Tibetan
Slovenian 3 hour (validated); 6 hours (total) 51 speakers (reported: 16% female / 80% male) Indo-European
Italian 85 hours (validated); 122 hours (total) 4,292 speakers (reported: 18% female / 47% male) Indo-European
Dutch 24 hours (validated); 33 hours (total) 701 speakers (reported: 10% female / 66% male) Indo-European
Hakha Chin 2 hours (validated); 5 hours (total) 290 speakers (reported: 20% female / 23% male) Sino-Tibetan
Esperanto 35 hours (validated); 41 hours (total) 215 speakers (reported: 7% female / 70% male) Indo-European
Estonian 10 hours (validated); 13 hours (total) 230 speakers (reported: 38% female / 57% male) Uralic
Persian 211 hours (validated); 255 hours (total) 2,763 speakers (reported: 6% female / 78% male) Indo-European
Basque 65 hours (validated); 99 hours (total) 638 speakers (reported: 23% female / 51% male) Language Isolate
Spanish 167 hours (validated); 221 hours (total) 8,252 speakers (reported: 10% female / 55% male) Indo-European
Mandarin (China) 26 hours (validated); 31 hours (total) 963 speakers (reported: 10% female / 64% male) Sino-Tibetan
Mongolian 9 hours (validated); 12 hours (total) 296 speakers (reported: 25% female / 36% male) Mongolic
Sakha 3 hours (validated); 6 hours (total) 37 speakers (reported: 10% female / 54% male) Turkic
Dhivehi 6 hours (validated); 8 hours (total) 101 speakers (reported: 64% female / 28% male) Indo-European
Kinyarwanda <1 hours (validated); 17 hours (total) 129 speakers (reported: 8% female / 41% male) Niger-Congo
Swedish 5 hours (validated); 6 hours (total) 99 speakers (reported: 8% female / 74% male) Indo-European
Russian 72 hours (validated); 76 hours (total) 496 speakers (reported: 23% female / 71% male) Indo-European
Indonesian 3 hours (validated); 3 hours (total) 56 speakers (reported: 4% female / 82% male) Austronesian
Arabic 7 ours (validated); 12 hours (total) 228 speakers (reported: 24% female / 48% male) Afro-Asiatic
Tamil 3 hours (validated); 4 hours (total) 91 speakers (reported: 10% female / 67% male) Dravidian
Interlingua 1 hours (validated); 3 hours (total) 12 speakers (reported: 2% female / 94% male) Indo-European
Portuguese 27 hours (validated); 29 hours (total) 354 speakers (reported: 2% female / 89% male) Indo-European
Latvian 4 hours (validated); 6 hours (total) 86 speakers (reported: 17% female / 64% male) Indo-European
Japanese 3 hours (validated); 3 hours (total) 52 speakers (reported: 0% female / 81% male) Japonic
Votic <1 hours (validated); <1 hours (total) 2 speakers (reported: 0% female / 0% male) Uralic
Abkhaz <1 hours (validated); <1 hours (total) 3 speakers (reported: 2% female / 98% male) Northwest Caucasian
Chinese (Hong Kong) <1 hours (validated); <1 hours (total) 15 speakers (reported: 24% female / 37% male) Sino-Tibetan
Romansh Sursilvan <1 hours (validated); <1 hours (total) 3 speakers (reported: 0% female / 75% male) Indo-European

Single-digit numbers + yes + no

WARNING: these words, numbers, and spellings are not guaranteed to be correct.

Digits

How would these numbers be read if you were counting out loud or if you were reading a number out loud digit-by-digit?

How would the yes/no be said if you were answering a simple question?

LANGUAGE 0 1 2 3 4 5 6 7 8 9 yes no native speaker verified?
English zero one two three four five six seven eight nine yes no YES
German null eins zwei drei vier fünf sechs sieben acht neun ja nein YES
French zéro un deux trois quatre cinq six sept huit neuf oui non YES
Welsh un dau tri pedwar pump chwech saith wyth naw NO
Breton mann unan daou tri pevar pemp c'hwec'h seizh eizh nav ya nann NO
Chuvash NO
Turkish sıfır bir iki üç dört beş altı yedi sekiz dokuz evet hayır YES
Tatar ноль бер ике өч дүрт биш алты җиде сигез тугыз әйе юк YES
Kyrgyz нөл бир эки үч төрт беш алты жети сегиз тогуз ооба жок NO
Irish a náid a haon a dó a trí a ceathair a cúig a sé a seacht a hocht a naoi NO
Kabyle NO
Catalan zero un dos tres quatre cinc sis set vuit nou no NO
Taiwanese Mandarin NO
Slovenian nìč êna dvé trí štíri pét šést sédem ósem devét ja ne NO
Italian zero uno due tre quattro cinque sei sette otto nove no NO
Dutch nul één twee drie vier vijf zes zeven acht negen ja nee YES
Hakha Chin NO
Esperanto nulo unu du tri kvar kvin ses sep ok naŭ jes ne NO
Estonian null üks kaks kolm neli viis kuus seitse kaheksa üheksa jah ei NO
Persian صفر یکی دو سه چهار پنج شش هفت هشت نه آره نه NO
Basque zero bat bi hiru lau bost sei zazpi zortzi bederatzi bai ez NO
Spanish cero uno dos tres cuatro cinco seis siete ocho nueve no YES
Mandarin (China) 没有 NO
Mongolian тэг нэг нь хоёр гурав дөрөв тав зургаа долоо найм ес тийм шүү үгүй шүү NO
Sakha NO
Dhivehi އާ ނޫން NO
Kinyarwanda zeru rimwe kabiri gatatu kane gautanu gatandatu umunane icyenda NO
Swedish noll ett två tre fyra fem sex sju åtta nio ja nej NO
Russian ноль один два три четыре пять шесть семь восемь девять да нет YES
Indonesian nol satu dua tiga empat lima enam tujuh delapan sembilan iya tidak NO
Arabic صفر واحد اثنان ثلاثة أربعة خمسة ستة سبعة ثمانية تسع نعم لا NO
Tamil பூஜ்யம் ஒன்று இரண்டு மூன்று நான்கு ஐந்து ஆறு ஏழு எட்டு ஒன்பது ஆம் இல்லை YES
Interlingua NO
Portuguese zero um dois três quatro cinco seis[ptr-br also use "meia"] sete oito nove sim não YES
Latvian nulle viens divi trīs četri pieci seši septiņi astoņi deviņi NO
Japanese ゼロ 一つ セブン はい 番号 NO
Votic NO
Abkhaz NO
Chinese (Hong Kong) NO
Romansh Sursilvan NO
Polish zero jeden dwa trzy cztery pięć sześć siedem osiem dziewięć tak nie YES