This repository is a collection of resources and models for multilingual speech topics.
- corpus: list of available corpora
- model: APIs or ready-to-use models
- recipe: ESPnet recipes
- tools: other relevant tools such as g2p
Each directory is organized by language, where each language is identified by its ISO 639-3 ID.
- If you find a relevant speech resource (e.g., a corpus, model, or recipe), you can edit the corresponding file under
data/lang/<your lang>
- If no file exists yet, you can create one following the style of the English directory
- Once your pull request is merged, it will be automatically integrated into our website
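
As a rough illustration of the layout described above, the sketch below resolves the directory for a language under `data/lang/` and lists the files already present there. The helper names `language_dir` and `existing_files` are hypothetical and not part of this repository, and the three-lowercase-letter check only approximates the shape of an ISO 639-3 ID; it does not confirm that the ID is actually registered.

```python
import re
from pathlib import Path

# Shape check only: ISO 639-3 IDs are three lowercase ASCII letters.
# This does not verify that the ID is actually assigned to a language.
ISO_639_3_SHAPE = re.compile(r"[a-z]{3}")


def language_dir(repo_root, iso_id):
    """Return the path data/lang/<iso_id> under repo_root (hypothetical helper)."""
    if not ISO_639_3_SHAPE.fullmatch(iso_id):
        raise ValueError(f"not a three-letter ISO 639-3 style ID: {iso_id!r}")
    return Path(repo_root) / "data" / "lang" / iso_id


def existing_files(repo_root, iso_id):
    """List file names already present for a language, or [] if none exist yet."""
    directory = language_dir(repo_root, iso_id)
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.iterdir() if p.is_file())
```

An empty list from `existing_files` suggests the language has no files yet, in which case the English directory can serve as the template.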
Our web interface is based on the MkDocs framework and its mkdocs-material theme
You first need to install this software
pip install mkdocs-material
Then build the docs and serve them
python mkbuild/build_docs.py
mkdocs serve
The most common language IDs are as follows:
ISO id | Language |
---|---|
aar | Afar |
amh | Amharic |
ara | Literary Arabic |
aze | Azerbaijani |
ben | Bengali |
cat | Catalan |
ceb | Cebuano |
cmn | Mandarin |
ckb | Sorani |
deu | German |
eng | English |
fas | Farsi |
fra | French |
hau | Hausa |
hin | Hindi |
hun | Hungarian |
ilo | Ilocano |
ind | Indonesian |
ita | Italian |
jav | Javanese |
kaz | Kazakh |
kin | Kinyarwanda |
kir | Kyrgyz |
kmr | Kurmanji |
lao | Lao |
mal | Malayalam |
mar | Marathi |
mlt | Maltese |
mya | Burmese |
msa | Malay |
nld | Dutch |
nya | Chichewa |
orm | Oromo |
pan | Punjabi |
pol | Polish |
por | Portuguese |
ron | Romanian |
rus | Russian |
sna | Shona |
som | Somali |
spa | Spanish |
swa | Swahili |
swe | Swedish |
tam | Tamil |
tel | Telugu |
tgk | Tajik |
tgl | Tagalog |
tha | Thai |
tir | Tigrinya |
tpi | Tok Pisin |
tuk | Turkmen |
tur | Turkish |
ukr | Ukrainian |
uig | Uyghur |
uzb | Uzbek |
vie | Vietnamese |
xho | Xhosa |
yor | Yoruba |
zul | Zulu |
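
If you want to look up a language name from its ID in a script, a table in this format can be read into a dictionary. The following is a minimal sketch, assuming the table keeps the two-column `id | name |` layout shown above; `parse_language_table` is a hypothetical helper, and the sample string is a short excerpt of the table rather than the full list.

```python
def parse_language_table(markdown):
    """Map ISO 639-3 IDs to language names from a two-column markdown table."""
    table = {}
    for line in markdown.splitlines():
        # Drop outer pipes, then split the row into its cells.
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != 2:
            continue
        iso_id, name = cells
        # Keep only rows whose first cell looks like a three-letter ID;
        # this skips the header row and the ---|--- separator row.
        if len(iso_id) == 3 and iso_id.isalpha() and iso_id.islower():
            table[iso_id] = name
    return table


# Short excerpt of the table above, used only for illustration.
SAMPLE = """\
ISO id | Language |
---|---|
cmn | Mandarin |
eng | English |
swa | Swahili |
"""

languages = parse_language_table(SAMPLE)
print(languages["swa"])
```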