CMU multilingual speech repository

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

cmu_multilingual_speech

This repository is a collection of resources and models for multilingual speech topics.

  • corpus: list of available corpora
  • model: APIs or ready-to-use models
  • recipe: ESPnet recipes
  • tools: other relevant tools such as g2p

Each directory is organized by language, with each language identified by its ISO 639-3 id.
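As a rough illustration of the naming convention, the sketch below checks whether a directory name has the shape of an ISO 639-3 id (three lowercase ASCII letters). The function name `looks_like_iso639_3` is hypothetical and not part of this repository, and the check covers the shape only: it does not confirm that the id is actually registered in ISO 639-3.

```python
import re

# ISO 639-3 ids are three lowercase ASCII letters, e.g. "eng" or "cmn".
ISO639_3_SHAPE = re.compile(r"[a-z]{3}")


def looks_like_iso639_3(name: str) -> bool:
    """Return True if `name` has the shape of an ISO 639-3 id.

    This is a shape check only; it does not consult the ISO 639-3 registry.
    """
    return ISO639_3_SHAPE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["eng", "cmn", "en", "ENG", "english"]:
        print(candidate, looks_like_iso639_3(candidate))
```

A two-letter ISO 639-1 code such as `en` fails this check, which is a common mistake when naming a new language directory.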

How to contribute

  • If you find any relevant speech resources (e.g., a corpus, model, or recipe), you can edit the corresponding file under data/lang/<your lang>
  • If no file exists yet, you can create one following the style of the files in the English directory
  • Once your pull request is merged, it will be automatically integrated into our website
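The first two steps can be sketched as a small helper that reports which files already exist for a language and, if there are none, which English files could serve as style templates. This is only an illustration under stated assumptions: the helper name `contribution_targets` is hypothetical, and it assumes that the English directory is `data/lang/eng` and that each language directory contains plain files. Neither detail is confirmed by this README beyond the `data/lang/<your lang>` path.

```python
from pathlib import Path


def _files_in(directory: Path) -> list:
    """Return the regular files directly inside `directory`, sorted by name."""
    if not directory.is_dir():
        return []
    return sorted(p for p in directory.iterdir() if p.is_file())


def contribution_targets(repo_root, lang_id):
    """Return (existing_files, template_files) for a language id.

    existing_files: files already present under data/lang/<lang_id>.
    template_files: English files to imitate, listed only when the
                    language has no files yet (assumes data/lang/eng).
    """
    lang_root = Path(repo_root) / "data" / "lang"
    existing = _files_in(lang_root / lang_id)
    templates = [] if existing else _files_in(lang_root / "eng")
    return existing, templates


if __name__ == "__main__":
    existing, templates = contribution_targets(".", "swa")
    if existing:
        print("Edit one of:", [p.name for p in existing])
    else:
        print("Create a new file, using as a style guide:", [p.name for p in templates])
```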

How to build the web interface locally

Our web interface is built with the MkDocs framework and the mkdocs-material theme.

First, install them:

pip install mkdocs-material

Then build the docs and serve them:

python mkbuild/build_docs.py
mkdocs serve

Common Language Ids

The most common language ids are as follows:

ISO id Language
aar Afar
amh Amharic
ara Literary Arabic
aze Azerbaijani
ben Bengali
cat Catalan
ceb Cebuano
cmn Mandarin
ckb Sorani
deu German
eng English
fas Farsi
fra French
hau Hausa
hin Hindi
hun Hungarian
ilo Ilocano
ind Indonesian
ita Italian
jav Javanese
kaz Kazakh
kin Kinyarwanda
kir Kyrgyz
kmr Kurmanji
lao Lao
mal Malayalam
mar Marathi
mlt Maltese
mya Burmese
msa Malay
nld Dutch
nya Chichewa
orm Oromo
pan Punjabi
pol Polish
por Portuguese
ron Romanian
rus Russian
sna Shona
som Somali
spa Spanish
swa Swahili
swe Swedish
tam Tamil
tel Telugu
tgk Tajik
tgl Tagalog
tha Thai
tir Tigrinya
tpi Tok Pisin
tuk Turkmen
tur Turkish
ukr Ukrainian
uig Uyghur
uzb Uzbek
vie Vietnamese
xho Xhosa
yor Yoruba
zul Zulu
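For scripts that need to turn an id into a readable name, a table in this two-column form can be parsed into a dictionary. The sketch below is a minimal example on a short excerpt of the table; the names `parse_language_table` and `language_name` are hypothetical and not part of this repository.

```python
# A short excerpt of the table above, in the same "id name" layout.
TABLE_EXCERPT = """\
ISO id Language
ara Literary Arabic
eng English
tpi Tok Pisin
zul Zulu
"""


def parse_language_table(text):
    """Parse 'id name' rows into a dict, skipping the header row."""
    rows = [line.strip() for line in text.splitlines() if line.strip()]
    table = {}
    for row in rows[1:]:  # rows[0] is the "ISO id Language" header
        # Split on the first run of whitespace only, so that
        # multi-word names such as "Tok Pisin" stay intact.
        code, name = row.split(maxsplit=1)
        table[code] = name
    return table


LANGUAGES = parse_language_table(TABLE_EXCERPT)


def language_name(code):
    """Return the language name for an id, or a placeholder if unknown."""
    return LANGUAGES.get(code, f"unknown ({code})")


if __name__ == "__main__":
    print(language_name("tpi"))
    print(language_name("ara"))
```

Splitting with `maxsplit=1` is the one design choice that matters here, since several names in the table contain spaces.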