AfricanVoices

AfricanVoices is a project that aims to increase the research in speech synthesis for African languages by creating and collecting high quality speech datasets for African Languages. We also avail the synthesizers that we have built for others to use. AfricanVoices is a collaborative project, where we welcome anyone (see the contribution section) to contribute in advancing this project.
To learn more about the project, you can read the AfricanVoices paper.

The project is currently in its early stages. We have worked on the following languages:

Language	Langcode	Source	No. utterances	Hrs
Luo	luo	Open.Bible	11263	15.92
Lingala	lin	Open.Bible	12957	27.52
Kikuyu	kik	Open.Bible	10877	17.72
Yoruba	yor	Open.Bible	10978	18.04
Hausa-M	hau	CommonVoice	518	0.62
Hausa-F	hau	CommonVoice	1938	2.3
Luganda	lug	CommonVoice	2942	4.52
Ibibio	ibb	LLSTI	125	0.32
Kiswahili	swa	LLSTI	426	0.53
Wolof	wol	ALFFA	1000	1.2
Fongbe	fon	ALFFA	542	0.33
Suba	sxb	Bible.is	11971	24.82
Suba	sxb	AfricanVoices	1178	1.7
Luo	luo	AfricanVoices	1516	1.79
English (Kenyan)	en-ke	AfricanVoices	593	0.74

AfricanVoices website

Find more info and download data and synthesizers from the AfricanVoices website.

Data sources

Besides creating the data, we also use available data from other sources so that we can have a wide coverage. We have used the following resources so far :

Open.Bible: Avails Bible recordings in form of audiobooks. We used the methodology in CMU-wilderness to align the corpus. The code for aligning can be found in Aligning AudioBooks
Faith comes by hearing: Also known as Bible.is.They avail Bible recordings in the form of audiobooks. Used the same procedure as Open.Bible. We do used this for Suba but we cannot release the data as the license doesn't allow. You are encouraged to use them for your personal experimenrts.
LLSTI: The Local Language Speech Technology Initiative project developed TTS datasets for localization of speech technology. We obtained Ibibio and Kiswahili by converting the publicly distributed lpc and res files to wav using Festvox tools.
Mozilla CommonVoice : We selected data from a single speaker with the most utterances for Luganda and Hausa.
ALFFA: ALFFA project [1] developed TTS and ASR technologies and data for Kiswahili, Fongbe, Wolof and Amharic. We selected a single speaker subset of the data for each language.

We appreciate all the creators of the above resources.

Number dictionaries

This is a resource we created to be used in normalizing numbers, an important step in both aligning and building TTS systems. Find it here.

Developing your own dataset

Aligning audiobooks

To align audiobooks/long audio files, follow the guidelines in the Aligning AudioBooks section.

Recording

There are guidelines in the Creating Data section of NewLangTech.

Building a speech synthesizer

We have developed for building s speech synthesizer here.

Contributing

We highly welcome contributions through issues or pull requests. Here are some contribution ideas:

Provide a dataset for a language; either you recorded yourself or from other sources(as long as the license allows).
Provide a synthesizer that you have trained.
Provide a number dictionary for a language.
Correct a mistake in the current number dictionaries or add alternatives.
Help in improving the documentation or code
Evaluate output of a synthesizer.

License

The dataset created by AfricanVoices (Kenyan English, Suba and Luo) is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).

Usage for individual research and open source projects

For individual research purposes and projects that are open source, you are free to use the dataset without any restrictions. Feel free to explore, analyze, and incorporate the data into your work, as long as you comply with the terms of the license.

Commercial use and licensing

If you are interested in using the dataset for commercial purposes, please reach out to us to obtain a separate license. We are open to discussing licensing options that meet your specific needs. Contact us via aogayo@andrew.cmu.edu to initiate the licensing process.

Third-Party datasets

Please note that the datasets that we developed from other sources listed in Data sources are released under their original licenses. It is essential to refer to the original data sources and their respective licenses for proper attribution and compliance. We recommend reviewing the individual dataset files or documentation for details on the original sources and licenses.

We encourage collaboration and innovation, and we believe that by working together, we can create amazing things. Thank you for your interest in our dataset, and we look forward to hearing about the exciting ways it contributes to your research and projects!

References

[1] E. Gauthier, L. Besacier, S. Voisin, M. Melese, and U. P. Elingui, “Collecting resources in sub-Saharan African languages for automatic speech recognition: a case study of Wolof,” in Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16). Portorož, Slovenia: European Language Resources Association (ELRA), May 2016, pp. 3863–3867. [Online]. Available: https://aclanthology.org/ L16-1611

neulab/AfricanVoices