This is an ongoing work on Unsupervised Neural Machine Translation for Fula-English. Neural Machine Translation (NMT) has shown significant success lately, but most of the success is achieved for the world's most popular languages like English and French. Despite making up a large percentage of the world's living languages today, African languages lack good quality text corpora and this hinders machine translation research.
I decided to work on Unsupervised Neural Machine Translation (UNMT) for Fula. Fula is a member of the Senegambian branch of the Niger-Congo language family spoken by about 18 million people across about 20 countries in West and Central Africa. Fula is spoken as a set of various dialects. The language is known by other names: Fulah, Ful, Fulfulde, Fulani, Fulbe, Peul, Pular, and Pulaar.
We used Scrapy to crawl a number of websites to get monolingual data for the Fula language. These websites are:
- Pulaar.org - A multimedia news and advocacy website in Pulaar (Mauritania)
- Binndi Pulaar - A news website on culture, history, language, and politics (Senegal)
- Dingiral Fulbe (Senegal)
- Teddungal Damal Pulaaku (Guinea)
- Hammadi-Jah (Science Blog - Mauritania)
- Pular New Testament Bible (Guinea)
The code for all the crawlers are in the crawlers
folder of this project.