zindi_mcv_swahilli

public word error rate: 0.114294524
private word error rate: 0.112809661

How I used Seamless m4t large to get to the top 5 of the mozilla common voice competition. I only downloaded the test.tar.gz directory later I unzipped it and resampled all the audio to 16KHz. I noticed that there was some audio that was muffled, and was pretty bad as is due to the sampling rates that were set. Anyways, the script I used to do the conversion is called prepare_files.sh. Follow the instructions to install seamless m4t large. I performed inference on each audio file python asr.py the output was then saved to asr_results.csv then it was formatted to a certain format needed for Zindi with python clean_submission.py.

You can do all this in one step

make run

Lesson

Review huggingface leaderboard for the ASR models. Look for one with the fastest and the most accurate.

leaderboard

Facebook/meta have a lot of Speech to text models. Look for one that is capable of doing Speech to text. The ones that primarily do one thing seem to be the best.

Shuyib/zindi_mcv_swahilli

zindi_mcv_swahilli

You can do all this in one step

Lesson