A Rhasspy profile for Persian (fa).
Trained from approximately 293 hours of audio from Common Voice (Persian 7.0 dataset, validated, 10% test).
Available Vosk models:
- Small nnet3
  - WER: 15.57%
- Large nnet3
  - WER: 13.58%
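For reference, WER (word error rate) is the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of words in the reference. The sketch below illustrates that definition only; it is not the evaluation script used to produce the numbers above, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one deletion against a four-word reference.
print(word_error_rate("a b c d", "a x c"))
```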
Get started by first installing Vosk:
# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate
pip3 install --upgrade pip
pip3 install --upgrade wheel setuptools
# Install Vosk
pip3 install vosk
Next, download the model and extract it:
wget 'https://github.com/rhasspy/fa_kaldi-rhasspy/releases/download/v1.0/vosk-model-small-fa-rhasspy-0.15.zip'
unzip vosk-model-small-fa-rhasspy-0.15.zip
Finally, run the transcribe.py Python program with the model and an audio file:
python3 transcribe.py vosk-model-small-fa-rhasspy-0.15 welcome.wav
{"result": [{"conf": 1.0, "end": 0.48, "start": 0.06, "word": "خوش"}, {"conf": 1.0, "end": 1.11, "start": 0.48, "word": "آمدید"}], "text": "خوش آمدید"}
For each audio file given to transcribe.py, a line of JSON is printed to the output with the transcription details.
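Each output line can be consumed with any JSON parser. The following is a minimal sketch that assumes the output has the same shape as the example above (a `result` list of word entries plus a `text` field); the helper name `words_with_timing` is hypothetical and not part of this repository.

```python
import json


def words_with_timing(json_line: str):
    """Return (word, start, end, conf) tuples from one line of output."""
    parsed = json.loads(json_line)
    # "result" may be absent if nothing was recognized, so default to [].
    return [
        (w["word"], w["start"], w["end"], w["conf"])
        for w in parsed.get("result", [])
    ]


# The example output line shown above.
line = (
    '{"result": ['
    '{"conf": 1.0, "end": 0.48, "start": 0.06, "word": "خوش"}, '
    '{"conf": 1.0, "end": 1.11, "start": 0.48, "word": "آمدید"}], '
    '"text": "خوش آمدید"}'
)

print(json.loads(line)["text"])
for word, start, end, conf in words_with_timing(line):
    print(f"{start:.2f}-{end:.2f}s  conf={conf:.2f}  {word}")
```

To process several files, the same function could be applied to each line of the program's output in turn.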
If you encounter this error:
ModuleNotFoundError: No module named "numpy"
check this issue.
If you encounter this error:
ModuleNotFoundError: No module named "librosa"
check this issue.
If you encounter this error:
/your/path/.venv/lib/python3.9/site-packages/librosa/util/decorators.py:88: UserWarning: PySoundFile failed. Trying audioread instead.
return f(*args, **kwargs)
check this issue.
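The first two errors above indicate Python packages missing from the virtual environment. A quick way to check before running transcribe.py is sketched below. The list of module names is an assumption inferred from the error messages above (`soundfile` is the import name of PySoundFile), and `missing_modules` is a hypothetical helper. It only checks that each module can be located, not that it loads correctly.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed dependency list, based on the errors described above.
    missing = missing_modules(["vosk", "numpy", "librosa", "soundfile"])
    if missing:
        print("Missing modules:", ", ".join(missing))
        print("Try: pip3 install " + " ".join(missing))
    else:
        print("All checked modules were found.")
```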
If you encounter any other error, feel free to open a new issue here and describe your problem in detail.