This repo is a guide to generating text from the audio in a WAV file. This is speech recognition on the audio itself, which is different from extracting subtitles (more commonly known as closed captions). The vosk
module requires the input audio to be a mono PCM WAV file. The output is written in JSON format. The transcription is not perfect, but it works.
A model is required. I've included a link to a small model, which is about 36 megabytes.
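Because the input has to be a mono PCM WAV file, it can help to check a file before running the recognizer. The following is a minimal sketch using Python's standard `wave` module. The function name `is_mono_pcm_wav` is hypothetical, and the 16-bit sample width is an assumption that is not stated in this guide.

```python
import wave


def is_mono_pcm_wav(path):
    """Return True if the file at `path` is an uncompressed, 16-bit, mono WAV.

    The 16-bit requirement is an assumption; this guide only says "PCM mono".
    """
    with wave.open(path, "rb") as wf:
        return (
            wf.getnchannels() == 1          # mono: exactly one channel
            and wf.getsampwidth() == 2      # 2 bytes per sample = 16-bit (assumed)
            and wf.getcomptype() == "NONE"  # "NONE" means uncompressed PCM
        )
```

For example, `is_mono_pcm_wav("your-input.wav")` returns `False` for a stereo recording, which would then need to be converted to mono before use.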
Requirements:
- vosk
- Python 3.8 or later
- pip
Steps:
- git clone git@github.com:c0debreaker/audio2text.git
- cd audio2text
- python3 -m venv venv
- source venv/bin/activate
- pip install vosk
- wget http://alphacephei.com/vosk/models/vosk-model-small-en-us-0.3.zip
- Extract the zip file into the current folder and rename the extracted folder to model
- git clone https://github.com/alphacep/vosk-api
- python vosk-api/python/example/test_simple.py <your-input.wav> > <any-filename-output.json>
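Once the output file exists, the recognized text can be pulled out of it. The sketch below assumes the file holds several JSON objects written one after another, where final results carry a `"text"` key and intermediate ones do not. That layout is an assumption about what the example script prints, so check your own output first. The function name `collect_text` is hypothetical.

```python
import json


def collect_text(raw):
    """Join the "text" fields of every JSON object found in the string `raw`.

    Assumes `raw` is a sequence of JSON objects separated only by whitespace.
    """
    decoder = json.JSONDecoder()
    pieces = []
    pos = 0
    while pos < len(raw):
        # raw_decode does not skip leading whitespace, so step over it here.
        if raw[pos].isspace():
            pos += 1
            continue
        # Decode one object and move pos to the character just after it.
        obj, pos = decoder.raw_decode(raw, pos)
        text = obj.get("text") if isinstance(obj, dict) else None
        if text:  # ignore objects without text and empty results
            pieces.append(text)
    return " ".join(pieces)
```

To use it, read the output file into a string, for instance with `open("output.json").read()` (a placeholder filename), and pass the result to `collect_text`.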