A simple audio transcription tool that captures audio input, saves it as WAV files, and transcribes it using whisper.cpp.
This is a personal project. It runs on an Orange Pi 5B with Armbian 23.08, attached to a Focusrite Scarlett 6i6 audio interface. You may need to change the configuration to use it in a different environment.
Pass the path to the whisper.cpp base model with the -m argument:
sound_sense -m /home/alexwoolford/whisper.cpp/models/ggml-base.en.bin
The executable transcribes audio to timestamped JSON files:
{
  "file_name": "audio_20230824011133.wav",
  "transcriptions": [
    {
      "start": "2023-08-24T01:11:33.900+00:00",
      "end": "2023-08-24T01:11:34+00:00",
      "text": "Blessed be the fruit."
    },
    {
      "start": "2023-08-24T01:11:34+00:00",
      "end": "2023-08-24T01:11:34.100+00:00",
      "text": "May the Lord open."
    },
    {
      ...
    }
  ]
}
The transcribed text can be extracted from the JSON files with the following jq command:
jq '.transcriptions[].text' audio_*.json
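If you also want the timestamps, the same files can be read with a few lines of Python. This is a minimal sketch and not part of sound_sense: the load_segments helper is a hypothetical name, and it assumes every entry has the "start", "end", and "text" fields shown in the example output above, with ISO 8601 timestamps.

```python
import glob
import json
from datetime import datetime


def load_segments(doc):
    """Return (start, duration_seconds, text) for each transcription entry.

    Assumes the JSON layout shown in the example output:
    a "transcriptions" list whose entries have "start", "end" and "text".
    """
    segments = []
    for entry in doc["transcriptions"]:
        start = datetime.fromisoformat(entry["start"])
        end = datetime.fromisoformat(entry["end"])
        segments.append((start, (end - start).total_seconds(), entry["text"]))
    return segments


if __name__ == "__main__":
    # Same file pattern as the jq command: audio_*.json in the current directory.
    for path in sorted(glob.glob("audio_*.json")):
        with open(path, encoding="utf-8") as f:
            doc = json.load(f)
        for start, duration, text in load_segments(doc):
            print(f"{start.isoformat()} ({duration:.1f}s) {text}")
```

Run it from the directory that holds the JSON files. Each line of output shows a segment's start time, its length in seconds, and the transcribed text.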