Sound Sense

A simple audio transcription tool that captures audio input, saves it as a WAV file, and transcribes it with whisper.cpp.

This is a personal project. It runs on an Orange Pi 5B with Armbian 23.08, attached to a Focusrite Scarlett 6i6. You may need to change the configuration to use it in a different environment.

Pass the path to the whisper.cpp base model with the -m argument:

sound_sense -m /home/alexwoolford/whisper.cpp/models/ggml-base.en.bin

The executable transcribes audio to timestamped JSON files:

{
    "file_name": "audio_20230824011133.wav",
    "transcriptions": [
        {
            "start": "2023-08-24T01:11:33.900+00:00",
            "end": "2023-08-24T01:11:34+00:00",
            "text": "Blessed be the fruit."
        },
        {
            "start": "2023-08-24T01:11:34+00:00",
            "end": "2023-08-24T01:11:34.100+00:00",
            "text": "May the Lord open."
        },
        {
            ...
        }
    ]
}
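As a rough illustration of working with this structure, the sketch below loads a document shaped like the one above and computes how long each segment lasts. The function name segment_durations and the inline sample are hypothetical and not part of sound_sense. It assumes the start and end fields are always ISO 8601 strings like the ones shown.

```python
import json
from datetime import datetime

# Sample shaped like the output above; the values are copied from this README.
SAMPLE = """
{
    "file_name": "audio_20230824011133.wav",
    "transcriptions": [
        {"start": "2023-08-24T01:11:33.900+00:00",
         "end": "2023-08-24T01:11:34+00:00",
         "text": "Blessed be the fruit."},
        {"start": "2023-08-24T01:11:34+00:00",
         "end": "2023-08-24T01:11:34.100+00:00",
         "text": "May the Lord open."}
    ]
}
"""


def segment_durations(doc):
    """Return a list of (text, seconds) pairs, one per transcription segment."""
    rows = []
    for seg in doc["transcriptions"]:
        # Assumption: timestamps are ISO 8601 with a UTC offset, as in the sample.
        start = datetime.fromisoformat(seg["start"])
        end = datetime.fromisoformat(seg["end"])
        rows.append((seg["text"], (end - start).total_seconds()))
    return rows


if __name__ == "__main__":
    for text, seconds in segment_durations(json.loads(SAMPLE)):
        print(f"{seconds:.3f}s  {text}")
```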

To extract just the text from the JSON files, use the following jq command:

jq '.transcriptions[].text' audio_*.json
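If jq is not available, roughly the same result can be had from a short Python script. This is a minimal sketch, not part of sound_sense: the function name collect_text is hypothetical, and it assumes the JSON files match audio_*.json and follow the structure shown above. Unlike the jq command as written, it prints the text without surrounding quotes (similar to jq -r).

```python
import json
from pathlib import Path


def collect_text(directory="."):
    """Return the text of every segment in every audio_*.json file, in file-name order."""
    texts = []
    for path in sorted(Path(directory).glob("audio_*.json")):
        with path.open(encoding="utf-8") as handle:
            doc = json.load(handle)
        # Assumption: each file has a "transcriptions" list of objects with a "text" key.
        for seg in doc.get("transcriptions", []):
            texts.append(seg["text"])
    return texts


if __name__ == "__main__":
    for line in collect_text():
        print(line)
```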