This project lets you record audio and transcribe it using ffmpeg and whisper on an Ubuntu distribution. It captures any audio played through your sound card and converts it to text.
The intent is to create or add to a workflow that captures audio and converts it to text or notes. For example, I might use fabric to analyse and structure the content.
- Ubuntu (tested on Ubuntu 20.04)
- ffmpeg
- pulseaudio-utils
- python3
- pip
- whisper (OpenAI's Whisper)
Update your package list and install ffmpeg and pulseaudio-utils:
sudo apt update
sudo apt install ffmpeg pulseaudio-utils
Install pip and Whisper:
sudo apt install python3-pip
pip3 install git+https://github.com/openai/whisper.git
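After installing, it can help to confirm that the tools are actually on your PATH. The following is a minimal sketch: `missing_tools` is a hypothetical helper name, not part of this project, and it assumes the Whisper package installed a `whisper` command-line entry point.

```shell
# Print the name of every given command that is not found on PATH.
# missing_tools is a hypothetical helper, not part of this project.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example: prints nothing when everything is installed.
missing_tools ffmpeg pactl python3 pip3 whisper
```

If `whisper` is reported missing after a user-level pip install, it may be worth checking whether `~/.local/bin` is on your PATH.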
List the available audio sources to identify the correct one:
pactl list short sources
Look for the entry corresponding to the monitor of your output device (e.g., alsa_output.pci-0000_00_1b.0.analog-stereo.monitor).
Replace YOUR_AUDIO_SOURCE in the script with the actual audio source you identified earlier.
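On systems with several devices the source list can be long. As a rough sketch, the monitor entries can be filtered out by name. `monitor_sources` is a hypothetical helper, not part of this project, and it assumes the source name is the second column of the `pactl list short sources` output.

```shell
# Read `pactl list short sources` output on stdin and print only the
# names (second column) of sources whose name ends in ".monitor".
# monitor_sources is a hypothetical helper, not part of this project.
monitor_sources() {
    awk '$2 ~ /\.monitor$/ { print $2 }'
}

# Usage on a real system:
#   pactl list short sources | monitor_sources
```

If more than one monitor is printed, pick the one that matches the output device your audio is playing through.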
- **Start**: Begin playing the audio source (Zoom meeting, podcast, etc.).
- **Run the Script**: Execute the script to start recording and transcribing:
./record_transcribe.sh
The script records audio for 1 hour and then transcribes it to text. To stop early, press Ctrl+C: the recording ends, whisper starts transcribing, and the script saves the result to a .txt file in $HOME/recording.
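To make the flow above concrete, here is a hedged sketch of how such a record-then-transcribe script could be structured. It is not the project's actual record_transcribe.sh: the function name, the variable names, and the output file naming are assumptions made for illustration. It relies on ffmpeg's PulseAudio input (`-f pulse`), its `-t` duration option, and Whisper's `--output_format` and `--output_dir` options.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a record-then-transcribe script.
# This is NOT the project's actual record_transcribe.sh.

AUDIO_SOURCE="YOUR_AUDIO_SOURCE"   # monitor source name from pactl
DURATION=3600                      # recording length in seconds (1 hour)
OUT_DIR="$HOME/recording"          # where the .wav and .txt files go

record_and_transcribe() {
    wav="$OUT_DIR/recording_$(date +%Y%m%d_%H%M%S).wav"
    mkdir -p "$OUT_DIR"

    # Install a no-op INT handler so Ctrl+C stops ffmpeg but not this script.
    trap : INT
    ffmpeg -f pulse -i "$AUDIO_SOURCE" -t "$DURATION" "$wav"
    trap - INT

    # Whisper writes <basename>.txt into OUT_DIR.
    whisper "$wav" --output_format txt --output_dir "$OUT_DIR"
}

# To run the sketch, call:
#   record_and_transcribe
```

The no-op trap is one possible way to let the script continue to the transcription step after an interrupted recording. The real script may handle this differently.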
- Ensure that the audio source is playing through the correct output device.
- Adjust the recording duration by changing the value of the -t parameter in the ffmpeg command in the bash script.
- Transcribing audio on the CPU can be slow. Ensure your system can handle the load.
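As an illustration of the duration note, ffmpeg's -t option accepts either a number of seconds or an HH:MM:SS value. The small sketch below converts minutes to seconds. `minutes_to_seconds` is a hypothetical helper, not part of this project, and the ffmpeg line in the comment is an assumed shape for the command, not a copy of the script's.

```shell
# Convert whole minutes to the number of seconds expected by ffmpeg's -t.
# minutes_to_seconds is a hypothetical helper, not part of this project.
minutes_to_seconds() {
    echo $(( $1 * 60 ))
}

# Example: a 30-minute recording (equivalent to -t 00:30:00):
#   ffmpeg -f pulse -i "YOUR_AUDIO_SOURCE" -t "$(minutes_to_seconds 30)" out.wav
```

If CPU transcription is too slow, Whisper's `--model` option accepts smaller models such as `tiny` or `base`, which trade some accuracy for speed.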
This project is licensed under the MIT License.