poetry config virtualenvs.in-project true
poetry install
We shall use whisper.cpp for better CPU inference performance, and so that we don't need to waste disk space installing torch.
# clone whisper.cpp
git clone https://github.com/ggerganov/whisper.cpp.git
# download the ggml model file (contains the tokenizer, model weights and everything else you need)
# warning: large file!
bash ./whisper.cpp/models/download-ggml-model.sh large-v2
# compile the `main` binary (requires a C++ compiler, e.g. gcc/g++)
cd whisper.cpp
make
# run inference on the preprocessed clip
./main -f ../data/Task_3/out_16kHz.wav -m models/ggml-large-v2.bin
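If you prefer to drive the binary from Python (e.g. to batch over several clips), a thin subprocess wrapper is enough. This is a minimal sketch, not part of whisper.cpp: the `run_whisper_cpp` helper is hypothetical, and the paths assume you run it from the project root.

```python
import subprocess

def run_whisper_cpp(wav_path: str,
                    binary: str = "./whisper.cpp/main",
                    model: str = "./whisper.cpp/models/ggml-large-v2.bin") -> str:
    """Run the whisper.cpp `main` binary on a 16 kHz mono wav; return its stdout."""
    result = subprocess.run(
        [binary, "-m", model, "-f", wav_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

print(run_whisper_cpp("../data/Task_3/out_16kHz.wav"))
```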
Task 3
Whisper requires the input .wav files to be single-channel with a 16 kHz sampling rate. Check this before running inference.
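One quick way to verify both properties is soundfile's `info` helper (a minimal sketch, assuming the converted file produced by the ffmpeg step below):

```python
import soundfile as sf

info = sf.info("../data/Task_3/out_16kHz.wav")
assert info.samplerate == 16000, f"expected 16 kHz, got {info.samplerate} Hz"
assert info.channels == 1, f"expected mono, got {info.channels} channels"
```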
If you are using the openai-whisper Python library, the .transcribe method already takes care of this conversion for you.
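For comparison, the whole openai-whisper flow is only a few lines (a sketch; the model name and file path are assumptions):

```python
import whisper

model = whisper.load_model("large-v2")
# transcribe() resamples to 16 kHz mono internally (via ffmpeg)
result = model.transcribe("../data/Task_3/out.wav")
print(result["text"])
```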
For whisper.cpp, first export the cleaned & normalized audio clip as-is.
import soundfile as sf

# new_sig / sr: the cleaned & normalized signal and its original sampling rate
sf.write("../data/Task_3/out.wav", data=new_sig, samplerate=sr)
Then you can use ffmpeg to downsample the clip to 16 kHz mono, 16-bit PCM.
ffmpeg -i out.wav -acodec pcm_s16le -ar 16000 -ac 1 out_16kHz.wav
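If you would rather stay in Python instead of shelling out to ffmpeg, roughly the same conversion can be done with scipy's polyphase resampler. This is a sketch under the assumption that out.wav was written as above; ffmpeg remains the more battle-tested route.

```python
import soundfile as sf
from scipy.signal import resample_poly

data, sr = sf.read("../data/Task_3/out.wav")
if data.ndim > 1:
    data = data.mean(axis=1)  # downmix to mono
resampled = resample_poly(data, up=16000, down=sr)
# subtype="PCM_16" matches ffmpeg's pcm_s16le
sf.write("../data/Task_3/out_16kHz.wav", resampled, 16000, subtype="PCM_16")
```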