Streaming transcriber with whisper. Sufficient machine power is needed to transcribe in real time.
```
pip install -U git+https://github.com/shirayu/whispering.git

# If you use GPU, install proper torch and torchaudio
# Check https://pytorch.org/get-started/locally/
# Example: torch for CUDA 11.6
pip install -U torch torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
```
```
# Run in English
whispering --language en --model tiny
```
- `--help` shows the full options
- `--model` sets the model name to use. Larger models are more accurate, but may not be able to transcribe in real time
- `--language` sets the language to transcribe. The list of languages is shown with `whispering -h`
- `--no-progress` disables the progress message
- `-t` sets temperatures to decode. You can set several, like `-t 0.0 -t 0.1 -t 0.5`, but too many temperatures exhaust decoding time
- `--debug` outputs logs for debugging
- `--no-vad` disables VAD (Voice Activity Detection). This forces whisper to analyze non-voice sound periods as well
- `--output` sets the output file (default: standard output)
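The `-t` option follows Whisper's temperature-fallback decoding: temperatures are tried in order, and decoding is retried at the next value when the result fails a quality check, so each extra temperature can add a full decoding pass. A minimal sketch of that retry loop (illustrative only; the function names here are hypothetical, not whispering's actual API):

```python
# Illustrative sketch of temperature fallback, not whispering's real code.
# Each temperature in the -t list may trigger one more decoding attempt,
# which is why "too many temperatures exhaust decoding time".

def decode_with_fallback(decode, temperatures, is_acceptable):
    """Try each temperature in order until the result passes the check."""
    result = None
    for t in temperatures:
        result = decode(t)  # one full decoding pass per attempt
        if is_acceptable(result):
            return result
    return result  # return the last attempt even if it never passed

# Toy usage: pretend decoding only "succeeds" at temperature >= 0.5
out = decode_with_fallback(
    decode=lambda t: {"temperature": t, "ok": t >= 0.5},
    temperatures=[0.0, 0.1, 0.5],
    is_acceptable=lambda r: r["ok"],
)
```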
Without `--allow-padding`, whispering just performs VAD for each period, and when a period is predicted as "silence", it is not passed to whisper. If you want to change the VAD interval, change `-n`. If you want a quick response, set a small `-n` and add `--allow-padding`. However, this may sacrifice accuracy.
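As a rough illustration of what VAD does here, the sketch below splits audio into fixed-length intervals and drops the ones that look like silence instead of sending them to whisper. This is a deliberately simplified energy-threshold version under assumed names; whispering's actual VAD is more sophisticated than a plain RMS cutoff.

```python
# Simplified, hypothetical VAD sketch: keep only intervals whose RMS
# energy exceeds a threshold; everything else is treated as "silence"
# and never reaches the transcriber.
import math

def voiced_intervals(samples, interval_len, threshold):
    """Yield (start_index, chunk) for intervals predicted as speech."""
    for start in range(0, len(samples), interval_len):
        chunk = samples[start:start + interval_len]
        rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        if rms >= threshold:  # loud enough: pass this interval on
            yield start, chunk

# Toy signal: silence, then a loud burst, then silence again
audio = [0.0] * 100 + [0.5, -0.5] * 50 + [0.0] * 100
kept = list(voiced_intervals(audio, interval_len=100, threshold=0.1))
```

Only the middle interval survives; the silent head and tail are skipped, which is the time saving VAD provides.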
```
whispering --language en --model tiny -n 20 --allow-padding
```
⚠ There is no security mechanism. Please secure the connection at your own responsibility.
Run the server with `--host` and `--port`.

```
whispering --language en --model tiny --host 0.0.0.0 --port 8000
```
```
whispering --host ADDRESS_OF_HOST --port 8000 --mode client
```

You can also set `-n`, `--allow-padding`, and other options.
If you get `OSError: PortAudio library not found`, install portaudio:

```
# Ubuntu
sudo apt-get install portaudio19-dev
```
- MIT License
- Some code is ported from the original whisper. Its license is also the MIT License