Whisper ASR Box

Whisper ASR Box is a general-purpose speech recognition toolkit. Whisper Models are trained on a large dataset of diverse audio and is also a multitask model that can perform multilingual speech recognition as well as speech translation and language identification.

Features

Current release (v1.8.2) supports following whisper models:

Quick Usage

CPU

docker run -d -p 9000:9000 \
  -e ASR_MODEL=base \
  -e ASR_ENGINE=openai_whisper \
  onerahmet/openai-whisper-asr-webservice:latest

GPU

docker run -d --gpus all -p 9000:9000 \
  -e ASR_MODEL=base \
  -e ASR_ENGINE=openai_whisper \
  onerahmet/openai-whisper-asr-webservice:latest-gpu

Cache

To reduce container startup time by avoiding repeated downloads, you can persist the cache directory:

docker run -d -p 9000:9000 \
  -v $PWD/cache:/root/.cache/ \
  onerahmet/openai-whisper-asr-webservice:latest

Key Features

Multiple ASR engines support (OpenAI Whisper, Faster Whisper, WhisperX)
Multiple output formats (text, JSON, VTT, SRT, TSV)
Word-level timestamps support
Voice activity detection (VAD) filtering
Speaker diarization (with WhisperX)
FFmpeg integration for broad audio/video format support
GPU acceleration support
Configurable model loading/unloading
REST API with Swagger documentation

Environment Variables

Key configuration options:

ASR_ENGINE: Engine selection (openai_whisper, faster_whisper, whisperx)
ASR_MODEL: Model selection (tiny, base, small, medium, large-v3, etc.)
ASR_MODEL_PATH: Custom path to store/load models
ASR_DEVICE: Device selection (cuda, cpu)
MODEL_IDLE_TIMEOUT: Timeout for model unloading

Documentation

For complete documentation, visit: https://ahmetoner.github.io/whisper-asr-webservice

Development

# Install poetry
pip3 install poetry

# Install dependencies
poetry install

# Run service
poetry run whisper-asr-webservice --host 0.0.0.0 --port 9000

After starting the service, visit http://localhost:9000 or http://0.0.0.0:9000 in your browser to access the Swagger UI documentation and try out the API endpoints.

Credits

This software uses libraries from the FFmpeg project under the LGPLv2.1

ahmetoner/whisper-asr-webservice