A Gradio-based browser interface for Whisper. You can use it as an Easy Subtitle Generator!
If you want to try it on Colab, you can do so here!
- Select the Whisper implementation you want to use:
  - openai/whisper
  - SYSTRAN/faster-whisper (used by default)
  - Vaibhavs10/insanely-fast-whisper
- Generate subtitles from various sources, including:
  - Files
  - YouTube
  - Microphone
- Currently supported subtitle formats:
  - SRT
  - WebVTT
  - txt (plain text without timestamps)
- Speech-to-text translation
  - From other languages to English. (This is Whisper's end-to-end speech-to-text translation feature.)
- Text-to-text translation
  - Translate subtitle files using Facebook NLLB models
  - Translate subtitle files using the DeepL API
- Pre-processing audio input with Silero VAD.
- Pre-processing audio input to separate BGM with UVR.
- Post-processing with speaker diarization using the pyannote model.
  - To download the pyannote model, you need a Hugging Face token and must manually accept the terms on the model's Hugging Face pages.
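
To make the difference between the three subtitle formats listed above concrete, here is a minimal sketch that renders the same segments as SRT, WebVTT, and plain text. It is not the WebUI's own writer code; the function names and the `(start, end, text)` segment layout are assumptions chosen for illustration. The format details it relies on are standard: SRT numbers each cue and uses a comma before the milliseconds, while WebVTT starts with a `WEBVTT` header and uses a dot.

```python
def format_timestamp(seconds: float, separator: str) -> str:
    """Render seconds as HH:MM:SS<separator>mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{ms:03d}"


def to_srt(segments):
    """SRT: numbered cues, comma before the milliseconds."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        timing = f"{format_timestamp(start, ',')} --> {format_timestamp(end, ',')}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    return "\n".join(blocks)


def to_vtt(segments):
    """WebVTT: 'WEBVTT' header, dot before the milliseconds."""
    blocks = ["WEBVTT\n"]
    for start, end, text in segments:
        timing = f"{format_timestamp(start, '.')} --> {format_timestamp(end, '.')}"
        blocks.append(f"{timing}\n{text}\n")
    return "\n".join(blocks)


def to_txt(segments):
    """Plain text: one line per segment, no timing information."""
    return "\n".join(text for _, _, text in segments)


# Hypothetical transcription result: (start seconds, end seconds, text).
segments = [(0.0, 2.5, "Hello there."), (2.5, 5.04, "Welcome back.")]

print(to_srt(segments))
print(to_vtt(segments))
print(to_txt(segments))
```

Keeping the timestamp formatting in one helper means the only per-format differences are the separator, the cue numbering, and the header.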
The app can run with Pinokio.
- Install the Pinokio software.
- Open Pinokio, search for Whisper-WebUI, and install it.
- Start Whisper-WebUI and connect to http://localhost:7860.
- Install and launch Docker Desktop.
- Git clone the repository: `git clone https://github.com/jhj0517/Whisper-WebUI.git`
- Build the image (the image is about 7 GB): `docker compose build`
- Run the container: `docker compose up`
- Connect to the WebUI with your browser at http://localhost:7860
If needed, update the docker-compose.yaml to match your environment.
To run this WebUI, you need git, Python 3.10 to 3.12, and FFmpeg.
Edit --extra-index-url in the requirements.txt to match your device.
By default, the WebUI assumes you're using an Nvidia GPU and CUDA 12.4. If you're using Intel or another CUDA version, read the requirements.txt and edit --extra-index-url.
Please follow the links below to install the necessary software:
- git : https://git-scm.com/downloads
- python : https://www.python.org/downloads/ (3.10 ~ 3.12 is recommended)
- FFmpeg : https://ffmpeg.org/download.html
- CUDA : https://developer.nvidia.com/cuda-downloads
After installing FFmpeg, make sure to add the FFmpeg/bin folder to your system PATH!
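
Before running the installer, it can help to confirm that the prerequisites above are actually visible from your shell. The following is a minimal sketch, not part of the project; the function names are hypothetical. It checks that the interpreter is within the 3.10 to 3.12 range and that `git` and `ffmpeg` can be found on PATH.

```python
import shutil
import sys


def python_version_supported(version_info=sys.version_info) -> bool:
    """True when the major.minor version is between 3.10 and 3.12 inclusive."""
    return (3, 10) <= tuple(version_info[:2]) <= (3, 12)


def missing_tools(tools=("git", "ffmpeg")):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if not python_version_supported():
    print(f"Unsupported Python version: {sys.version.split()[0]}")

for tool in missing_tools():
    print(f"'{tool}' was not found on PATH")
```

If `ffmpeg` is reported as missing right after installation, the FFmpeg/bin folder is most likely not on PATH yet, or the terminal needs to be reopened to pick up the change.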
- Git clone this repository: `git clone https://github.com/jhj0517/Whisper-WebUI.git`
- Run `install.bat` or `install.sh` to install dependencies. (It will create a `venv` directory and install dependencies there.)
- Start the WebUI with `start-webui.bat` or `start-webui.sh`. (It will run `python app.py` after activating the venv.)
You can also run the project with command-line arguments; see the wiki for a guide to the arguments.
This project is integrated with faster-whisper by default for better VRAM usage and transcription speed.
According to faster-whisper, the efficiency of the optimized whisper model is as follows:
| Implementation | Precision | Beam size | Time | Max. GPU memory | Max. CPU memory |
|---|---|---|---|---|---|
| openai/whisper | fp16 | 5 | 4m30s | 11325MB | 9439MB |
| faster-whisper | fp16 | 5 | 54s | 4755MB | 3244MB |
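
As a quick reading aid for the table above, the relative gains can be worked out directly from its figures. This short calculation uses only the numbers in the table; the variable names are illustrative.

```python
# Benchmark figures copied from the table above.
openai_whisper = {"time_s": 4 * 60 + 30, "gpu_mb": 11325, "cpu_mb": 9439}
faster_whisper = {"time_s": 54, "gpu_mb": 4755, "cpu_mb": 3244}

# How many times faster faster-whisper finished the same job.
speedup = openai_whisper["time_s"] / faster_whisper["time_s"]

# Fraction of peak memory saved relative to openai/whisper.
gpu_saving = 1 - faster_whisper["gpu_mb"] / openai_whisper["gpu_mb"]
cpu_saving = 1 - faster_whisper["cpu_mb"] / openai_whisper["cpu_mb"]

print(f"Speedup: {speedup:.1f}x")
print(f"GPU memory saved: {gpu_saving:.0%}")
print(f"CPU memory saved: {cpu_saving:.0%}")
```

In this benchmark, faster-whisper is 5 times faster and uses roughly 58% less GPU memory and 66% less CPU memory.
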
To use an implementation other than faster-whisper, pass the `--whisper_type` argument with the repository name.
Read the wiki for more info about CLI args.
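
As an illustration of the `--whisper_type` argument, the sketch below assembles a launch command for `app.py` with a different implementation selected. The helper `build_launch_command` is hypothetical and exists only for this example; the flag and the repository names come from this README, and the wiki remains the reference for exact argument values.

```python
import shlex
import sys


def build_launch_command(whisper_type: str) -> list[str]:
    """Build the argument list that starts app.py with the given implementation."""
    return [sys.executable, "app.py", "--whisper_type", whisper_type]


# Repository name taken from the implementation list in this README.
command = build_launch_command("Vaibhavs10/insanely-fast-whisper")

# Print the command instead of running it, so the sketch has no side effects.
print(shlex.join(command))
```

To actually launch the WebUI this way, you could pass `command` to `subprocess.run(command, check=True)` from the repository root with the venv activated.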
If you want to use a fine-tuned model, manually place the model files in the models/Whisper/ subdirectory corresponding to the implementation.
Alternatively, if you enter the Hugging Face repo ID (e.g., deepdml/faster-whisper-large-v3-turbo-ct2) in the "Model" dropdown, the model will be downloaded to that directory automatically.
If you're interested in deploying this app as a REST API, please check out /backend.
- Add DeepL API translation
- Add NLLB Model translation
- Integrate with faster-whisper
- Integrate with insanely-fast-whisper
- Integrate with whisperX ( Only speaker diarization part )
- Add background music separation pre-processing with UVR
- Add fast api script
- Add CLI usages
- Support real-time transcription for microphone
Any PRs that add translations for more languages to translation.yaml would be greatly appreciated!


