Whisper-WebUI

A Gradio-based browser interface for Whisper. You can use it as an Easy Subtitle Generator!

About large-v3

large-v3 model has newly added on 23-11-07. large-v3 can not be used in faster-whsiper now, so set DISABLE_FASTER_WHISPER to true in user-start-webui.bat file if you want to use large-v3.

Notebook

If you wish to try this on Colab, you can do it in here!

Feature

Generate subtitles from various sources, including :
- Files
- Youtube
- Microphone
Currently supported subtitle formats :
- SRT
- WebVTT
- txt ( only text file without timeline )
Speech to Text Translation
- From other languages to English. ( This is Whisper's end-to-end speech-to-text translation feature )
Text to Text Translation
- Translate subtitle files using Facebook NLLB models

Installation and Running

Prerequisite

To run Whisper, you need to have git, python version 3.8 ~ 3.10 and FFmpeg.

Please follow the links below to install the necessary software:

git : https://git-scm.com/downloads
python : https://www.python.org/downloads/
FFmpeg : https://ffmpeg.org/download.html

After installing FFmpeg, make sure to add the FFmpeg/bin folder to your system PATH!

Automatic Installation

If you have satisfied the prerequisites listed above, you are now ready to start Whisper-WebUI.

Run Install.bat from Windows Explorer as a regular, non-administrator user.
After installation, run the start-webui.bat. (It will automatically download the model if it is not already installed.)
Open your web browser and go to http://localhost:7860

( If you're running another Web-UI, it will be hosted on a different port , such as localhost:7861, localhost:7862, and so on )

And you can also run the project with command line arguments if you like by running user-start-webui.bat, see wiki for a guide to arguments.

VRAM Usages

This project is integrated with faster-whisper by default for better VRAM usage and transcription speed.

According to faster-whisper, the efficiency of the optimized whisper model is as follows:

Implementation	Precision	Beam size	Time	Max. GPU memory	Max. CPU memory
openai/whisper	fp16	5	4m30s	11325MB	9439MB
faster-whisper	fp16	5	54s	4755MB	3244MB

If you want to use the original Open AI whisper implementation instead of optimized whisper, you can set the command line argument DISABLE_FASTER_WHISPER to True. See the wiki for more information.

Available models

This is Whisper's original VRAM usage table for models.

Size	Parameters	English-only model	Multilingual model	Required VRAM	Relative speed
tiny	39 M	`tiny.en`	`tiny`	~1 GB	~32x
base	74 M	`base.en`	`base`	~1 GB	~16x
small	244 M	`small.en`	`small`	~2 GB	~6x
medium	769 M	`medium.en`	`medium`	~5 GB	~2x
large	1550 M	N/A	`large`	~10 GB	1x

.en models are for English only, and the cool thing is that you can use the Translate to English option from the "large" models!

anzchy/Whisper-WebUI