/Whisper-WebUI

A Web UI for easy subtitle using whisper model.

Primary LanguagePythonApache License 2.0Apache-2.0

Whisper-WebUI

A Gradio-based browser interface for Whisper. You can use it as an Easy Subtitle Generator!

Whisper WebUI

About large-v3

large-v3 model has newly added on 23-11-07. large-v3 can not be used in faster-whsiper now, so set DISABLE_FASTER_WHISPER to true in user-start-webui.bat file if you want to use large-v3.

Notebook

If you wish to try this on Colab, you can do it in here!

Feature

  • Generate subtitles from various sources, including :
    • Files
    • Youtube
    • Microphone
  • Currently supported subtitle formats :
    • SRT
    • WebVTT
    • txt ( only text file without timeline )
  • Speech to Text Translation
    • From other languages to English. ( This is Whisper's end-to-end speech-to-text translation feature )
  • Text to Text Translation
    • Translate subtitle files using Facebook NLLB models

Installation and Running

Prerequisite

To run Whisper, you need to have git, python version 3.8 ~ 3.10 and FFmpeg.

Please follow the links below to install the necessary software:

After installing FFmpeg, make sure to add the FFmpeg/bin folder to your system PATH!

Automatic Installation

If you have satisfied the prerequisites listed above, you are now ready to start Whisper-WebUI.

  1. Run Install.bat from Windows Explorer as a regular, non-administrator user.
  2. After installation, run the start-webui.bat. (It will automatically download the model if it is not already installed.)
  3. Open your web browser and go to http://localhost:7860

( If you're running another Web-UI, it will be hosted on a different port , such as localhost:7861, localhost:7862, and so on )

And you can also run the project with command line arguments if you like by running user-start-webui.bat, see wiki for a guide to arguments.

VRAM Usages

This project is integrated with faster-whisper by default for better VRAM usage and transcription speed.

According to faster-whisper, the efficiency of the optimized whisper model is as follows:

Implementation Precision Beam size Time Max. GPU memory Max. CPU memory
openai/whisper fp16 5 4m30s 11325MB 9439MB
faster-whisper fp16 5 54s 4755MB 3244MB

If you want to use the original Open AI whisper implementation instead of optimized whisper, you can set the command line argument DISABLE_FASTER_WHISPER to True. See the wiki for more information.

Available models

This is Whisper's original VRAM usage table for models.

Size Parameters English-only model Multilingual model Required VRAM Relative speed
tiny 39 M tiny.en tiny ~1 GB ~32x
base 74 M base.en base ~1 GB ~16x
small 244 M small.en small ~2 GB ~6x
medium 769 M medium.en medium ~5 GB ~2x
large 1550 M N/A large ~10 GB 1x

.en models are for English only, and the cool thing is that you can use the Translate to English option from the "large" models!