Whisper UI

The main goal of the project is to create a graphical user interface for the Whisper speech recognition library. GUI created using the Flet library

The project supports two recognition modes:

  • single file process - recognition of only one file
  • batch process - processing multiple files at once

You can select a mode by clicking on the popup menu button on the main tab: change_mode

Single file mode

You need to select an audio file, select a recognition model and click the Recognize button. The recognition process is displayed in real time and you can copy part of the result. look at the example below: single_tab

Batch process

You need select an output folder, audio files (one or more), select recognition model. You can remove concrete selected file from audio list or clear all files select_files

Take a look at the process of recognizing multiple files:

recognition_batch

Requirements

Whisper requires FFMpeg to be installed

Remarks

  • By default the model is downloaded to the following directory "~/.cache/whisper". You can't change this value via ui
  • The output tab does not display information about the model download process.The beginning and end of the model download process is indicated as follows:
    Model download directory: ~./cache/whisper      #download model started
    Model {model_name} downloaded                   #download model finished
    
  • In Linux, the FilePicker control depends on Zenity when running Flet as an app. This is not a requirement when running Flet in a browser. To install Zenity on Ubuntu/Debian run the following commands:
    sudo apt-get install zenity
    

Links

  • Whisper - is a general-purpose speech recognition model
  • Flet - the fastest way to build Flutter apps in Python