
ML-powered speech recognition directly in your browser

Primary language: TypeScript. License: MIT.

Whisper Web (with WebGPU)

Forked to add a few conveniences:

  • use WebGPU for transcription by default
  • use the whisper-large-v3-turbo model by default
  • name the downloaded transcript after the input file (instead of transcript.txt)
  • automatically download a text file transcript when transcription is complete
  • downloaded text files include a timestamp for each transcribed chunk of audio (instead of one massive paragraph of text)
  • verify that the models get cached
  • add drag and drop for input file uploads
  • add more performance metrics for transcription
  • refactor UI components to make them easier (IMHO) to understand, à la Clean Code's recommendations for functions
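A minimal sketch of how the first two defaults could fit together, assuming the app runs on Transformers.js (`@huggingface/transformers` v3), whose `pipeline()` accepts a `device` option such as `"webgpu"` or `"wasm"`. The helper `pickDevice`, the `NavigatorLike` type, and the model id `onnx-community/whisper-large-v3-turbo` are illustrative assumptions, not taken from this fork's source.

```typescript
// Sketch only: prefer WebGPU, fall back to WebAssembly.
// Assumption: device names follow Transformers.js ("webgpu" | "wasm").
type Device = "webgpu" | "wasm";

// Minimal shape of the parts of `navigator.gpu` used here (hypothetical typing,
// so the sketch compiles without the @webgpu/types package).
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}

interface NavigatorLike {
  gpu?: GpuLike;
}

// Hypothetical default model id; check the fork's source for the exact one.
const DEFAULT_MODEL = "onnx-community/whisper-large-v3-turbo";

async function pickDevice(nav: NavigatorLike): Promise<Device> {
  // No WebGPU API at all (e.g. older browsers): use WebAssembly.
  if (!nav.gpu) return "wasm";
  try {
    // The API can exist while no usable GPU adapter is available,
    // so actually request one before committing to WebGPU.
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm";
  } catch {
    return "wasm";
  }
}
```

In the browser this might be called as `pickDevice(navigator as NavigatorLike)`, with the result passed on, e.g. `pipeline("automatic-speech-recognition", DEFAULT_MODEL, { device })`. Requesting an adapter, rather than only checking that `navigator.gpu` exists, also covers browsers that expose the API but have no usable GPU.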
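The transcript bullets (naming the file after the input, downloading it automatically, one timestamp per chunk) can be sketched as below. This assumes the transcriber yields Whisper-style chunks of the form `{ text, timestamp: [start, end] }` in seconds, the shape Transformers.js uses when timestamps are requested; the helper names and the `[HH:MM:SS -> HH:MM:SS]` line format are hypothetical, and the fork's actual output may differ.

```typescript
// Assumed chunk shape: start/end in seconds; end may be null for an open-ended last chunk.
interface TranscriptChunk {
  text: string;
  timestamp: [number, number | null];
}

// "interview.final.mp3" -> "interview.final.txt"; names without an extension just get ".txt".
function transcriptFileName(inputName: string): string {
  const dot = inputName.lastIndexOf(".");
  const base = dot > 0 ? inputName.slice(0, dot) : inputName;
  return `${base}.txt`;
}

// Seconds -> "HH:MM:SS" (fractional seconds are truncated).
function formatTime(seconds: number): string {
  const total = Math.floor(seconds);
  const h = Math.floor(total / 3600);
  const m = Math.floor((total % 3600) / 60);
  const s = total % 60;
  return [h, m, s].map((n) => String(n).padStart(2, "0")).join(":");
}

// One "[start -> end] text" line per chunk instead of a single paragraph.
function formatTranscript(chunks: TranscriptChunk[]): string {
  return chunks
    .map(({ text, timestamp: [start, end] }) => {
      const endLabel = end === null ? "..." : formatTime(end);
      return `[${formatTime(start)} -> ${endLabel}] ${text.trim()}`;
    })
    .join("\n");
}

// Browser-only: save the transcript as a .txt file named after the input.
function downloadTranscript(inputName: string, chunks: TranscriptChunk[]): void {
  const blob = new Blob([formatTranscript(chunks)], { type: "text/plain" });
  const url = URL.createObjectURL(blob);
  const link = document.createElement("a");
  link.href = url;
  link.download = transcriptFileName(inputName);
  link.click();
  // Release the object URL once the click has been dispatched.
  setTimeout(() => URL.revokeObjectURL(url), 0);
}
```

Calling `downloadTranscript(file.name, output.chunks)` from the "transcription complete" handler would give the automatic download described in the list.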
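The list does not say which performance metrics the fork reports. As one hedged example, a common throughput measure for speech recognition is the real-time factor (RTF): processing time divided by audio duration, so values below 1 mean faster than real time. `computeMetrics` and its field names are hypothetical.

```typescript
// Hypothetical metrics shape; not the fork's actual output.
interface TranscriptionMetrics {
  audioSeconds: number;
  processingSeconds: number;
  realTimeFactor: number; // processing / audio; < 1 means faster than real time
  speedup: number; // audio / processing, e.g. 8 means "8x real time"
}

// startMs/endMs are wall-clock readings in milliseconds (e.g. from performance.now()).
function computeMetrics(audioSeconds: number, startMs: number, endMs: number): TranscriptionMetrics {
  if (audioSeconds <= 0) throw new RangeError("audioSeconds must be positive");
  const processingSeconds = (endMs - startMs) / 1000;
  return {
    audioSeconds,
    processingSeconds,
    realTimeFactor: processingSeconds / audioSeconds,
    speedup: audioSeconds / processingSeconds,
  };
}
```

The timestamps would typically come from `performance.now()` taken before and after transcription, and the audio duration from the decoded `AudioBuffer`'s `duration` property.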
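For the drag-and-drop bullet, a minimal sketch using the standard HTML drag-and-drop events. The filtering rule (accept the first file whose MIME type starts with `audio/` or `video/`) and the names `pickMediaFile` and `enableDrop` are assumptions for illustration.

```typescript
interface FileLike {
  name: string;
  type: string; // MIME type as reported by the browser; may be ""
}

// Return the first dropped file that looks like audio or video, else null.
function pickMediaFile<T extends FileLike>(files: ArrayLike<T>): T | null {
  for (const file of Array.from(files)) {
    if (file.type.startsWith("audio/") || file.type.startsWith("video/")) return file;
  }
  return null;
}

// Browser wiring (hypothetical): make `target` accept a dropped media file.
function enableDrop(target: HTMLElement, onFile: (file: File) => void): void {
  // Cancelling dragover marks the element as a drop target; otherwise "drop" never fires.
  target.addEventListener("dragover", (e) => e.preventDefault());
  target.addEventListener("drop", (e) => {
    e.preventDefault(); // keep the browser from navigating to the dropped file
    if (!e.dataTransfer) return;
    const file = pickMediaFile(e.dataTransfer.files);
    if (file) onFile(file);
  });
}
```

Files whose `type` is empty (the browser could not infer one) are rejected by this rule; a fuller implementation might fall back to checking the file extension.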

Running locally

  1. Clone the repo and install dependencies:

    git clone https://github.com/shola/whisper-web.git
    cd whisper-web
    pnpm install  # or: npm install
  2. Run the development server:

    pnpm run dev

    Firefox users may need to set dom.workers.modules.enabled to true in about:config to enable module Web Workers. Check out this issue for more details.

  3. Open the local URL printed by the dev server (e.g., http://localhost:5173/) in your browser.
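The Firefox setting in step 2 governs module Web Workers, i.e. workers created with `new Worker(url, { type: "module" })`. Assuming the app creates its worker that way, a page could feature-detect support and show a hint instead of failing silently. The sketch below uses a known detection trick: browsers that support module workers read the `type` option, which fires a getter. `supportsModuleWorkers` is a hypothetical helper name.

```typescript
// Returns true if the Worker constructor reads the `type` option,
// a common proxy for module-worker support.
function supportsModuleWorkers(): boolean {
  let supported = false;
  const probe = {
    get type(): "module" {
      supported = true;
      return "module";
    },
  };
  try {
    // Options are read before the URL is used, so the getter fires even if
    // "blob://" cannot be loaded and the constructor throws.
    new Worker("blob://", probe).terminate();
  } catch {
    // Ignored: only whether `type` was read matters.
  }
  return supported;
}
```

Where `Worker` does not exist at all (for example, outside a browser), the constructor call throws and the function returns `false`.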