PoC - Speech to text with browser recording - Whisper model

Languages: Python, JavaScript. Browsers: Edge, Firefox, Google Chrome.

Description of the project

This is a small proof of concept that tests how to record audio in the browser and transcribe it to text with Whisper. Whisper is a multilingual speech-to-text model from OpenAI.

Whisper blog post from OpenAI
GitHub repo of Whisper
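The backend side of this flow can be sketched roughly as follows. This is not the code in app.py, only a minimal illustration under stated assumptions: the browser uploads the recording as raw bytes, the helper names `transcribe_upload` and `make_whisper_transcriber` are hypothetical, and the `.webm` suffix is a guess at what the browser recorder produces. The Whisper calls (`whisper.load_model`, `model.transcribe`) come from the openai-whisper package, which also needs ffmpeg installed.

```python
import os
import tempfile
from typing import Callable


def transcribe_upload(
    audio_bytes: bytes,
    transcribe: Callable[[str], str],
    suffix: str = ".webm",  # assumption: container format sent by the browser
) -> str:
    """Write uploaded audio to a temporary file and return its transcript."""
    if not audio_bytes:
        raise ValueError("empty audio upload")
    # delete=False so the file can be reopened by path on every platform
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(audio_bytes)
        path = tmp.name
    try:
        return transcribe(path).strip()
    finally:
        os.remove(path)  # always clean up the temporary recording


def make_whisper_transcriber(model_name: str = "base") -> Callable[[str], str]:
    """Build a transcriber backed by openai-whisper (imported lazily)."""
    import whisper  # pip install openai-whisper; requires ffmpeg on PATH

    model = whisper.load_model(model_name)

    def transcribe(path: str) -> str:
        # model.transcribe returns a dict; the full transcript is under "text"
        return model.transcribe(path)["text"]

    return transcribe
```

A web handler would then call something like `transcribe_upload(request_body, make_whisper_transcriber())`, ideally building the transcriber once at startup since loading the model is slow.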

How to run the code

Run the backend

python app.py

Run the frontend

python -m http.server 8000
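For reference, the command above serves the current directory over HTTP on port 8000, so the page is reachable at http://localhost:8000. Browsers only grant microphone access in a secure context (HTTPS or localhost), which is why opening the page through a local server matters. A rough programmatic equivalent is sketched below; the function name `start_static_server` is hypothetical, and unlike the command line default this sketch binds to 127.0.0.1 only.

```python
import functools
import http.server
import threading


def start_static_server(
    directory: str = ".", port: int = 8000
) -> http.server.ThreadingHTTPServer:
    """Serve `directory` over HTTP on localhost in a background thread."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    # Daemon thread so the server does not block interpreter exit
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Call `server.shutdown()` followed by `server.server_close()` on the returned object to stop it.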

Live demonstration

TODO: ADD GIF HERE

Documentation

Convert blobs into mp3: https://medium.com/jeremy-gottfrieds-tech-blog/javascript-tutorial-record-audio-and-encode-it-to-mp3-2eedcd466e78
Example for LiveKit: https://github.com/livekit/server-sdk-go/blob/main/examples/filesaver/main.go