savbell/whisper-writer

New features: Automatically start and stop recording. Beep sound once text is inserted.

ossossosso opened this issue · 2 comments

Hi,
My name is Daniele. I'm an Italian stenographer.
Basically, I transcribe what I hear into text using a steno keyboard.

I'm interested in better understanding OpenAI Whisper and its capabilities.
I don't have good hardware, so I'd like to use the OpenAI Speech to Text API.

Thank you very much for your project, WhisperWriter.

I tried it and it works for me.

Not many applications on GitHub that use Whisper for speech to text allow all of the following at the same time:

  1. Use a microphone as the audio source.
  2. Use the Whisper API instead of a local model.
  3. Transcribe directly into any text editor.

So thanks for this opportunity.

It would be interesting to have two new features:

  1. No need to press a shortcut to start recording again.
    Once I have pressed a shortcut such as Ctrl+Shift+Space the first time to start recording, and the recording has stopped automatically and the text has been transcribed, it would be great if a new recording started automatically and waited for my words, without my pressing the shortcut again.
    I would only need a second shortcut to stop recording for good.

  2. Because I'm a blind user, a "beep" sound that tells me when the text has been transcribed would be useful, so that I know I can speak again.
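
To make the two requests more concrete, here is a rough sketch of the loop I have in mind. It is only an illustration: the five callables passed in are hypothetical placeholders that I made up for this example, not functions that exist in WhisperWriter.

```python
from typing import Callable, Optional


def dictate_continuously(
    record_until_silence: Callable[[], Optional[bytes]],
    transcribe: Callable[[bytes], str],
    type_text: Callable[[str], None],
    beep: Callable[[], None],
    stop_requested: Callable[[], bool],
) -> int:
    """Record, transcribe, type, beep, then listen again until asked to stop.

    Every argument is a hypothetical placeholder supplied by the caller.
    Returns the number of utterances that were transcribed.
    """
    count = 0
    while not stop_requested():
        audio = record_until_silence()  # blocks until the speaker pauses
        if not audio:
            continue  # nothing was captured, so listen again
        type_text(transcribe(audio))
        beep()  # audible cue: the text is written, it is safe to speak again
        count += 1
    return count
```

In this sketch the first shortcut press would start the loop, and the second shortcut would only have to make `stop_requested` return true.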

Thanks a lot.

Daniele.

Hi Daniele, thank you for your comments! I'm happy to hear that WhisperWriter has worked well for you :)

I appreciate your feature requests and I went ahead and added the option for a "beep" sound to play once the transcription has finished writing to the screen. After downloading my latest commit, you can turn the feature on by setting the noise_on_completion configuration option to true in src\config.json. If you would like to change the sound that is made, you can replace "beep.wav" in the assets folder, or change the file path on line 102 of main.py.
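
If you would rather not edit the file by hand, a small script can flip the option for you. This is a minimal sketch, assuming the option is a top-level key in the JSON file; `set_config_option` is a helper name invented for this example and is not part of the app.

```python
import json
from pathlib import Path


def set_config_option(config_path, key, value):
    """Set one top-level option in a JSON config file and return the new config.

    Assumes the file holds a single JSON object with options as top-level keys.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config[key] = value
    path.write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
    return config


# Example call (adjust the path to where your copy of the config lives):
# set_config_option("src/config.json", "noise_on_completion", True)
```

Note that rewriting the file this way normalizes its indentation, so keep a backup if you care about the original layout.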

Although there is not currently a pipelining feature like you described with the default voice activity detection method, you can change the way the app starts and stops recording to a key toggle. If you change recording_mode in the configuration options to press_to_toggle, the app will start listening when you press the keyboard shortcut and stop listening when you press it a second time, rather than waiting for you to finish speaking.
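
To illustrate what press_to_toggle means in practice, here is a tiny sketch of the behaviour. It is not the app's actual code; `ToggleRecorder` and `on_shortcut` are names made up for this example.

```python
class ToggleRecorder:
    """Tracks whether recording is active under a press-to-toggle scheme."""

    def __init__(self):
        self.recording = False  # nothing is recorded until the first press

    def on_shortcut(self):
        """Flip the state on each shortcut press and report the action taken."""
        self.recording = not self.recording
        return "start" if self.recording else "stop"
```

The first press starts listening and the second press stops it, regardless of whether you are still speaking.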

I hope you find these changes useful! Please let me know if there are any other features that would make the app work better for you :)

Thanks a lot! It works perfectly.