C-Nedelcu/talk-to-chatgpt

Voice-to-text not working again

Opened this issue · 12 comments

It's not putting the text in the message box to send. I received an email from OpenAI yesterday saying that the default model is changing to "gpt-4o-2024-08-06". I wouldn't think that would break it, but I guess they changed something on the page.

@hoshizorista Are you experiencing the same issue? If so, do you have any suggestions for a fix? Our weekly podcast relies on this heavily and we miss having it. Thank you!

@StudioDweller Yeeeep, working on it! Sorry for not noticing, GitHub didn't notify me. I'm actually finishing off a rework of the extension with autostart and some cool functions (such as using the base GPT voices). I'll have it ready by late tomorrow night, hang in there!

@hoshizorista We would love to have you as a guest on our podcast. Let me know if that’s something that you would be interested in. We sincerely appreciate your efforts with maintaining this extension and would love to talk to you about it.

Hi @hoshizorista, I also really appreciate you working on this. I've been using the extension as a way to experiment with ChatGPT in performance, so getting it back online would be amazing. There isn't really a comparable tool. Also, @StudioDweller, I would love to know more about where to find your podcast. It sounds interesting.

@nowallslive I do a weekly podcast called Up Against Reality on all things AI along with my co-host Chris, and we leverage this extension for real-time interaction with our AI co-host/custom GPT, which we call RAINA. The podcast is available on most of the major podcast platforms. This episode is a good showcase of real-time interactions using this extension. Thanks for your interest!

https://upagainstreality.com/2024/03/12/rainas-20-questions/

@StudioDweller It would be an honor :), @nowallslive My pleasure!

I just released the update on my fork and added some new functions. I'm praying it's not that buggy haha. Please check it out and let me know if it works for you guys: https://github.com/hoshizorista/talkgpt/tree/main

Just download the extension from the latest release, decompress it, make sure to uninstall all previous versions, then install and enjoy! lmk if you have any issues.

You guys can refer to my fork. It looks like C-Nedelcu has already moved on from this, so I'll work on my fork to keep his work alive so it works for all of us!

@hoshizorista 👍 Hello friend! First of all, thank you so much for your incredible work on the extension! I’ve really been enjoying its features, and I’m excited about the new updates you mentioned, like autostart and the base GPT voices.

I wanted to recommend a service that might interest you: Fish.audio. It’s quite similar to ElevenLabs, but much more affordable and accessible. They offer 50 free uses per day, and their API pricing for cloned voices is significantly lower than ElevenLabs pricing. That lets you use a high-quality voice through the API at a very low cost for much longer periods. ElevenLabs gets expensive quickly, which makes it impractical for ongoing conversations with the chatbot and better suited to specific, occasional moments.

That’s why I’d like to recommend Fish Audio, as I use it for my projects, and it works really well for me. Additionally, they offer an API that could be adapted to your extension.

I think it would be awesome to have the option to integrate Fish Audio into your ChatGPT extension for Text-to-Speech (TTS). The quality is excellent, the latency is very low, and the pricing is very competitive. Here’s the link to their site: https://fish.audio/, and here is their documentation: https://docs.fish.audio/.

I really believe this would be a great addition to the extension, and I’m sure many users would appreciate it. If you're interested in exploring the integration, I’d love to support the project, and I’d be happy to make a donation to your PayPal as a token of appreciation for adding this feature. I’m confident many users would be excited about the idea as well!

Thanks for your time, and keep up the great work. Cheers!

@enrix507 Hey! Sounds like a good idea! I haven't heard of it, but if it supports streaming it's very likely it can be added. Gonna look into it!

@StudioDweller Hey Larry! Thanks for noticing, gonna check it out! I notice you mention it's happening between "some words". Is it happening with any specific words, or with words from another language? By default, the model is selected based on the voice (e.g. Roger or Aria are under Multi-language auto detect V.2). This is done for simplicity, since it's similar to how ElevenLabs TTS works on the page. Maybe we can do some tests changing the model to see if we get any improvement. Please let me know the name of the voice you're using so we can check it out.

@hoshizorista The delays seem to happen consistently after the first 1 to 3 words and then randomly during the rest of the response. The voice I’m using is a custom voice that was generated using their “instant voice clone” and I don’t see a way to set a default model for it in my ElevenLabs account. FWIW, I noticed in the code a mention of “Eleven Turbo 2” and the current low latency model is 2.5.

Thanks so much for your help and efforts with this!
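
For anyone who wants to try the model swap discussed above, here is a minimal sketch of how a request to the ElevenLabs streaming text-to-speech endpoint could be built with an explicit model ID. It is not taken from the extension's source: `buildTtsRequest`, `TtsRequest`, and `DEFAULT_MODEL_ID` are hypothetical names, and the endpoint path, `xi-api-key` header, and the `eleven_turbo_v2` / `eleven_turbo_v2_5` identifiers reflect my understanding of the public ElevenLabs API, so please check them against the current docs before relying on them.

```typescript
// Hypothetical helper for illustration only; not the extension's actual code.
interface TtsRequest {
  url: string;
  method: "POST";
  headers: { [name: string]: string };
  body: string;
}

// Assumed ID for the newer low-latency model ("Turbo 2.5").
const DEFAULT_MODEL_ID = "eleven_turbo_v2_5";

// Builds (but does not send) a streaming TTS request for the given voice.
// Passing modelId lets you compare models, e.g. "eleven_turbo_v2".
function buildTtsRequest(
  voiceId: string,
  text: string,
  apiKey: string,
  modelId: string = DEFAULT_MODEL_ID
): TtsRequest {
  return {
    // The voice ID is URL-encoded in case it contains unexpected characters.
    url:
      "https://api.elevenlabs.io/v1/text-to-speech/" +
      encodeURIComponent(voiceId) +
      "/stream",
    method: "POST",
    headers: {
      "xi-api-key": apiKey,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ text: text, model_id: modelId }),
  };
}

// Example: const req = buildTtsRequest("YOUR_VOICE_ID", "Hello", "YOUR_API_KEY");
// then pass req.url plus { method, headers, body } to fetch() to stream the audio.
```

Keeping the model ID as a parameter rather than deriving it from the voice name would make it easy to test whether the delays change between models with a custom cloned voice.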