
[Feature] Voice input and output support

zpng opened this issue · 23 comments

zpng commented

Title: [Feature]

Please support speech-to-text and text-to-speech.

Something similar to the feature in this project:

OpenAI-based TTS is supported first. Voice input will be added later when I have time.

Besides the settings page, do I need to configure anything else?

Please check whether your key can actually use OpenAI's TTS model.

I called the API manually and found that the TTS model does not work with my key. Thanks for the reply.
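For anyone who wants to run the same check, here is a minimal sketch in TypeScript. It assumes the standard OpenAI speech endpoint (`POST /v1/audio/speech`) with the `tts-1` model and `alloy` voice; `buildSpeechRequest` and `canUseTts` are hypothetical helper names, not part of this project.

```typescript
// Shape of the request we send; kept separate so it can be inspected without a network call.
interface SpeechRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build a text-to-speech request for the OpenAI speech endpoint.
// The default model and voice are assumptions; change them to whatever your key should support.
function buildSpeechRequest(
  apiKey: string,
  input: string,
  model = "tts-1",
  voice = "alloy",
  baseUrl = "https://api.openai.com",
): SpeechRequest {
  return {
    url: `${baseUrl}/v1/audio/speech`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model, voice, input }),
    },
  };
}

// Send one short request and report whether the key was accepted for TTS.
// On failure, `detail` carries the raw error body returned by the server.
async function canUseTts(
  apiKey: string,
  baseUrl?: string,
): Promise<{ ok: boolean; status: number; detail: string }> {
  const { url, init } = buildSpeechRequest(apiKey, "test", "tts-1", "alloy", baseUrl);
  const res = await fetch(url, init);
  return { ok: res.ok, status: res.status, detail: res.ok ? "" : await res.text() };
}
```

If `canUseTts` reports a non-2xx status, the `detail` text usually says whether the key or the proxy in front of it is rejecting the model.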

zpng commented

@Hk-Gosuto What does voice input look like in practice? Which models does the key need to support? The README says HTTPS access is required. Does that mean the site has to be served from an HTTPS domain, and if so, why?

It uses the SpeechRecognition API, so no key is needed. For browser compatibility, see:
In most browsers, the SpeechRecognition API requires HTTPS to work properly.
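As a rough illustration of the click-to-start, click-to-stop flow discussed in this thread, here is a sketch built on the browser's SpeechRecognition API. `RecognitionLike`, `findRecognitionCtor`, and `createVoiceToggle` are hypothetical names for this example and are not taken from the project's code.

```typescript
// Minimal structural type for the parts of SpeechRecognition used here.
// Declared locally because not every TypeScript DOM typing ships this interface.
interface RecognitionLike {
  lang: string;
  continuous: boolean;
  interimResults: boolean;
  onresult: ((event: any) => void) | null;
  onend: (() => void) | null;
  start(): void;
  stop(): void;
}

type RecognitionCtor = new () => RecognitionLike;

// Look up the constructor on a window-like object.
// Chromium-based browsers expose it under the webkit-prefixed name.
function findRecognitionCtor(scope: any): RecognitionCtor | undefined {
  return scope?.SpeechRecognition ?? scope?.webkitSpeechRecognition;
}

// Returns a function to bind to the button: each call starts or stops
// listening and returns the new state (true while listening).
function createVoiceToggle(
  Ctor: RecognitionCtor,
  onText: (text: string) => void,
): () => boolean {
  const recognition = new Ctor();
  recognition.continuous = true; // keep listening until the user clicks stop
  recognition.interimResults = false; // only deliver final transcripts
  let listening = false;

  recognition.onresult = (event) => {
    // Forward every newly finalized result to the caller.
    for (let i = event.resultIndex; i < event.results.length; i++) {
      if (event.results[i].isFinal) onText(event.results[i][0].transcript);
    }
  };
  // The browser may end the session on its own, so keep the flag in sync.
  recognition.onend = () => {
    listening = false;
  };

  return () => {
    if (listening) recognition.stop();
    else recognition.start();
    listening = !listening;
    return listening;
  };
}

// In a page, something like:
//   const Ctor = findRecognitionCtor(window);
//   if (Ctor) button.onclick = createVoiceToggle(Ctor, (t) => (input.value += t));
```

Taking the constructor as a parameter keeps the toggle logic independent of the browser, so it can be exercised with a stand-in class where the real API is missing.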


zpng commented

After you enable it in the settings, the send button turns into a voice input button. Click it to start speaking, then click it again to stop when you are done.

It uses the SpeechRecognition API, so no key is needed. For browser compatibility, see:
In most browsers, the SpeechRecognition API requires HTTPS to work properly.

image image

Why isn't this implemented by calling OpenAI's API?

This is free and the recognition quality is quite good, so why use Whisper?

zpng commented


Oh ok

Try it in more scenarios. If it does not hold up in complex ones, I'll consider adding Whisper support later.

With OpenAI TTS, every time I ask it to read a message aloud it sends a new TTS request. Could the audio be saved locally the first time, so that replaying it later doesn't waste a request?

I'll see whether I can store the audio in IndexedDB. In the meantime, you can switch to Edge TTS, which doesn't incur any cost.
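One possible shape for that cache, as a sketch only and not the project's implementation: look the audio up by a key derived from the request, and call the TTS API only on a miss. `AudioStore`, `speechCacheKey`, `getOrSynthesize`, and `openIndexedDbStore` are hypothetical names, and the database and object-store names are placeholders.

```typescript
// Storage abstraction so the caching logic does not depend on IndexedDB directly.
interface AudioStore {
  get(key: string): Promise<Blob | undefined>;
  set(key: string, audio: Blob): Promise<void>;
}

// Derive a cache key from the parameters that change the resulting audio.
function speechCacheKey(model: string, voice: string, text: string): string {
  return JSON.stringify([model, voice, text]);
}

// Return cached audio when present; otherwise synthesize once and store the result.
async function getOrSynthesize(
  store: AudioStore,
  key: string,
  synthesize: () => Promise<Blob>,
): Promise<Blob> {
  const cached = await store.get(key);
  if (cached) return cached;
  const audio = await synthesize();
  await store.set(key, audio);
  return audio;
}

// IndexedDB-backed store for use in the browser.
// `indexedDB` is read from globalThis and typed loosely so the sketch also
// compiles in environments without DOM typings.
function openIndexedDbStore(dbName = "tts-cache", storeName = "audio"): AudioStore {
  const idb: any = (globalThis as any).indexedDB;
  const db = new Promise<any>((resolve, reject) => {
    const req = idb.open(dbName, 1);
    // Runs on first open: create the object store that holds the audio blobs.
    req.onupgradeneeded = () => req.result.createObjectStore(storeName);
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
  // Run a single request in its own transaction and wrap it in a promise.
  const run = async (mode: "readonly" | "readwrite", op: (s: any) => any): Promise<any> => {
    const objectStore = (await db).transaction(storeName, mode).objectStore(storeName);
    return new Promise((resolve, reject) => {
      const req = op(objectStore);
      req.onsuccess = () => resolve(req.result);
      req.onerror = () => reject(req.error);
    });
  };
  return {
    get: (key) => run("readonly", (s) => s.get(key)),
    set: async (key, audio) => {
      await run("readwrite", (s) => s.put(audio, key));
    },
  };
}
```

With this split, replaying a message would call `getOrSynthesize(openIndexedDbStore(), speechCacheKey(model, voice, text), fetchAudio)`, where `fetchAudio` stands for whatever function currently performs the TTS request.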