espressif/esp-sr

Want to suggest a wake word? Leave your thoughts here. (AIS-1441)

feizi opened this issue · 133 comments

feizi commented

Hi all,

We're excited to offer the community more free and high-quality wake word models. Everyone has their own unique wake word preferences. Now, we're ready to regularly release some of the most popular wake words. Please let us know the wake words you want! English and Chinese are both welcome.

In the past, collecting high-quality human speech data was an expensive process. Our team has now developed a cost-effective way to train wake word models using only TTS samples, which reaches 90-95% of the accuracy of models trained on human-recorded samples.

The wake word models and esp-sr have the same license and are free for commercial use. If you want a more accurate and exclusive wake word, please use our wake word customization service.

Currently, we support over 20 wake words. You can choose any one wake word to test. Starting from August 1, 2024, to get a new wake word, you'll need to meet one of these requirements:

  • If you've got an ongoing project, kindly attach the project link along with a brief overview when submitting your request.
  • Your wake word has been liked or upvoted by more than five people.

We are preparing to upgrade to a new TTS model and generate some wake word models with better performance.

The Willow team and community would love "Hey Willow". It's our domain name because we've been waiting for this.

Thank you very much for offering this option, it's very exciting!

feizi commented

I'm glad you like this. Since "hey" and "hi" sound pretty similar, sometimes people might not really notice the difference. So, I was thinking, maybe we could support both "hey willow" and "hi willow" for waking up the device. That way, whether you say "hey willow" or "hi willow", it'll still work. Of course, when we release the wake word model, we'll call it like "wn9_heywillow". What do you think about that?

Good idea!

My only concern would be overall reduced accuracy (wake reliability vs false wake). We've noticed quite a bit of false wake with Alexa. From what I've read, the automated TTS approach has 90-95% of the accuracy of the models trained on human samples. I like "two word" wake words because they tend to improve accuracy. I suspect a 100% "Hey Willow" wake word could result in equivalent or even improved accuracy with the TTS approach, even compared with the human-sample-trained Alexa model?

Of course we could always test this, even starting with a pure "Hey Willow" model, a pure "Hi Willow" model, and a merged model.

Thanks again for offering this!

feizi commented

Your concern may indeed happen. We will generate two words and test which model performs better.

feizi commented

"hey/hi willow" model:
Model name: wn9_heywillow_tts
FAR (False Alarm Rate): 1 time / 8 hours
RAR (Right Alarm Rate): 88%

Test dataset description:
The FAR dataset: This dataset contains a total of 64 hours of audio data, which includes audio collected from the internet and audio recorded using esp32-korvo boards.
The RAR dataset: This dataset is generated by multiple commercial TTS APIs, with a total of approximately 500 samples. These data and models were not used in the training process. However, due to the differences between TTS samples and human samples, please exercise caution when referring to the test results.
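
To make the two metrics concrete, here is a minimal sketch of how figures like these could be derived from raw test counts. The counts below (8 false triggers, 440 detections) are hypothetical values chosen only to reproduce the reported numbers; they are not taken from the actual test logs, and the class name is made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class WakeWordEval:
    false_alarms: int   # triggers on audio that contains no wake word
    far_hours: float    # length of the negative (FAR) dataset in hours
    detected: int       # positive samples that triggered the model
    positives: int      # total number of positive (RAR) samples

    def hours_per_false_alarm(self) -> float:
        """Average hours of negative audio per false trigger."""
        if self.false_alarms == 0:
            return float("inf")
        return self.far_hours / self.false_alarms

    def rar_percent(self) -> float:
        """Share of positive samples that woke the model, in percent."""
        return 100.0 * self.detected / self.positives


# Hypothetical counts: 64 h of negative audio, about 500 positive samples.
result = WakeWordEval(false_alarms=8, far_hours=64.0, detected=440, positives=500)
print(f"FAR: 1 time / {result.hours_per_false_alarm():g} hours")
print(f"RAR: {result.rar_percent():.0f}%")
```

Reporting FAR as "hours per false trigger" keeps models comparable even when their negative datasets differ in length.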

AigizK commented

Guys, what you are doing is really great. We have created a smart speaker called Homai based on the esp32-s3. We trained the model ourselves, but it is resource-intensive and not so easy to integrate into the pipeline. Could you please add support for our word Homai [ho'mai]? Thank you in advance!

Hi @AigizK ,
Homai has only two syllables. It is difficult to reduce the probability of false triggering for monosyllabic and disyllabic phrases. We recommend selecting a 3-5 syllable phrase as the wake word.

AigizK commented

Hi @sun-xiangyu
We have already launched a project with this name, so we can't change it significantly. But can we use the variant "homa ai", where the sound 'A' is pronounced long?

I'm sorry that our TTS model cannot specify a syllable to extend its pronunciation at the moment. This means that we cannot generate a large number of accurate “homa ai” phrases.

Hi! Thank you for this awesome solution! We are developing a smart voice assistant called Sophia. Would it be possible to have the wake word "Hi Sophia"? This would help our user experience drastically. Thank you in advance!

Hi @PrathamG , I'm glad you like it. "Sophia" sounds like a wake word that can be used directly. I mean, maybe we don't need an extra prefix "Hi". I suggest we start with just "Sophia". If the performance is not satisfactory, then we can train another one with "hi Sophia". What do you think?

Sure, that sounds like a good plan! We can use only "Sophia" and test the performance first. Thank you

If possible, I also wanted to request the wake word "Little Sophia". We are still unsure about which wake word to use, and having both options will help us determine this via user testing.

Our computing resources are currently limited. This project can generate about two wake word models a month, so we will prioritize popular wake words. Of course, if we have some free time, "Little Sophia" is also fine.

No worries, totally understandable! Looking forward to testing out the "Sophia" wake word

"Sophia" model: wn9_sophia_tts

FAR (False Alarm Rate): 1 time / 8 hours
RAR (Right Alarm Rate): 97%

xygh commented

“小美” or “小美同学” would be a perfect choice. It would suit a lot of use cases. We all want a wake word that sounds like a human name.

@xygh, “小美同学” sounds good.

Thank you! We will test it out and report the results by next week

xygh commented

BTW, “你好小美” is also a perfect choice.

"小当家" or "Hi 小星" is preferable wake word in our scenario. Thanks a lot!

The second version "Sophia":
model info: wakenet9l_tts1h8v2_Sophia_3_0.647_0.649

Performance:
FAR (False Alarm Rate): 1 time / 8 hours
RAR (Right Alarm Rate): 95%

Improvement:
Add "Sophie" and "Sophy" as hard negatives to reduce false triggers.

Both of these words sound good. If you have no preference, we will choose "hi 小星".

feizi commented

"小美同学"
model info: wakenet9l_tts1h8_小美同学_3_0.633_0.644

FAR (False Alarm Rate): 1 time / 8 hours
RAR (Right Alarm Rate): 95%

Hello! This is a great opportunity I was hoping would come up, I'm so glad this is now possible! I've seen that the wake-words "Mycroft" and "Hey, Mycroft" are very popular in the community, and it is also the name of my product so would very much improve user experience. Would it be possible to have either of these trained and released for the community? Thank you so much in advance for this!

@lewardo, I'm glad it could help you. Although "Mycroft" is simpler, it seems there are quite a few words that sound similar, so I'll prioritize training with "Hey Mycroft."

@Henry586 ,

Hi,小星: wakenet9l_tts1h8_Hi,小星_3_0.626_0.630

Performance:
FAR (False Alarm Rate): 1 time / 8 hours
RAR (Right Alarm Rate): 93%

I'd love to have "hey printer" available as a wake word/phrase.

I want to suggest a wake word, "小龙小龙".
I'm glad to hear that you can create wake words.

feizi commented

@lewardo, the performance of "Mycroft" also looks good. Please try it.
Mycroft: wakenet9l_tts1h8_Mycroft_3_0.625_0.629

Performance:
FAR (False Alarm Rate): 1 time / 8 hours
RAR (Right Alarm Rate): 96%

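Hello! We are developing a smart voice assistant called 喵喵同学. Could you help us with a “喵喵同学” wake word? It would greatly improve our user experience. Thank you in advance!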

Hey,Printer: wakenet9l_tts1h8_Heyprinter_3_0.623_0.629

Performance:
FAR (False Alarm Rate): 1 time / 8 hours
RAR (Right Alarm Rate): 78%

小龙小龙: wakenet9l_tts1h8_小龙小龙_3_0.624_0.628

Performance:
FAR (False Alarm Rate): 1 time / 8 hours
RAR (Right Alarm Rate): 95%

@sun-xiangyu Ooh, thank you! I'll try to test Hey Printer on the weekend!

Is there also a chance to get wake words trained for the ESP32 variant (WakeNet5, if I understand it correctly)? Or is that obsolete? I would love to use an "Alexa" or "ok nabu" model on my M5Stack Atom Echos, which unfortunately only have an ESP32.

Hi @jhbruhn ,

Yes, WakeNet5 has been deprecated, and we are not training any WakeNet5 models. The ESP32 should be able to run WakeNet9, but we have not yet adapted it. A wake word app with stable performance needs to run in conjunction with the Audio Front End (AFE), and it is difficult for the ESP32 to run both in real time. Therefore, we recommend using the ESP32-S3.

Thank you, I will try it.

喵喵同学: wakenet9l_tts1h8_喵喵同学_3_0.644_0.648

Performance:
FAR (False Alarm Rate): 1 time / 8 hours
RAR (Right Alarm Rate): 95%

This is outstanding work, and it means a lot to us!
Thank you all so much!
We will try it out as soon as possible.

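Hello! We are planning to build a toy. Is there any chance you could help us with a “hi, Joy” wake word? We really like this wake word and would love to have it running on ESP. @sun-xiangyu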

Hi/Hey, Joy: wakenet9l_tts1h8_Hi,Joy_3_0.631_0.633

Performance:
FAR (False Alarm Rate): 1 time / 8 hours
RAR (Right Alarm Rate): 96%

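Thank you so much.
Could you tell me how I should test it, though?
After updating to the latest sr-1.7.0 component, I don't see a hijoy option in menuconfig.
I don't know how to set up a custom wake word.
I looked through the sr documentation, and it doesn't mention how to use a custom wake word in code either.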

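We don't release a new version every time a wake word is added, so the version you downloaded does not include the Hi,Joy wake word yet. You can manually download the esp-sr master branch and overwrite your existing esp-sr with it; the Hi,Joy wake word will then be available.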

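@sun-xiangyu
1.
Thank you very much. I recently tested wake-up with 喵喵同学 and found that the two words 喵喵 and 同学 need to be clearly separated when spoken for the wake-up success rate to be reasonably high. If they are run together, the results are considerably worse.
Would it be possible to also train the wake word “Hi,喵喵”? I think it would work much better.

2.
I'd also like to offer a suggestion about wake word models trained on TTS speech. I've found that these models demand very accurate pronunciation,
most likely because TTS pronunciation is very standard, so the trained model also needs highly accurate pronunciation to wake.
Most people's pronunciation isn't that accurate, so the recognition rate of the trained model ends up not being high.

I noticed this clearly when testing Sophia and hi, joy: recognition only succeeds when the pronunciation is very accurate, and fails if it is slightly off.
So my suggestion is that, for words that are hard to pronounce, when training with TTS
you could consider adding some similar-sounding words to the training material, which should greatly improve the recognition rate.
I'm not sure whether I understand the TTS training approach correctly; this is just for reference.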

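@welkinchan
Thank you very much for your feedback.

  1. We can also train the “Hi, 喵喵” wake word.
  2. We need to strike a balance between wake-up accuracy and discrimination. For Sophia, for example, we deliberately added similar-sounding words as negative samples to prevent it from being triggered by them. Unless there is a specific need, such as accents or dialects, we generally do not add similar-sounding words, because doing so raises the model's false wake-up rate to some extent.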

We have a couple of 3D printers @CCHS-Melbourne called like this:

Wanda
Cosmo
Hey Wanda
Hey Cosmo

Would it be possible to train it ourselves? Is there documentation on how to train with e.g. an H100 or an Nvidia 4090?

The closest description about the data structures and training process I've seen is here:

https://docs.espressif.com/projects/esp-sr/en/latest/esp32s3/wake_word_engine/README.html

But I cannot easily find a program/pipeline to run locally with all the input .wav files?

/cc @adricl @GoatNote

We can help you train some wake words, but our training pipeline isn't open-sourced yet.

@feizi hello, can you train a model with "hey, Li Li" or "Hi, Li Li" or simply "Li Li"? I'm a bit hesitant about LiLi vs Li Li and whether it makes a big impact on accuracy. I tested this phrase with multinet7 and the results were also very positive, but I lacked a tool to evaluate accuracy.

Hi @dnambinh

For humans, "Li Li" and "LiLi" should be the same, but for TTS (Text-to-Speech), "Li Li" might insert a brief pause.
I'm not sure how you pronounce "LiLi", whether it's like the English word "Lily" or the Chinese name "莉莉".

Yes, I have tried common tts tools and there is not much difference between "lily" and "lili". Ya, if it were a word that made sense in English it would be "lily". The initial idea was a certain wake-word that most people (speaking English - Chinese - Vietnamese - Arabic - ....) could easily say due to similarities in pronunciation. I'd be happy if you could train a similar model

OK, I will use both "Hi,Lily" and "Hi, 莉莉" to train a wake word model.

Hi,Lily/Hi,莉莉: wakenet9l_tts1h8_Hi,Lily or Hi,莉莉_3_0.633_0.639

Performance:
FAR (False Alarm Rate): 1 time / 8 hours
RAR (Right Alarm Rate): 93%

@dnambinh , looking forward to your feedback

Thanks for your support. I will give feedback ASAP.

Hello, could you please help train the wake word “Hi, 喵喵”? @sun-xiangyu

An Arabic wake word is missing. It could be any Arabic name like "Sarah", "Rahma" or "Yasmeen", or the greeting "Assalam-o-alaikum".

Hi @usama1123456789 ,

Currently we can only generate English or Chinese TTS samples. We can try to train some wake words like "Hi,Sarah" or "Yasmeen" with English TTS samples, but I'm not sure whether those models will work well for Arabic.

Hi,喵喵: wakenet9l_tts1h8_Hi,喵喵_3_0.636_0.641

Performance:
FAR (False Alarm Rate): 1 time / 8 hours
RAR (Right Alarm Rate): 94%

@welkinchan, please try.

Hey,Wanda: wakenet9l_tts1h8_Hey,Wanda_3_0.641_0.644

Performance:
FAR (False Alarm Rate): 1 time / 8 hours
RAR (Right Alarm Rate): 95%

@brainstorm , please try.

From what I've seen use of this greeting is extremely common and it's likely you will get a lot of false-wake incidents when it is so frequently used in everyday conversation within audio range of your devices.

I'd suggest you recommend an acceptable alternative.

I guess English TTS trained files would be OK! For Hi Sarah, or Hi Yasmeen.

You are correct! That greeting would trigger many unwanted wake-ups.

By the way, can we get one called "Hi, Astrolabe" in English, of course?

We have a device called Astrolabe and would like to get one model trained for that word!

We are making cultural and creative products with a Harbin theme, and we hope to have some wake words like 'Hey, Xiaobin' or 'Xiaobin Xiaobin'. Could you please help us train them? Thank you.
@sun-xiangyu

@usama1123456789 , Astrolabe sounds complex enough to be a stand-alone wake word, and I recommend leaving it unprefixed with "Hi"

Hi @sun-xiangyu, I tried the 'Hi,Lily' model. The results were extremely positive.
Performance:
+) FAR (False Alarm Rate): The 'Alexa' model's false activation rate is clearly higher than this model's. This shows up simply by placing the device in a random conversation.
+) RAR: Knowing that a TTS-trained model is only about 90-95% as effective as one trained on real human voices, and that the RAR was about 93%, I didn't expect much. However, the result was surprisingly accurate. I tried it with the accents of almost everyone in the company, including other nationalities and even a local accent that is extremely difficult to understand.

From this it can be concluded that choosing the right wake word is extremely important and greatly affects the results.
Test device configuration:
+) ESP32S3
+) 1MIC vs 2MIC mems

However, I'm having a new problem. The idea is to wake up the device with a wakeword, then record until the VAD checks that there is no more voice (I checked for 1 second) and then send it to the server for further processing. I noticed that the VAD cannot check the sound of people far away (2m), while the wakeword still works well. If I reduce the VAD level to 2 or 1 it gets better however it is easily triggered by other noise.
Is there any feasible solution?
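
The flow described above (wake word, then record until the VAD has reported no voice for 1 second, then upload) can be sketched as a simple silence counter over per-frame VAD decisions. This is only an illustration of the logic, not the esp-sr API: the function name, the 30 ms frame length, and the boolean flag input are all assumptions, and it does not address the far-field sensitivity problem itself.

```python
from typing import Optional, Sequence


def end_of_speech_frame(
    vad_flags: Sequence[bool],
    frame_ms: int = 30,
    silence_ms: int = 1000,
) -> Optional[int]:
    """Return the index of the frame at which recording should stop.

    vad_flags[i] is True when frame i was classified as speech.
    Recording stops once `silence_ms` of consecutive non-speech frames
    has accumulated after at least one speech frame was seen.
    Returns None if that never happens within the given flags.
    """
    needed = -(-silence_ms // frame_ms)  # ceiling division: silent frames required
    silent_run = 0
    heard_speech = False
    for i, is_speech in enumerate(vad_flags):
        if is_speech:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            silent_run += 1
            if silent_run >= needed:
                return i
    return None


# Short pause (3 frames) inside the utterance, then a long trailing silence.
flags = [False] * 5 + [True] * 20 + [False] * 3 + [True] * 10 + [False] * 40
print(end_of_speech_frame(flags))  # → 71
```

Silence before the first speech frame is ignored here so that a user who hesitates after the wake word is not cut off; a real application would likely also want an overall timeout for the case where no speech is ever detected.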

Thank you and team very much

ASTROLABE as a standalone would be OK and acceptable. Can we get that ?

OK

@dnambinh, thank you for your detailed assessment. We have added both Lily (English) and 莉莉 (Chinese), which has increased the diversity of TTS samples. This may be one of the reasons for the better performance.

The VAD indeed has a lot of room for improvement. We can train a more accurate VAD using deep learning methods, but currently, we do not have a definite timeline.

Any idea when can we get ASTROLABE wake word?

In about two weeks.

Great! And thank you very much.
One question: could this be used with esp_sr v1.6?

Yes. If you don't want to update to the main branch, you need to manually copy your model into wn9_customword and then select wn9_customword in menuconfig to load it.
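
As a rough illustration of the manual copy step, the following sketch copies a downloaded model folder into a `wn9_customword` slot inside an esp-sr checkout. Only the name `wn9_customword` comes from the reply above; the `model/wakenet_model/` location, the function name, and the example paths are assumptions that should be checked against your own copy of esp-sr.

```python
import shutil
from pathlib import Path

# Assumed location of the custom-word slot inside an esp-sr checkout.
# Verify this path against your own copy of esp-sr before relying on it.
CUSTOM_SLOT = Path("model") / "wakenet_model" / "wn9_customword"


def install_custom_model(model_dir: Path, esp_sr_root: Path) -> Path:
    """Copy the files of a downloaded wake word model into the custom slot.

    Any existing slot contents are replaced; missing parent directories
    are created. Returns the path of the slot.
    """
    if not model_dir.is_dir():
        raise FileNotFoundError(f"model directory not found: {model_dir}")
    slot = esp_sr_root / CUSTOM_SLOT
    if slot.exists():
        shutil.rmtree(slot)
    shutil.copytree(model_dir, slot)
    return slot
```

A call might look like `install_custom_model(Path("downloads/my_model"), Path("components/esp-sr"))`, with both paths being placeholders. After copying, the model still has to be selected in menuconfig as described in the reply.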

Looking forward to new features from ESP🤩🤩

Astrolabe: wakenet9l_tts1h8_Astrolabe_3_0.625_0.632

Performance:
FAR (False Alarm Rate): 1 time / 8 hours
RAR (Right Alarm Rate): 94%

@usama1123456789 , please try.

小滨小滨,小冰小冰: wakenet9l_tts1h8_小滨小滨,小冰小冰_3_0.614_0.623

Performance:
FAR (False Alarm Rate): 1 time / 8 hours
RAR (Right Alarm Rate): 95%

@kaylyun, the pronunciations of 小滨小滨 (xiao3 bin1) and 小冰小冰 (xiao3 bing1) are similar. I used both words during training, so you can choose either one as the wake word.

Thanks a lot for this wake word. @usama1123456789 and I are really grateful to you!

I think the community should think about keywords that are more likely to be effective in multiple languages. Another example I tried recently that worked quite well with the mn9 voice command is "my my" - "mai mai" - "麦麦" or "mimi" - "mimi" - "咪咪". The languages in order are English - Vietnamese - Chinese (sorry if my Chinese is wrong).
It is similar to the "lily" suggestion 🥇

What about using IoT device type as the wakeword, like "hi, air condition"、"hi, purifier"、“hi, humidifier” and so on.

@Oreobird In theory it is possible but I think it should not be. It will be fine if your room only has an air conditioner, but if there is an additional TV, water heater, cleaning robot,... everything will be very chaotic and cause difficulties for users😂

The process would be wake word -> speech command

I strongly agree with your reply. The issue of waking up multiple devices has always been a key problem in voice recognition. Anyway, what I think is that using device types as wake words might be useful for demonstration or demo scenarios.

Can you help me train, "嗨,小鱼/Little Fis"?
Thanks a million.

Hey, it would be cool for movie fans to have the "HAL" wake word from the movie 2001: A Space Odyssey https://www.youtube.com/watch?v=ARJ8cAGm6JE

It sounds cool, but it is currently difficult for TTS to stably generate "HAL" pronunciations.

你好小智: wakenet9l_tts1h8_你好小智_3_0.631_0.635

Performance:
FAR (False Alarm Rate): 1 time / 24 hours
RAR (Right Alarm Rate): 98%

This model was trained on both human-recorded samples and TTS samples, which gives it higher response accuracy and a lower false alarm rate.

Can you help with Hi Rico, Hello Rico, Rico同学?

Many Thanks!!!

suggest wakeup word: 游戏管家

I recommend using Hi Rico

Great! Thanks!

By the way, can I train the model myself? Are the source code and documentation for the model available?

The training script is not yet open source. If you want to deploy your own model, you can use the esp-dl project.

SR is a fantastic project! it highly boosts UI/UX design capabilities of our product.
Please help us to train following wakewords:
Hey, Telly
泰力泰力

Many many thanks!

Currently, we support over 20 wake words. You can choose any one wake word to test. Starting from August 1, 2024, to get a new wake word, you'll need to meet one of these requirements:

  1. If you've got an ongoing project, kindly attach the project link along with a brief overview when submitting your request.
  2. Your wake word has been liked or upvoted by more than five people.

We are preparing to upgrade to a new TTS model and generate some wake word models with better performance.

Very excited to hear about the TTS model improvements along with the wake up model for better performance especially with RAR and speed.
And it would be great if you could retrain "Hi, Lily", I can report more details on the performance related changes✌️

@blessalanou

Hi,Telly/Hi,泰力: wakenet9l_tts1h8_Hi,Telly or Hi,泰力_3_0.613_0.619

Performance:
FAR (False Alarm Rate): 1 time / 24 hours
RAR (Right Alarm Rate): 94%

The training data of this model is similar to that of the Hi Lily wake word and includes both English "Hi, Telly" and Chinese "Hi, 泰力".

Is it possible for esp-sr or esp-skainet or esp-adf to provide an interface or process for custom wake-up word functionality? For example, by users recording the wake-up word and training a model via a training script deployed in the cloud, then updating it to the device.

As far as I know, not yet.
Before I knew about esp-adf and esp-sr, I built a custom wake word using a model created with Python and TensorFlow, then quantized the model and ran it with TFLite (TensorFlow Lite for Microcontrollers). The model ran successfully but, perhaps due to optimization issues, it consumed a lot of RAM and memory (you can run a model on any microcontroller this way, provided you have enough resources).
I probably would have continued optimizing with TFLite had I not discovered that ESP provides ESP-DL, which also allows model deployment with hardware support.
That may be a good place for you to start.

Deployment in the cloud is a bit difficult for us and is not in our plans. I think esp-dl might be a solution, if you want to deploy a model of your own, I recommend you to use it.
Good news, we are refactoring esp-dl so that esp-dl can directly load quantized models, just like you use onnx and pytorch.

Looks like there is already a wake-up word in English: Hey,Wand. Can we have a Chinese version? e.g. 神奇魔仗

We are using Espressif's ESP32S3 chip to create a small wizard dialogue toy that can provide great emotional value and companionship.
We look forward to your help in training the following wake words.
“Hi,小巫”

Sounds great, I'm happy to help train a "Hi,小巫" wake word.

We are working on a patient-side voice assistant for the healthcare space. We desperately need help training the English branded wake word, "Hey, Henry". We are currently testing with the ESP-BOX-S3. Many thanks in advance.

sudo or hey sudo would be cool