Request: Benchmark with Houndify
DanBmh opened this issue · 7 comments
Could you please run a benchmark with Houndify? They follow a similar approach to yours and claim that their "Speech-to-Meaning® engine delivers unprecedented speed and accuracy".
I looked into running it on my own, but it would take me about half a year using their Free Tier and would be quite expensive otherwise (a few hundred dollars) for a private developer.
Thanks for the comment. This could be interesting. Does Houndify let you download your assistant, like DialogFlow does? If so, you could probably build the logic and code around it and then submit a PR. Once that is in place, I can try to get the budget to run the full benchmark. Let me know if that works on your end.
I'm not sure this will work as required: I wasn't able to find a prebuilt domain that can handle the coffee-order requests. While Houndify allows me to create custom commands, as far as I can see they only work for intent recognition, not for entity extraction.
"Expressions do not have support for wildcards, entities, or ignore patterns. These features are supported by Custom Domains." (https://www.houndify.com/docs#custom-commands). According to their pricing page, I would need an enterprise account for this.
What I can do, however, is use custom commands and assign a custom return statement, like this:
{
  "Expression": "make me a large coffee",
  "Result": {
    "intent": "coffee",
    "slots": {
      "size": "large"
    }
  }
}
The problem is that this only works for exact matches; a request like "I want a large coffee" won't be recognized. So we don't get correct answers if a word was misrecognized while transcribing the audio or the sentence wasn't among the training intents.
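Scoring such a custom-command answer in the benchmark could then come down to comparing the returned Result against the expected label. A minimal sketch in Python (the "intent"/"slots" layout follows the return statement above; the function and variable names are hypothetical):

# Minimal sketch: score a Houndify custom-command Result against the
# expected label. The "intent"/"slots" layout follows the custom return
# statement above; everything else here is hypothetical.

def score_result(result: dict, expected: dict) -> bool:
    """Count a prediction as correct only if intent and all slots match exactly."""
    return (result.get("intent") == expected["intent"]
            and result.get("slots", {}) == expected.get("slots", {}))

# The exact training expression matches ...
assert score_result(
    {"intent": "coffee", "slots": {"size": "large"}},
    {"intent": "coffee", "slots": {"size": "large"}},
)
# ... but a paraphrase like "I want a large coffee" returns no custom
# Result at all, so it would be scored as a miss.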
I ran some tests this afternoon and was able to transcribe the wav files (which seems to work quite well), but the NLU extraction doesn't work. As mentioned above, the workaround only works if the transcription exactly matches a training sample, and that's not the case here.
The code is currently here: https://gitlab.com/Jaco-Assistant/Benchmark-Jaco/-/tree/bench_houndify/Barrista-Houndify
(I'll open a PR if we can get it to work)
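For reference, sending a wav file to Houndify is driven through their official Python SDK. A minimal sketch of how that could look, based on how the SDK's streaming client is typically used; the credentials and file name are placeholders, and the exact fields of the final JSON response should be checked against the Houndify docs:

import wave
import houndify  # Houndify's official Python SDK

CLIENT_ID = "YOUR_CLIENT_ID"    # placeholder credentials
CLIENT_KEY = "YOUR_CLIENT_KEY"

class PrintListener(houndify.HoundListener):
    def onPartialTranscript(self, transcript):
        pass  # live partial transcriptions arrive here
    def onFinalResponse(self, response):
        # The transcription and any domain results are inside this JSON
        # response; verify the field names against the docs.
        print(response)
    def onError(self, err):
        print("error:", err)

client = houndify.StreamingHoundClient(CLIENT_ID, CLIENT_KEY, "benchmark_user")
audio = wave.open("sample.wav", "rb")  # 16-bit mono audio expected
client.setSampleRate(audio.getframerate())
client.start(PrintListener())
client.fill(audio.readframes(audio.getnframes()))
client.finish()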
Thanks a lot! Keep me in the loop.
Using the Barrista dataset didn't work for me, but I was able to run a benchmark with the Snips SmartLights dataset. Houndify has a pretrained domain for home-automation commands, which I could use for this.
Houndify's performance was quite poor compared to the others, because utterances like "it's too dark in here", which should trigger the "SwitchLightOn" intent, often weren't understood. I wasn't able to teach those to the pretrained intents.
For a better comparison I also measured WER: Houndify reached 10.8% here, while Jaco only reached 19.2%.
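For anyone reproducing this: WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. A self-contained sketch:

# Minimal sketch of the WER computation: word-level Levenshtein distance
# divided by the number of reference words.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("switch the light on", "switch light on"))  # 1 edit / 4 words = 0.25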
Code is now here: https://gitlab.com/Jaco-Assistant/Benchmark-Jaco/-/tree/master/SmartLights-Houndify
Do you think you could run benchmarks with Rhino, Google (the results above are from the 2019 Snips paper), and Watson+LUIS on this dataset?
I will try to add a benchmark with Alexa in the coming weeks.
This is awesome! I'll be sure to go through the code. Do you know by any chance the WER of Snips or Google in this experiment?
> Do you know by any chance the WER of Snips or Google in this experiment?

No, they didn't publish those in their paper.