microsoft/cognitive-services-speech-sdk-js

SDK returns no match, but the online recognizer works.

jBernavaPrah opened this issue · 3 comments

What happened?

Hi guys!

I'm not sure if this is a bug.

I was playing with the online recognizer tool, and the attached recording is recognized correctly. It is a WAV file (48,000 Hz sample rate, 32 bits per sample, 1 channel) containing a total of 2 seconds of my voice saying "Hi, one, two" in Italian.

But when I use this package, the response is always sdk.ResultReason.NoMatch.

So I dug in and found that the online tool converts the file to a standard uncompressed PCM format (16 kHz, 16 bits per sample, 1 channel) before sending it to the Azure service. See Ref 1. The WAV header sent in the first binary message also confirms this (Ref 2).
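The online tool's conversion can be approximated in code. This is a minimal sketch, not necessarily what the online tool does internally. It assumes the input holds raw little-endian 32-bit signed integer samples (not IEEE float) with the WAV header already removed; the function name `downsample48kInt32To16kInt16` is hypothetical, and averaging each group of three samples is only a crude low-pass filter before decimating 48 kHz down to 16 kHz.

```typescript
// Sketch: convert 48 kHz, 32-bit signed integer, mono PCM into 16 kHz,
// 16-bit mono PCM.
// ASSUMPTIONS: samples are little-endian int32 (not IEEE float), the input
// contains raw samples only (no WAV header), and 48000 / 16000 = 3 allows
// decimation by an integer factor. Averaging groups of 3 samples is a crude
// low-pass filter, not a production-quality resampler.
function downsample48kInt32To16kInt16(input: Uint8Array): Int16Array {
  const view = new DataView(input.buffer, input.byteOffset, input.byteLength);
  const inSamples = Math.floor(input.byteLength / 4);
  const factor = 3; // 48000 Hz / 16000 Hz
  const out = new Int16Array(Math.floor(inSamples / factor)); // leftover samples are dropped
  for (let i = 0; i < out.length; i++) {
    let sum = 0;
    for (let k = 0; k < factor; k++) {
      sum += view.getInt32((i * factor + k) * 4, true); // true = little-endian
    }
    const avg = sum / factor;
    // Drop the low 16 bits of precision, then clamp to the int16 range.
    const scaled = Math.round(avg / 65536);
    out[i] = Math.max(-32768, Math.min(32767, scaled));
  }
  return out;
}
```

`new Uint8Array(result.buffer)` then gives the converted bytes in host byte order, which is little-endian on common x86 and ARM machines.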

How can I do the same using the package, or what am I missing?

Thanks,
JBP


Ref 1: [screenshot]

Ref 2: [screenshot]

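# Hex values (little-endian WAV header)
5249 4646 0000 0000 5741 5645 666d 7420   <-- "RIFF", RIFF size 0, "WAVE", "fmt "
1000 0000 0100 0100 803e 0000 007d 0000   <-- fmt size 0x10 = 16 (plain PCM), format tag 1 = PCM, 1 channel, 16000 Hz, 32000 bytes/s
0200 1000 6461 7461 0000 0000             <-- block align 2, 16 bits per sample, "data", data size 0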
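To make the hex dump above easier to check, here is a sketch that decodes the same 44 bytes field by field. It assumes the canonical layout ("fmt " immediately after "WAVE", and "data" immediately after a 16-byte fmt chunk), which holds for this dump but not for every WAV file; `parseCanonicalWavHeader`, `hexToBytes` and `ascii` are hypothetical helpers, not SDK functions.

```typescript
// Sketch: decode a canonical 44-byte RIFF/WAVE header.
// ASSUMPTION: "fmt " is the first chunk after "WAVE" and is 16 bytes long,
// so the "data" chunk starts at byte offset 36.
interface WavHeader {
  fmtChunkSize: number;  // 16 for plain PCM
  formatTag: number;     // 1 = PCM
  channels: number;
  sampleRate: number;    // Hz
  byteRate: number;      // bytes per second
  blockAlign: number;    // bytes per sample frame
  bitsPerSample: number;
  dataSize: number;      // bytes of audio that follow the header
}

// Read `length` bytes at `offset` as an ASCII string (chunk IDs).
function ascii(view: DataView, offset: number, length: number): string {
  let s = "";
  for (let i = 0; i < length; i++) s += String.fromCharCode(view.getUint8(offset + i));
  return s;
}

function parseCanonicalWavHeader(bytes: Uint8Array): WavHeader {
  const v = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  if (ascii(v, 0, 4) !== "RIFF" || ascii(v, 8, 4) !== "WAVE" || ascii(v, 12, 4) !== "fmt ") {
    throw new Error("not a canonical RIFF/WAVE header");
  }
  if (ascii(v, 36, 4) !== "data") throw new Error("expected the data chunk at offset 36");
  // All multi-byte WAV fields are little-endian, hence the `true` arguments.
  return {
    fmtChunkSize: v.getUint32(16, true),
    formatTag: v.getUint16(20, true),
    channels: v.getUint16(22, true),
    sampleRate: v.getUint32(24, true),
    byteRate: v.getUint32(28, true),
    blockAlign: v.getUint16(32, true),
    bitsPerSample: v.getUint16(34, true),
    dataSize: v.getUint32(40, true),
  };
}

// Turn a whitespace-separated hex string into bytes.
function hexToBytes(hex: string): Uint8Array {
  const clean = hex.replace(/\s+/g, "");
  const out = new Uint8Array(clean.length / 2);
  for (let i = 0; i < out.length; i++) out[i] = parseInt(clean.slice(i * 2, i * 2 + 2), 16);
  return out;
}

// Usage: the 44 bytes from the dump.
const dump =
  "5249 4646 0000 0000 5741 5645 666d 7420 1000 0000 0100 0100 803e 0000 007d 0000 0200 1000 6461 7461 0000 0000";
const header = parseCanonicalWavHeader(hexToBytes(dump));
console.log(header.sampleRate, header.bitsPerSample, header.channels); // 16000 16 1
```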

recorded.wav.zip

Version

1.34.0 (Latest)

What browser/platform are you seeing the problem on?

Node

Relevant log output

No response

Hey, thanks for reaching out!

The Speech SDK needs to be told the format of the audio you send if it isn't 16-bit, 16 kHz mono.
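That rule can be written down as a small check. A sketch, assuming the default is exactly 16 kHz, 16-bit, mono as stated above; `PcmFormat`, `needsExplicitFormat`, `blockAlign` and `byteRate` are hypothetical names. The byte-rate arithmetic is the same one that produces the `byteRate` and `blockAlign` fields of a WAV "fmt " chunk.

```typescript
// Sketch: decide whether the SDK must be told the audio format explicitly.
// ASSUMPTION: the SDK's implicit input format is 16 kHz, 16-bit, mono PCM.
interface PcmFormat {
  sampleRate: number;    // samples per second, e.g. 48000
  bitsPerSample: number; // e.g. 16 or 32
  channels: number;      // 1 = mono
}

function needsExplicitFormat(f: PcmFormat): boolean {
  return !(f.sampleRate === 16000 && f.bitsPerSample === 16 && f.channels === 1);
}

// Bytes in one sample frame (all channels together).
function blockAlign(f: PcmFormat): number {
  return f.channels * (f.bitsPerSample / 8);
}

// Bytes of audio per second.
function byteRate(f: PcmFormat): number {
  return f.sampleRate * blockAlign(f);
}

const sdkDefault: PcmFormat = { sampleRate: 16000, bitsPerSample: 16, channels: 1 };
const recording: PcmFormat = { sampleRate: 48000, bitsPerSample: 32, channels: 1 };
console.log(needsExplicitFormat(sdkDefault), byteRate(sdkDefault)); // false 32000
console.log(needsExplicitFormat(recording), byteRate(recording));   // true 192000
```

When the check returns true, the JavaScript SDK lets you describe the format with `sdk.AudioStreamFormat.getWaveFormatPCM(samplesPerSecond, bitsPerSample, channels)`, pass it to `sdk.AudioInputStream.createPushStream(format)`, and hand the resulting stream to `sdk.AudioConfig.fromStreamInput(pushStream)`.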

There's a sample class here that reads the header information from a WAV file and writes the audio to a push stream.

You can see it used here.
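The linked sample is not reproduced here. As a rough illustration of the "write to a push stream" half only, this sketch assumes a hypothetical `PcmSink` interface with `write(ArrayBuffer)` and `close()`; the SDK's `PushAudioInputStream` has both methods, so it should satisfy the interface structurally. The 3200-byte default chunk (100 ms of 16 kHz, 16-bit mono audio) is an arbitrary choice.

```typescript
// Sketch: write raw PCM bytes to a push stream in fixed-size chunks, then
// close it so the recognizer knows the audio has ended.
// ASSUMPTION: PcmSink is a hypothetical minimal interface, not an SDK type.
interface PcmSink {
  write(data: ArrayBuffer): void;
  close(): void;
}

function pushPcm(sink: PcmSink, pcm: Uint8Array, chunkBytes = 3200): number {
  let chunks = 0;
  for (let offset = 0; offset < pcm.byteLength; offset += chunkBytes) {
    const end = Math.min(offset + chunkBytes, pcm.byteLength);
    // Copy into a standalone ArrayBuffer so the sink never sees bytes
    // outside [offset, end) of a larger underlying buffer.
    const chunk = new ArrayBuffer(end - offset);
    new Uint8Array(chunk).set(pcm.subarray(offset, end));
    sink.write(chunk);
    chunks++;
  }
  sink.close();
  return chunks;
}
```

With the SDK, the call would look like `pushPcm(pushStream, pcmBytes)`, where `pushStream` comes from `sdk.AudioInputStream.createPushStream(format)`.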

Please let me know if you need anything else.

Hi @rhurey,
thanks for your reply!

Unfortunately, the sample code you linked doesn't work with WAV files whose headers use the extended format (WAVE_FORMAT_EXTENSIBLE).

I'll try changing the code to strip those headers and push the raw audio bytes directly to the SDK with a hard-coded format.
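Stripping the headers reliably means locating the "data" chunk rather than assuming a fixed 44-byte header, because a WAVE_FORMAT_EXTENSIBLE "fmt " chunk is 40 bytes long instead of 16. A minimal sketch, assuming a little-endian RIFF file (not RIFX or RF64); `readWav`, `WavInfo` and `tag` are hypothetical names, and this is not the code of the SDK sample.

```typescript
// Sketch: walk the RIFF chunks, read the "fmt " chunk (plain PCM or
// WAVE_FORMAT_EXTENSIBLE) and return the format plus the raw "data" bytes.
// ASSUMPTIONS: little-endian RIFF; for WAVE_FORMAT_EXTENSIBLE (tag 0xFFFE)
// the real format code is the first 2 bytes of the SubFormat GUID at
// offset 24 inside the fmt chunk (1 = integer PCM, 3 = IEEE float).
interface WavInfo {
  formatCode: number;
  channels: number;
  sampleRate: number;
  bitsPerSample: number;
  data: Uint8Array; // raw samples, all headers stripped
}

// Read a 4-character chunk ID.
function tag(v: DataView, off: number): string {
  return String.fromCharCode(v.getUint8(off), v.getUint8(off + 1), v.getUint8(off + 2), v.getUint8(off + 3));
}

function readWav(bytes: Uint8Array): WavInfo {
  const v = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  if (tag(v, 0) !== "RIFF" || tag(v, 8) !== "WAVE") throw new Error("not a RIFF/WAVE file");
  let fmt: Omit<WavInfo, "data"> | undefined;
  let off = 12; // first chunk after "RIFF", size, "WAVE"
  while (off + 8 <= bytes.byteLength) {
    const id = tag(v, off);
    const size = v.getUint32(off + 4, true);
    const body = off + 8;
    if (id === "fmt ") {
      let formatCode = v.getUint16(body, true);
      if (formatCode === 0xfffe && size >= 40) {
        formatCode = v.getUint16(body + 24, true); // SubFormat GUID, first 2 bytes
      }
      fmt = {
        formatCode,
        channels: v.getUint16(body + 2, true),
        sampleRate: v.getUint32(body + 4, true),
        bitsPerSample: v.getUint16(body + 14, true),
      };
    } else if (id === "data") {
      if (!fmt) throw new Error("data chunk before fmt chunk");
      // Treat size 0 (unknown length) as "until end of file", and clamp
      // sizes that run past the end of the buffer.
      const end = size === 0 ? bytes.byteLength : Math.min(body + size, bytes.byteLength);
      return { ...fmt, data: bytes.subarray(body, end) };
    }
    off = body + size + (size % 2); // chunks are padded to an even length
  }
  throw new Error("no data chunk found");
}
```

For integer PCM (`formatCode === 1`), `info.sampleRate`, `info.bitsPerSample` and `info.channels` are the values to hard-code into `sdk.AudioStreamFormat.getWaveFormatPCM`, and `info.data` is what goes into the push stream. IEEE float data (`formatCode === 3`) is reported but not converted by this sketch.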

In the meantime, which formats does the Azure Speech service accept?
The audio files in the SDK's examples and tests are all 16 kHz, 16-bit, and the online speech tool also converts files to 16 kHz, 16-bit. Is that the only format the Azure speech recognition service accepts?

Thanks!

Update:
The service recognized my voice correctly once I stripped the extended WAV headers, sent only the raw audio, and specified the file's actual format (48 kHz, 32 bits per sample, PCM).

Thanks for your time!