microsoft/cognitive-services-speech-sdk-js

[Bug]: Receive InvalidOperation: Payload must be ArrayBuffer

LiamMorrow opened this issue · 7 comments

What happened?

When using the PushStream with a SpeechRecognizer, very infrequently we receive an Error which cancels the stream saying that the SDK expects an ArrayBuffer.

We pipe our audio stream through a couple of conversion steps and use Node's shared buffer pool, which may or may not return Buffers that are just slices of a larger shared ArrayBuffer. The interface is the same, so it should be more or less transparent to consumers of the buffer stream.

Just wondering if there's a technical limitation behind why the PushAudioInputStream requires an ArrayBuffer in its write method rather than just a Buffer?
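
Roughly, the shape of our setup looks something like this (simplified sketch; the audio source and variable names here are placeholders, not our actual code):

import * as sdk from "microsoft-cognitiveservices-speech-sdk";
import { Readable } from "stream";

// Hypothetical source of PCM chunks coming out of our conversion steps; depending
// on the path taken it hands back either ArrayBuffers or pooled Node Buffers.
declare const pcmSource: Readable;

const speechConfig = sdk.SpeechConfig.fromSubscription("<key>", "<region>");
const pushStream = sdk.AudioInputStream.createPushStream();
const recognizer = new sdk.SpeechRecognizer(speechConfig, sdk.AudioConfig.fromStreamInput(pushStream));

pcmSource.on("data", (chunk: Buffer | ArrayBuffer) => {
  // The SDK types this parameter as ArrayBuffer; when a pooled Buffer slips through
  // here, the instanceof check fails later while the audio is sent to the service.
  pushStream.write(chunk as ArrayBuffer);
});
pcmSource.on("end", () => pushStream.close());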

Version

1.30.1 (Default)

What browser/platform are you seeing the problem on?

Node

Relevant log output

InvalidOperation: Payload must be ArrayBuffer
    at new InvalidOperationError (/app/node_modules/microsoft-cognitiveservices-speech-sdk/distrib/lib/src/common/Error.js:62:28)
    at SpeechConnectionMessage.ConnectionMessage (/app/node_modules/microsoft-cognitiveservices-speech-sdk/distrib/lib/src/common/ConnectionMessage.js:21:19)
    at new SpeechConnectionMessage (/app/node_modules/microsoft-cognitiveservices-speech-sdk/distrib/lib/src/common.speech/SpeechConnectionMessage.Internal.js:52:28)
    at SpeechServiceRecognizer.<anonymous> (/app/node_modules/microsoft-cognitiveservices-speech-sdk/distrib/lib/src/common.speech/ServiceRecognizerBase.js:867:61)
    at step (/app/node_modules/microsoft-cognitiveservices-speech-sdk/distrib/lib/src/common.speech/ServiceRecognizerBase.js:35:23)
    at Object.next (/app/node_modules/microsoft-cognitiveservices-speech-sdk/distrib/lib/src/common.speech/ServiceRecognizerBase.js:16:53)
    at fulfilled (/app/node_modules/microsoft-cognitiveservices-speech-sdk/distrib/lib/src/common.speech/ServiceRecognizerBase.js:7:58)

Hi @LiamMorrow, thanks for using JS Speech SDK and writing this issue up. We use ArrayBuffer because it's a base JS type, and thus accessible without conversion for our customers using our library as a browser include. AFAIK Buffer is Node-only, correct?

Ah of course that makes sense for the typing, thanks for getting back to me so soon!

I guess we're a little surprised by the runtime behavior though; we wouldn't have expected it to throw, since Buffer exposes essentially the same interface as ArrayBuffer and can more or less be dropped in with no issues.

Just for reference, we've attempted to tackle this by performing a runtime type check on the buffers returned from our streams: if one isn't an ArrayBuffer, we create a new one with the data from the original Buffer. However, this seemed to blow out our memory usage significantly.

Perhaps instead of checking that the data is an instanceof ArrayBuffer, it could be more of a duck-type check?

Relevant code

        if (messageType === MessageType.Binary && body && !(body instanceof ArrayBuffer)) {
            throw new InvalidOperationError("Payload must be ArrayBuffer");
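
For example, a looser check along these lines would still reject obviously bad payloads while letting Buffer through (just a sketch of the idea, not a proposed patch to the SDK):

        // Sketch only: accept anything that looks byte-addressable enough for the transport.
        function isArrayBufferLike(body: unknown): boolean {
            return body instanceof ArrayBuffer
                || (typeof body === "object"
                    && body !== null
                    && typeof (body as ArrayBuffer).byteLength === "number"
                    && typeof (body as ArrayBuffer).slice === "function");
        }

        if (messageType === MessageType.Binary && body && !isArrayBufferLike(body)) {
            throw new InvalidOperationError("Payload must be ArrayBuffer-like");
        }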

Duck typing could work; Buffer instances have the byteOffset property, while ArrayBuffer instances do not. But you're not able to use the .buffer property on the buffer to get the underlying ArrayBuffer (a la this comment)?

but you're not able to use the .buffer property on the buffer to get the underlying ArrayBuffer?

So while that returns the underlying ArrayBuffer, the Buffer we have might be just a slice over that underlying ArrayBuffer (hence byteOffset and length).

For instance, if a Buffer is allocated, the Node runtime may not provide an instance of an ArrayBuffer; instead it may return a Buffer that is a view over a larger ArrayBuffer from the pool. So while buffer.length might be, for example, 100, buffer.buffer.byteLength could be something like 1024, and the underlying ArrayBuffer would contain unrelated garbage data.
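
To make that concrete, it's easy to observe in a Node REPL:

// Small allocUnsafe Buffers are carved out of a shared pool rather than
// getting their own ArrayBuffer.
const buf = Buffer.allocUnsafe(100);

console.log(buf.length);                        // 100
console.log(buf.byteOffset);                    // often non-zero
console.log(buf.buffer.byteLength);             // typically 8192 (the default Buffer.poolSize)
console.log(buf.buffer instanceof ArrayBuffer); // true, but it also holds unrelated pooled data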

At one stage, before we wrote the buffer into the PushStream we did do something like this (pseudocode):

function ensureArrayBuffer(buf: Buffer | ArrayBuffer): ArrayBuffer {
  if (buf instanceof ArrayBuffer) return buf;

  const underlyingBuffer = buf.buffer;

  // The Buffer spans the entire underlying ArrayBuffer, so it can be handed over as-is
  if (buf.byteOffset === 0 && underlyingBuffer.byteLength === buf.length) {
    return underlyingBuffer as ArrayBuffer;
  }

  // Otherwise copy just the viewed bytes into a fresh ArrayBuffer
  const definitelyAnArrayBuffer = new ArrayBuffer(buf.length);
  buf.copy(new Uint8Array(definitelyAnArrayBuffer));

  return definitelyAnArrayBuffer;
}

This did mean we never got this error, but as mentioned earlier it ended up blowing out our memory usage significantly (we're not too sure why, to be honest).

This did mean we never got this error, but as mentioned earlier it ended up blowing out our memory usage significantly (we're not too sure why, to be honest).

Ah right, Node tries to make raw data handling easier, which ends up making use cases like yours difficult. Sounds familiar.

The memory spike does sound puzzling, I'd do some A-B profiling to figure out if that new ArrayBuffer() is the culprit, and if so, naively try force deleting it after slicing it into the pushStream.
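
Something as simple as logging process.memoryUsage() periodically, with the copy path behind a flag, should be enough to tell (rough sketch; the flag name here is made up):

// Gate the ArrayBuffer-copy path behind an env flag and compare the
// ArrayBuffer-backed memory counters across two long streaming runs.
const copyToFreshArrayBuffer = process.env.COPY_TO_ARRAY_BUFFER === "1";

setInterval(() => {
  const { rss, external, arrayBuffers } = process.memoryUsage();
  console.log(`copy=${copyToFreshArrayBuffer} rss=${rss} external=${external} arrayBuffers=${arrayBuffers}`);
}, 10000);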

Ah right, Node tries to make raw data handling easier, which ends up making use cases like yours difficult. Sounds familiar.

Yeah it's a fun time.

The memory spike does sound puzzling, I'd do some A-B profiling to figure out if that new ArrayBuffer() is the culprit

Yeah, at the time we were actually combating some other memory usage issues when streaming multiple hours of audio, but we were able to reliably reproduce an OOM whenever the above code was flagged on.

and if so, naively try force deleting it after slicing it into the pushStream.

Unfortunately, if I'm understanding you correctly, I don't think that's feasible. Even if we could force a deallocation of the buffer after calling write, the PushStream is implemented as a queue (through Stream) with, as far as we can tell, no backpressure, so while write may have completed, the data may not have actually been sent to Azure yet.

The code I had in mind:

function getCopyOfArrayBuffer() {
  // what you previously constructed
}

eventHandler.on('some_event', () => {
  const arrayBuffer = getCopyOfArrayBuffer();
  pushStream.write(arrayBuffer.slice());
}); // arrayBuffer should be gc-ed here, but obv. the copy lives on while the sdk processes the audio data