Azure-Samples/SpeechToText-WebSockets-Javascript

Error: Failed to construct 'AudioContext': The number of hardware contexts provided (6) is greater than or equal to the maximum bound (6).

rcollette opened this issue · 7 comments

On repeated creation of the recognizer, I get the following error, despite the reference variable being a singleton.

Error: Failed to construct 'AudioContext': The number of hardware contexts provided (6) is greater than or equal to the maximum bound (6).
    at new MicAudioSource (http://localhost:3001/app.bundle.js:12918:24)
    at CreateRecognizerWithPcmRecorder (http://localhost:3001/app.bundle.js:14274:84)
    at Object.CreateRecognizer (http://localhost:3001/app.bundle.js:14270:12)
    at Function.SpeechToTextService.initializeRecognizer (http://localhost:3001/app.bundle.js:1306:37)
    at SpeechToTextService.start (http://localhost:3001/app.bundle.js:1284:48)
    at SpeechGeosearchService.start (http://localhost:3001/app.bundle.js:2978:35)
    at VoiceGeosearchModalController.start (http://localhost:3001/app.bundle.js:14376:38)
    at VoiceGeosearchModalController (http://localhost:3001/app.bundle.js:14370:14)
    at Object.invoke (http://localhost:3001/bower_components/angular/angular.js:4523:17)
    at extend.instance (http://localhost:3001/bower_components/angular/angular.js:9182:34)

It seems that the connections close asynchronously, perhaps due to garbage collection. Ten or so seconds after getting this error, I start seeing several messages like the following in the console (status code 1006 is the WebSocket "abnormal closure" code):

ConsoleLoggingListener.ts:25 2017-11-28T01:26:45.275Z | ConnectionClosedEvent | metadata: {} | connectionId: 5380432B0E384538B97A6E4A0A313E44 | reason:  | statusCode: 1006

I think we need a way to explicitly close the connection, or otherwise close/destroy the recognizer. Or perhaps there is an undocumented method for doing this, but I haven't been able to find it poking around the source.
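For context, the error comes from the browser rather than the service: each MicAudioSource constructs a new AudioContext, and Chrome (at the time) caps the number of live hardware contexts at 6. Here is a minimal sketch, independent of the SDK, that reproduces the cap and shows the close() call the SDK never makes:

// Run in a browser console/app: older Chrome throws once a 7th
// AudioContext is constructed while 6 are still alive.
const contexts: AudioContext[] = [];
try {
  for (let i = 0; i < 10; i++) {
    contexts.push(new AudioContext()); // throws at the cap
  }
} catch (e) {
  console.warn(`cap hit after ${contexts.length} contexts`, e);
} finally {
  // close() releases the hardware context; TurnOff alone does not do this
  contexts.forEach(ctx => void ctx.close());
}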

AngularJS service code

import {
  CognitiveSubscriptionKeyAuthentication,
  ConnectingToServiceEvent,
  Context,
  CreateRecognizer,
  Device,
  ListeningStartedEvent,
  OS,
  RecognitionEndedEvent,
  RecognitionMode,
  RecognitionStartedEvent,
  RecognitionTriggeredEvent,
  Recognizer,
  RecognizerConfig,
  SpeechConfig,
  SpeechDetailedPhraseEvent,
  SpeechEndDetectedEvent,
  SpeechFragmentEvent,
  SpeechHypothesisEvent,
  SpeechRecognitionEvent,
  SpeechResultFormat,
  SpeechSimplePhraseEvent,
  SpeechStartDetectedEvent
} from '../../vendor/microsoft-speech-browser-sdk/Speech.Browser.Sdk';

export class SpeechToTextServiceConfig {
  recognitionMode: RecognitionMode;
  language: string;
  format: SpeechResultFormat;
  subscriptionKey: string;
  public onRecognitionTriggered: (event: RecognitionTriggeredEvent) => void;
  public onListeningStarted: (event: ListeningStartedEvent) => void;
  public onConnectingToService: (event: ConnectingToServiceEvent) => void;
  public onRecognitionStarted: (event: RecognitionStartedEvent) => void;
  public onSpeechStartDetected: (event: SpeechStartDetectedEvent) => void;
  public onSpeechFragment: (event: SpeechFragmentEvent) => void;
  public onSpeechHypothesis: (event: SpeechHypothesisEvent) => void;
  public onSpeechEndDetected: (event: SpeechEndDetectedEvent) => void;
  public onSpeechSimplePhrase: (event: SpeechSimplePhraseEvent) => void;
  public onSpeechDetailedPhrase: (event: SpeechDetailedPhraseEvent) => void;
  public onRecognitionEnded: (event: RecognitionEndedEvent) => void;
  public onError: (error: any) => void;
}

export class SpeechToTextService {
  static $inject = ['$log', '$rootScope', '$q'];

  private _recognizer: Recognizer;
  private _config: SpeechToTextServiceConfig;

  constructor(private _$log: ng.ILogService,
              private _$rootScope: ng.IRootScopeService,
              private _$q: ng.IQService) {
  }

  public start(config: SpeechToTextServiceConfig) {
    if (config == null) {
      throw new Error('config argument is null');
    }
    //TODO - May need more configuration validation.
    this._config = config;
    if (this._recognizer) {
      this.recognizerStop();
    }
    this._recognizer = SpeechToTextService.initializeRecognizer(
      this._config.recognitionMode,
      this._config.language,
      this._config.format,
      this._config.subscriptionKey // use the key supplied via config instead of a hard-coded literal
    );
    this._recognizerStart(this._recognizer);
  }

  public recognizerStop(): ng.IPromise<boolean> {
    let deferred = this._$q.defer<boolean>();
    // recognizer.AudioSource.Detach(audioNodeId) can be also used here. (audioNodeId is part of ListeningStartedEvent)
    this._recognizer.AudioSource.TurnOff()
      .On(() => {
        deferred.resolve(true);
      }, (error: any) => {
        deferred.reject(error);
      });
    return deferred.promise;
  }

  static initializeRecognizer(recognitionMode: RecognitionMode, language: string, format: SpeechResultFormat, subscriptionKey: string) {
    //TODO - Endpoint configuration.
    let recognizerConfig = new RecognizerConfig(
      new SpeechConfig(
        new Context(
          new OS(navigator.userAgent, 'Browser', null),
          new Device('SpeechSample', 'SpeechSample', '1.0.00000'))), //TODO - Update with application platform information.
      recognitionMode, // SDK.RecognitionMode.Interactive  (Options - Interactive/Conversation/Dictation)
      language, // Supported languages are specific to each recognition mode. Refer to docs.
      format); // SDK.SpeechResultFormat.Simple (Options - Simple/Detailed)

    // Alternatively use SDK.CognitiveTokenAuthentication(fetchCallback, fetchOnExpiryCallback) for token auth
    let authentication = new CognitiveSubscriptionKeyAuthentication(subscriptionKey);

    return CreateRecognizer(recognizerConfig, authentication);
  }

  private _recognitionTriggered(event: RecognitionTriggeredEvent) {
    this._$log.debug(`RecognitionTriggeredEvent:`, event);
    if (this._config.onRecognitionTriggered) {
      this._config.onRecognitionTriggered(event);
    }
  }

  private _listeningStarted(event: ListeningStartedEvent) {
    this._$log.debug(`ListeningStartedEvent:`, event);
    if (this._config.onListeningStarted) {
      this._config.onListeningStarted(event);
    }
  }

  private _connectingToService(event: ConnectingToServiceEvent) {
    this._$log.debug(`ConnectingToServiceEvent:`, event);
    if (this._config.onConnectingToService) {
      this._config.onConnectingToService(event);
    }
  }

  private _recognitionStarted(event: RecognitionStartedEvent) {
    this._$log.debug(`RecognitionStartedEvent:`, event);
    if (this._config.onRecognitionStarted) {
      this._config.onRecognitionStarted(event);
    }
  }

  private _speechStartDetected(event: SpeechStartDetectedEvent) {
    this._$log.debug(`SpeechStartDetectedEvent:`, event);
    if (this._config.onSpeechStartDetected) {
      this._config.onSpeechStartDetected(event);
    }
  }

  private _speechFragmentEvent(event: SpeechFragmentEvent) {
    this._$log.debug(`SpeechFragmentEvent: ${event.Result.Text}`, event);
    if (this._config.onSpeechFragment) {
      this._config.onSpeechFragment(event);
    }
  }

  private _speechHypothesis(event: SpeechHypothesisEvent) {
    this._$log.debug(`SpeechHypothesisEvent: ${event.Result.Text}`, event);
    if (this._config.onSpeechHypothesis) {
      this._config.onSpeechHypothesis(event);
    }
    // event.Result.Text;
  }

  private _speechEndDetected(event: SpeechEndDetectedEvent) {
    this._$log.debug(`SpeechEndDetected`, event);
    if (this._config.onSpeechEndDetected) {
      this._config.onSpeechEndDetected(event);
    }
  }

  private _speechSimplePhrase(event: SpeechSimplePhraseEvent) {
    this._$log.debug(`SpeechSimplePhraseEvent: ${event.Result.DisplayText}`, event);
    if (this._config.onSpeechSimplePhrase) {
      this._config.onSpeechSimplePhrase(event);
    }
    //event.Result;
  }

  private _speechDetailedPhrase(event: SpeechDetailedPhraseEvent) {
    this._$log.debug(`SpeechDetailedPhraseEvent`, event);
    if (this._config.onSpeechDetailedPhrase) {
      this._config.onSpeechDetailedPhrase(event);
    }
  }

  private _recognitionEnded(event: RecognitionEndedEvent) {
    this._$log.debug(`RecognitionEndedEvent:`, event);
    if (this._config.onRecognitionEnded) {
      this._config.onRecognitionEnded(event);
    }
    this._recognizer = null;
  }

  private _error(error: any) {
    this._$log.error(error);
    if (this._config.onError) {
      this._config.onError(error);
    }
  }

  private _recognizerStart(recognizer: Recognizer) {
    recognizer.Recognize((event: SpeechRecognitionEvent) => {
        // We can use switch when this issue closes:
        // https://github.com/Microsoft/TypeScript/issues/2214
        console.log(`event ${event.Name}`, event);
        // If we do not apply this asynchronously, it seems to hang for some period of time before resolving.
        // Perhaps this happens on websocket disconnection?
        this._$rootScope.$evalAsync(() => {
          if (event instanceof RecognitionTriggeredEvent) {
            this._recognitionTriggered(event);
          } else if (event instanceof ListeningStartedEvent) {
            this._listeningStarted(event);
          } else if (event instanceof ConnectingToServiceEvent) {
            this._connectingToService(event);
          } else if (event instanceof RecognitionStartedEvent) {
            this._recognitionStarted(event);
          } else if (event instanceof SpeechStartDetectedEvent) {
            this._speechStartDetected(event);
          } else if (event instanceof SpeechHypothesisEvent) {
            this._speechHypothesis(event);
          } else if (event instanceof SpeechFragmentEvent) {
            this._speechFragmentEvent(event);
          } else if (event instanceof SpeechEndDetectedEvent) {
            this._speechEndDetected(event);
          } else if (event instanceof SpeechSimplePhraseEvent) {
            this._speechSimplePhrase(event);
          } else if (event instanceof SpeechDetailedPhraseEvent) {
            this._speechDetailedPhrase(event);
          } else if (event instanceof RecognitionEndedEvent) {
            this._recognitionEnded(event);
          } else {
            console.warn('unknown event', event);
          }
        });
      }
    )
      .On((x: any) => {
          this._$log.debug('promise resolved', x);
        },
        (error: any) => {
          this._error(error);
        }
      );
  }
}

Indeed. However, I found an alternative that works around this with a trueRecognizing flag.

// initial phase
let trueRecognizing = false;
....

// recognition phase
case 'SpeechHypothesisEvent':
    trueRecognizing = true;
    ...
    break;
case 'SpeechFragmentEvent':
    trueRecognizing = true;
    ...
    break;
case 'RecognitionEndedEvent':
    self.recognizing = false;
    if (!trueRecognizing) {
        stopRecognition();
        recognizer = initRecognizer();
        startRecognition();
        return;
    } else {
        trueRecognizing = false;
    }
    ....

// stop phase
stopRecognition() {
    if (recognizer) {
        recognizer.AudioSource.TurnOff();
    }
    trueRecognizing = false;
    ....
}

The idea behind this workaround is the observation that recognition sometimes stops without any intermediate event firing; it jumps straight to RecognitionEndedEvent. When that happens, this workaround recreates the connection without triggering the AudioContext error.

This does not correct the condition; it merely works around it after the fact. In my application I have a global error handler that still gets called because of it. I think it only works for you by luck of timing, since I am already calling TurnOff both before any start and on recognition end. While the audio source is turned off, the connection to it (the AudioContext) and the WebSocket connection are not released. There is a Dispose method deeper in the code, but it is not exposed at the SDK API layer.
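For anyone who wants to experiment, something like the sketch below might reach that internal teardown. To be clear, this is speculative: the Dispose name and its location on AudioSource are assumptions from reading the bundled source, not documented API.

// Speculative teardown sketch, NOT public SDK API; guard each step.
function destroyRecognizer(recognizer: Recognizer): void {
  const source: any = (recognizer as any).AudioSource;
  if (source && typeof source.TurnOff === 'function') {
    source.TurnOff(); // stops capture, but does not release the AudioContext
  }
  if (source && typeof source.Dispose === 'function') {
    source.Dispose(); // assumed internal method; verify against your bundle
  }
}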

@rcollette Interesting; I agree with your statement. Somehow my solution has always worked for me. I need to check my code. Thanks for the hint.

It seems that you end up with 6 recognizers running in parallel. Is that by design? start creates a new recognizer each time it's called. Does it make sense to reuse the existing recognizer instance if none of the config parameters changed?
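As a rough illustration against the service code posted above (the _configKey field and the string-key comparison are mine, purely illustrative), start() could rebuild the recognizer only when the configuration actually changes:

// Illustrative reuse sketch: cache the recognizer and rebuild it only
// when the effective configuration changes. _configKey is a new field.
private _configKey: string;

public start(config: SpeechToTextServiceConfig) {
  if (config == null) {
    throw new Error('config argument is null');
  }
  const key = [config.recognitionMode, config.language,
               config.format, config.subscriptionKey].join('|');
  if (!this._recognizer || key !== this._configKey) {
    this._configKey = key;
    this._recognizer = SpeechToTextService.initializeRecognizer(
      config.recognitionMode, config.language, config.format, config.subscriptionKey);
  }
  this._config = config;
  this._recognizerStart(this._recognizer);
}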

Never mind, working on a fix for this.

@raaaar - This does not seem to have fixed the problem for me. I updated the version from 0.0.6 to 0.0.12 and I still have this issue.

@rcollette - could you share a quick repro?