watson-developer-cloud/speech-to-text-nodejs

WebSocket connection error

leibaogit opened this issue · 9 comments

Describe the bug
I'm running the speech-to-text demo locally and hitting a WebSocket connection issue. When I click the Record Audio button, it returns "WebSocket connection error".

I set the API key and service URL in the .env file:

% cat .env
# Environment variables
SPEECH_TO_TEXT_IAM_APIKEY=lYzPliclaElz35zV1dHJw9......
SPEECH_TO_TEXT_URL=https://api.us-south.speech-to-text.watson.cloud.ibm.com/instances/ee92ee32-297b-41dd-9c5d-0f8eecd8a120

To Reproduce
Steps to reproduce the behavior:

  1. Go to 'http://localhost:3000/'
  2. Click on 'Record Audio'
  3. Open the debug console
  4. See error

Expected behavior
The WebSocket connection should succeed.

Screenshots
image (34)
image (35)

Root cause

The WebSocket request does not include the access_token query parameter:

wss://api.us-south.speech-to-text.watson.cloud.ibm.com/instances/ee92ee32-297b-41dd-9c5d-0f8eecd8a120/v1/recognize?model=en-US_BroadbandModel
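
One way to check a request URL for the missing parameter is to parse it and inspect the query string. This is only a minimal sketch: `hasAccessToken` is a hypothetical helper name (not part of watson-speech), and the token value is a placeholder.

```javascript
// Hypothetical helper: report whether a WebSocket URL carries an
// access_token query parameter.
function hasAccessToken(wsUrl) {
  // The WHATWG URL class is available in browsers and in Node.js,
  // and it parses wss:// URLs like any other URL.
  return new URL(wsUrl).searchParams.has('access_token');
}

// The failing request URL from this issue.
const failingUrl =
  'wss://api.us-south.speech-to-text.watson.cloud.ibm.com/instances/' +
  'ee92ee32-297b-41dd-9c5d-0f8eecd8a120/v1/recognize?model=en-US_BroadbandModel';

console.log(hasAccessToken(failingUrl)); // false
console.log(hasAccessToken(failingUrl + '&access_token=PLACEHOLDER')); // true
```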

Analysis

The following line, at node_modules/watson-speech/speech-to-text/recognize-stream.js:161, drops access_token from the options:

var queryParams = processUserParameters(options, queryParamsAllowed);

This is the processUserParameters code:

node_modules/watson-speech/util/process-user-parameters.js
module.exports = function processUserParameters(options, allowedParams) {
  var processedOptions = {};
  // look for the camelcase version of each parameter - that is what we expose to the user
  allowedParams.forEach(param => {
    // camelcase() converts the allowed name to camelCase: access_token --> accessToken
    var keyName = camelcase(param);
    if (options[keyName] !== undefined) {
      processedOptions[param] = options[keyName];
    }
  });
  return processedOptions;
};
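
The effect of that lookup can be reproduced in isolation. The sketch below repeats the same logic, with one assumption: `camelcase` here is a simplified stand-in for the `camelcase` npm package, so the example runs without dependencies. The token value is a placeholder.

```javascript
// Simplified stand-in for the `camelcase` npm package (an assumption):
// converts snake_case or kebab-case names to camelCase.
function camelcase(name) {
  return name.replace(/[-_]+([a-zA-Z0-9])/g, (_, ch) => ch.toUpperCase());
}

// Same logic as processUserParameters above.
function processUserParameters(options, allowedParams) {
  const processedOptions = {};
  allowedParams.forEach(param => {
    const keyName = camelcase(param);
    if (options[keyName] !== undefined) {
      processedOptions[param] = options[keyName];
    }
  });
  return processedOptions;
}

const allowed = ['access_token', 'model'];

// snake_case key: the lookup is options.accessToken, so the token is silently dropped.
const dropped = processUserParameters(
  { access_token: 'PLACEHOLDER', model: 'en-US_BroadbandModel' },
  allowed
);
console.log(dropped); // { model: 'en-US_BroadbandModel' }

// camelCase key: found, and emitted under the wire name access_token.
const kept = processUserParameters(
  { accessToken: 'PLACEHOLDER', model: 'en-US_BroadbandModel' },
  allowed
);
console.log(kept); // { access_token: 'PLACEHOLDER', model: 'en-US_BroadbandModel' }
```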

In the initialize function of node_modules/watson-speech/speech-to-text/recognize-stream.js, I added two log statements:

...
console.log("RecognizeStream.option 1: " + JSON.stringify(options));
  // process query params
  var queryParamsAllowed = [
    'access_token',
    'watson-token',
    'model',
    'language_customization_id',
    'acoustic_customization_id',
    'base_model_version',
    'x-watson-learning-opt-out',
    'x-watson-metadata'
  ];
  var queryParams = processUserParameters(options, queryParamsAllowed);
  console.log("RecognizeStream.option 2: " + JSON.stringify(queryParams));
....

The log shows that access_token was removed:
image

So we need somewhere to translate access_token to accessToken.

The initialize function in /Users/ibmuser/bali/IBM/CDLTechBuddy/speech-to-text-nodejs/node_modules/watson-speech/speech-to-text/recognize-stream.js looks like a good place, so I added the following code there:

  if (options.access_token && !options['accessToken']) {
    options['accessToken'] = options.access_token;
  }

This resolved the issue: the wss request returned 101 (Switching Protocols), so the handshake succeeded:

image

It seems that fixing only access_token is not enough, because I found another issue with the same root cause:
iShot2021-07-27 15 51 22

keywords_threshold was also skipped.

So the only workaround seemed to be removing the camelcase() call in:

node_modules/watson-speech/util/process-user-parameters.js

module.exports = function processUserParameters(options, allowedParams) {
  var processedOptions = {};

  // look for the camelcase version of each parameter - that is what we expose to the user
  allowedParams.forEach(param => {
    // var keyName = camelcase(param);   // removed
    var keyName = param;                 // added: look up the parameter name as-is
    if (options[keyName] !== undefined) {
      processedOptions[param] = options[keyName];
    }
  });
  return processedOptions;
};

However, that change introduced a new error:
Error: unable to transcode data stream application/octet-stream -> audio/l16
image (36)

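The root cause is that the change above caused the content type to be skipped. The options object mixes naming styles: the content type is passed as contentType (camelCase), while openingMessageParamsAllowed lists names such as keywords_threshold. With camelcase() removed, the camelCase contentType key no longer matches its entry in the allowed list.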

So I instead convert every key in the options to camelCase at the very start of the initialize function:

RecognizeStream.prototype.initialize = function() {
  var options = {};
  var camelcase = require('camelcase');

  for (const [key, value] of Object.entries(this.options)) {
    var newKey = camelcase(key);
    options[newKey] = value;
  }
....

Now everything works great!
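
The key normalization described above can be sketched on its own. Assumptions: `normalizeOptionKeys` is a hypothetical helper name, `camelcase` is a simplified stand-in for the `camelcase` npm package, and the option values are placeholders.

```javascript
// Simplified stand-in for the `camelcase` npm package (an assumption).
function camelcase(name) {
  return name.replace(/[-_]+([a-zA-Z0-9])/g, (_, ch) => ch.toUpperCase());
}

// Hypothetical helper: return a copy of `options` with every key in camelCase,
// so that snake_case and camelCase callers are treated the same way.
function normalizeOptionKeys(options) {
  const normalized = {};
  for (const [key, value] of Object.entries(options)) {
    normalized[camelcase(key)] = value;
  }
  return normalized;
}

const mixed = {
  access_token: 'PLACEHOLDER',
  keywords_threshold: 0.01,
  contentType: 'audio/l16'
};

console.log(normalizeOptionKeys(mixed));
// { accessToken: 'PLACEHOLDER', keywordsThreshold: 0.01, contentType: 'audio/l16' }
```

One caveat with this approach: if a caller passes both spellings of the same option (for example access_token and accessToken), whichever key comes later in the object overwrites the earlier one.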

It turns out the SDK code doesn't need to change; only the demo's code needs to pass its options in camelCase:

% git diff views/demo.jsx
diff --git a/views/demo.jsx b/views/demo.jsx
index 05569e8..a7bbe50 100644
--- a/views/demo.jsx
+++ b/views/demo.jsx
@@ -105,22 +105,22 @@ export class Demo extends Component {
     const keywords = this.getKeywordsArrUnique();
     return Object.assign({
       // formats phone numbers, currency, etc. (server-side)
-      access_token: this.state.accessToken,
+      accessToken: this.state.accessToken,
       token: this.state.token,
-      smart_formatting: true,
+      smartFormatting: true,
       format: true, // adds capitals, periods, and a few other things (client-side)
       model: this.state.model,
       objectMode: true,
-      interim_results: true,
+      interimResults: true,
       // note: in normal usage, you'd probably set this a bit higher
-      word_alternatives_threshold: 0.01,
+      wordAlternativesThreshold: 0.01,
       keywords,
-      keywords_threshold: keywords.length
+      keywordsThreshold: keywords.length
         ? 0.01
         : undefined, // note: in normal usage, you'd probably set this a bit higher
       timestamps: true, // set timestamps for each word - automatically turned on by speaker_labels
       // includes the speaker_labels in separate objects unless resultsBySpeaker is enabled
-      speaker_labels: this.state.speakerLabels,
+      speakerLabels: this.state.speakerLabels,
       // combines speaker_labels and results together into single objects,
       // making for easier transcript outputting
       resultsBySpeaker: this.state.speakerLabels,

Created a PR with the change: #266

Closing as resolved from PR: #266