WebSocket connection error

Question

leibaogit opened this issue 3 years ago · 9 comments

Describe the bug
I'm running the speech-to-text-demo locally and hitting a WebSocket connection issue: when I click the Record Audio button, it returns "WebSocket connection error".

I set the token in the .env file:

% cat .env
# Environment variables
SPEECH_TO_TEXT_IAM_APIKEY=lYzPliclaElz35zV1dHJw9......
SPEECH_TO_TEXT_URL=https://api.us-south.speech-to-text.watson.cloud.ibm.com/instances/ee92ee32-297b-41dd-9c5d-0f8eecd8a120

To Reproduce
Steps to reproduce the behavior:

Go to 'http://localhost:3000/'
Click on 'Record Audio'
Open the debug console
See error

Expected behavior
The WebSocket connection should succeed.

Answer 1 · 2021-07-27T06:52:44.000Z

Root cause

The WebSocket request does not include the access_token query parameter:

wss://api.us-south.speech-to-text.watson.cloud.ibm.com/instances/ee92ee32-297b-41dd-9c5d-0f8eecd8a120/v1/recognize?model=en-US_BroadbandModel
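A quick way to confirm this is to parse the request URL and look for the parameter. The sketch below uses the standard URL API; the helper name hasAccessToken and the PLACEHOLDER token value are made up for illustration.

```javascript
// Hypothetical helper: reports whether a WebSocket URL carries a non-empty
// access_token query parameter.
function hasAccessToken(wsUrl) {
  // The standard URL parser accepts the wss: scheme and exposes the query string.
  const token = new URL(wsUrl).searchParams.get('access_token');
  return token !== null && token !== '';
}

// The failing request URL from the debug console.
const failingUrl =
  'wss://api.us-south.speech-to-text.watson.cloud.ibm.com/instances/' +
  'ee92ee32-297b-41dd-9c5d-0f8eecd8a120/v1/recognize?model=en-US_BroadbandModel';

console.log(hasAccessToken(failingUrl)); // false
console.log(hasAccessToken(failingUrl + '&access_token=PLACEHOLDER')); // true
```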

Answer 2 · 2021-07-27T06:54:57.000Z

Analysis

The following code at node_modules/watson-speech/speech-to-text/recognize-stream.js:161 drops access_token from the options:

var queryParams = processUserParameters(options, queryParamsAllowed);

This is the processUserParameters code:

node_modules/watson-speech/util/process-user-parameters.js
module.exports = function processUserParameters(options, allowedParams) {
  var processedOptions = {};
  // look for the camelcase version of each parameter - that is what we expose to the user
  allowedParams.forEach(param => {
    var keyName = camelcase(param); // <<<<<<< converts the param to camelCase: access_token --> accessToken
    if (options[keyName] !== undefined) {
      processedOptions[param] = options[keyName];
    }
  });
  return processedOptions;
};
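To see the effect in isolation, here is a minimal sketch that reproduces the lookup outside the SDK. toCamel is a simplified stand-in for the camelcase package (an assumption: it only handles lowercase names separated by "_" or "-"), and the option values are placeholders.

```javascript
// Simplified stand-in for the `camelcase` package (assumption: only handles
// lowercase names separated by "_" or "-").
function toCamel(name) {
  return name.replace(/[_-]+([a-z0-9])/g, (match, ch) => ch.toUpperCase());
}

// Same logic as processUserParameters above, using the stand-in.
function processUserParameters(options, allowedParams) {
  const processedOptions = {};
  allowedParams.forEach((param) => {
    const keyName = toCamel(param);
    if (options[keyName] !== undefined) {
      processedOptions[param] = options[keyName];
    }
  });
  return processedOptions;
}

const allowed = ['access_token', 'model'];

// snake_case key: the function looks for options.accessToken, finds nothing,
// and silently drops the token.
const snakeResult = processUserParameters(
  { access_token: 'PLACEHOLDER', model: 'en-US_BroadbandModel' },
  allowed
);

// camelCase key: found, and emitted under the name access_token.
const camelResult = processUserParameters(
  { accessToken: 'PLACEHOLDER', model: 'en-US_BroadbandModel' },
  allowed
);

console.log('access_token' in snakeResult); // false
console.log('access_token' in camelResult); // true
```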

Answer 3 · 2021-07-27T07:01:41.000Z

In the initialize function of node_modules/watson-speech/speech-to-text/recognize-stream.js, I added some logging:

...
console.log("RecognizeStream.option 1: " + JSON.stringify(options));
  // process query params
  var queryParamsAllowed = [
    'access_token',
    'watson-token',
    'model',
    'language_customization_id',
    'acoustic_customization_id',
    'base_model_version',
    'x-watson-learning-opt-out',
    'x-watson-metadata'
  ];
  var queryParams = processUserParameters(options, queryParamsAllowed);
  console.log("RecognizeStream.option 2: " + JSON.stringify(queryParams));
....

The log shows that access_token was removed.

Answer 4 · 2021-07-27T07:08:51.000Z

So we need somewhere to translate access_token to accessToken.

The initialize function in /Users/ibmuser/bali/IBM/CDLTechBuddy/speech-to-text-nodejs/node_modules/watson-speech/speech-to-text/recognize-stream.js looks like a good place, so I added the following code:

  if (options.access_token && !options['accessToken']) {
    options['accessToken'] = options.access_token;
  }

This resolved the issue: the wss request returned 101 (Switching Protocols).

Answer 5 · 2021-07-27T07:54:43.000Z

Translating only access_token is not enough, because I found another issue with the same root cause:

keywords_threshold was also skipped.

So the only workaround is to remove the camelcase() call in:

node_modules/watson-speech/util/process-user-parameters.js

module.exports = function processUserParameters(options, allowedParams) {
  var processedOptions = {};

  // look for the camelcase version of each parameter - that is what we expose to the user
  allowedParams.forEach(param => {
    // var keyName = camelcase(param);   // -----
    var keyName = param;   // ++++++
    if (options[keyName] !== undefined) {
      processedOptions[param] = options[keyName];
    }
  });
  return processedOptions;
};

Answer 6 · 2021-07-27T08:31:43.000Z

Even with the above change, I still hit another issue, this one caused by the change itself:
Error: unable to transcode data stream application/octet-stream -> audio/l16

The root cause is that the content type is now skipped: the demo passes it as contentType, but openingMessageParamsAllowed lists the non-camelCase parameter names (the same list that contains keywords_threshold), so without camelcase() the lookup never matches contentType.
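The trade-off can be sketched as follows. This is an illustration only: the two-entry allow-list is an assumption (in particular the 'content-type' entry is not copied from the SDK), toCamel is a simplified stand-in for the camelcase package, and pick is a hypothetical helper that mirrors the lookup with a pluggable key function.

```javascript
// Simplified stand-in for the `camelcase` package (assumption: only handles
// lowercase names separated by "_" or "-").
function toCamel(name) {
  return name.replace(/[_-]+([a-z0-9])/g, (match, ch) => ch.toUpperCase());
}

// Hypothetical helper: copies allowed params out of options, using toKey to
// decide which option key to look up for each allowed param.
function pick(options, allowedParams, toKey) {
  const picked = {};
  for (const param of allowedParams) {
    const value = options[toKey(param)];
    if (value !== undefined) {
      picked[param] = value;
    }
  }
  return picked;
}

// Assumed allow-list, for illustration only.
const allowedParams = ['content-type', 'keywords_threshold'];

// Mixed naming: one camelCase key and one snake_case key.
const mixedOptions = { contentType: 'audio/l16', keywords_threshold: 0.01 };

// With the camelCase lookup, keywords_threshold is lost.
const withCamelcase = pick(mixedOptions, allowedParams, toCamel);
// With the lookup by raw name, contentType is lost instead.
const withoutCamelcase = pick(mixedOptions, allowedParams, (param) => param);

console.log(Object.keys(withCamelcase)); // [ 'content-type' ]
console.log(Object.keys(withoutCamelcase)); // [ 'keywords_threshold' ]
```

Either lookup drops one of the two options, so the fix has to make the key naming consistent rather than switch the lookup.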

Answer 7 · 2021-07-27T08:51:13.000Z

So now I convert all the keys in the options to camelCase at the very start of the initialize function:

RecognizeStream.prototype.initialize = function() {
  var options = {};
  var camelcase = require('camelcase');

  for (const [key, value] of Object.entries(this.options)) {
    var newKey = camelcase(key);
    options[newKey] = value;
  }
....

Now everything works great!
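The same normalization can also be done on the caller's side, before the options reach the SDK. A minimal sketch, assuming a simplified toCamel stand-in for the camelcase package; camelizeKeys is a hypothetical helper name and the option values are placeholders.

```javascript
// Simplified stand-in for the `camelcase` package (assumption: only handles
// lowercase names separated by "_" or "-").
function toCamel(name) {
  return name.replace(/[_-]+([a-z0-9])/g, (match, ch) => ch.toUpperCase());
}

// Hypothetical helper: returns a shallow copy of options with every key
// converted to camelCase. If two keys collapse to the same name (for example
// access_token and accessToken), the one that appears later wins.
function camelizeKeys(options) {
  const result = {};
  for (const [key, value] of Object.entries(options)) {
    result[toCamel(key)] = value;
  }
  return result;
}

const normalized = camelizeKeys({
  access_token: 'PLACEHOLDER',
  smart_formatting: true,
  objectMode: true, // already camelCase, left unchanged
});

console.log(Object.keys(normalized));
// [ 'accessToken', 'smartFormatting', 'objectMode' ]
```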

Answer 8 · 2021-07-28T06:20:46.000Z

It seems the SDK code does not need to change after all; updating the demo's code is enough:

% git diff views/demo.jsx
diff --git a/views/demo.jsx b/views/demo.jsx
index 05569e8..a7bbe50 100644
--- a/views/demo.jsx
+++ b/views/demo.jsx
@@ -105,22 +105,22 @@ export class Demo extends Component {
     const keywords = this.getKeywordsArrUnique();
     return Object.assign({
       // formats phone numbers, currency, etc. (server-side)
-      access_token: this.state.accessToken,
+      accessToken: this.state.accessToken,
       token: this.state.token,
-      smart_formatting: true,
+      smartFormatting: true,
       format: true, // adds capitals, periods, and a few other things (client-side)
       model: this.state.model,
       objectMode: true,
-      interim_results: true,
+      interimResults: true,
       // note: in normal usage, you'd probably set this a bit higher
-      word_alternatives_threshold: 0.01,
+      wordAlternativesThreshold: 0.01,
       keywords,
-      keywords_threshold: keywords.length
+      keywordsThreshold: keywords.length
         ? 0.01
         : undefined, // note: in normal usage, you'd probably set this a bit higher
       timestamps: true, // set timestamps for each word - automatically turned on by speaker_labels
       // includes the speaker_labels in separate objects unless resultsBySpeaker is enabled
-      speaker_labels: this.state.speakerLabels,
+      speakerLabels: this.state.speakerLabels,
       // combines speaker_labels and results together into single objects,
       // making for easier transcript outputting
       resultsBySpeaker: this.state.speakerLabels,

Created a PR with the change: #266

Answer 9 · 2021-10-11T15:08:05.000Z

Closing as resolved from PR: #266