The Speech API is part of Cognitive Services. You can get free trial subscription keys from the Cognitive Services subscription page. After you select the Speech API, select Get API Key to get the key. The page returns a primary and a secondary key. Both keys are tied to the same quota, so you can use either one.
Note: Before you can use Speech client libraries, you must have a subscription key.
This section walks you through the steps needed to load a sample HTML page. The sample is located in our GitHub repository. You can open the sample directly from the repository, or from a local copy of the repository.
Note: Some browsers block microphone access on insecure origins, so we recommend hosting the sample (or your own app) over https to get it working on all supported browsers.
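Because microphone access depends on the page's origin, it can help to check the origin before starting recognition. The following is a minimal sketch and not part of the SDK: the helper names isLikelySecureOrigin and pageCanUseMicrophone are hypothetical, and the fallback assumes that browsers treat https pages and localhost as secure origins.

```javascript
// Hypothetical helpers (not part of the SDK).
// Assumption: browsers treat https: pages and localhost as secure origins.
function isLikelySecureOrigin(loc) {
    var localHosts = ["localhost", "127.0.0.1"];
    return loc.protocol === "https:" || localHosts.indexOf(loc.hostname) !== -1;
}

// Prefer the browser's own isSecureContext flag when it is available,
// and fall back to inspecting the location otherwise.
function pageCanUseMicrophone(win) {
    if (typeof win.isSecureContext === "boolean") {
        return win.isSecureContext;
    }
    return isLikelySecureOrigin(win.location);
}
```

In a page you would call pageCanUseMicrophone(window) and, if it returns false, show a hint to switch to https instead of starting the recognizer.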
Acquire a subscription key as described above, then open the link to the sample. This loads the page into your default browser (rendered using htmlPreview).
To try the sample locally, clone this repository:
git clone https://github.com/Azure-Samples/SpeechToText-WebSockets-Javascript
Then compile the TypeScript sources and bundle (browserify) them into a single JavaScript file (npm needs to be installed on your machine). Change into the root of the cloned repository and run the bundle task:
cd SpeechToText-WebSockets-Javascript && npm run bundle
Open samples\browser\Sample.html in your favorite browser.
An npm package of the Microsoft Speech Javascript Websocket SDK is available. To install the npm package, run:
npm install microsoft-speech-browser-sdk
If you're building a Node app and want to use the Speech SDK, all you need to do is add the following import statement:
import * as SDK from 'microsoft-speech-browser-sdk';
function RecognizerSetup(SDK, recognitionMode, language, format, subscriptionKey) {
    let recognizerConfig = new SDK.RecognizerConfig(
        new SDK.SpeechConfig(
            new SDK.Context(
                new SDK.OS(navigator.userAgent, "Browser", null),
                new SDK.Device("SpeechSample", "SpeechSample", "1.0.00000"))),
        recognitionMode, // SDK.RecognitionMode.Interactive (Options - Interactive/Conversation/Dictation)
        language, // Supported languages are specific to each recognition mode. Refer to docs.
        format); // SDK.SpeechResultFormat.Simple (Options - Simple/Detailed)

    // Alternatively use SDK.CognitiveTokenAuthentication(fetchCallback, fetchOnExpiryCallback) for token auth
    let authentication = new SDK.CognitiveSubscriptionKeyAuthentication(subscriptionKey);

    return SDK.Recognizer.Create(recognizerConfig, authentication);
}
function RecognizerStart(SDK, recognizer) {
    recognizer.Recognize((event) => {
        /*
            Alternative syntax for typescript devs.
            if (event instanceof SDK.RecognitionTriggeredEvent)
        */
        switch (event.Name) {
            case "RecognitionTriggeredEvent" :
                UpdateStatus("Initializing");
                break;
            case "ListeningStartedEvent" :
                UpdateStatus("Listening");
                break;
            case "RecognitionStartedEvent" :
                UpdateStatus("Listening_Recognizing");
                break;
            case "SpeechStartDetectedEvent" :
                UpdateStatus("Listening_DetectedSpeech_Recognizing");
                console.log(JSON.stringify(event.Result)); // check console for other information in result
                break;
            case "SpeechHypothesisEvent" :
                UpdateRecognizedHypothesis(event.Result.Text);
                console.log(JSON.stringify(event.Result)); // check console for other information in result
                break;
            case "SpeechFragmentEvent" :
                UpdateRecognizedHypothesis(event.Result.Text);
                console.log(JSON.stringify(event.Result)); // check console for other information in result
                break;
            case "SpeechEndDetectedEvent" :
                OnSpeechEndDetected();
                UpdateStatus("Processing_Adding_Final_Touches");
                console.log(JSON.stringify(event.Result)); // check console for other information in result
                break;
            case "SpeechSimplePhraseEvent" :
                UpdateRecognizedPhrase(JSON.stringify(event.Result, null, 3));
                break;
            case "SpeechDetailedPhraseEvent" :
                UpdateRecognizedPhrase(JSON.stringify(event.Result, null, 3));
                break;
            case "RecognitionEndedEvent" :
                OnComplete();
                UpdateStatus("Idle");
                console.log(JSON.stringify(event)); // Debug information
                break;
        }
    })
    .On(() => {
        // The request succeeded. Nothing to do here.
    },
    (error) => {
        console.error(error);
    });
}
function RecognizerStop(SDK, recognizer) {
    // recognizer.AudioSource.Detach(audioNodeId) can also be used here. (audioNodeId is part of ListeningStartedEvent)
    recognizer.AudioSource.TurnOff();
}
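The snippets above call several page-level helpers (UpdateStatus, UpdateRecognizedHypothesis, UpdateRecognizedPhrase, OnSpeechEndDetected, OnComplete) that come from the sample page rather than from the SDK. As a rough sketch, assuming you only want to track state in memory instead of updating DOM elements, they could look like the following. The uiState object and its fields are hypothetical, and the button-like flags are an assumption about what a page would typically toggle.

```javascript
// Hypothetical in-memory stand-ins for the helpers used by RecognizerStart.
// A real page would update DOM elements instead; this is only a sketch.
var uiState = {
    status: "Idle",
    hypothesis: "",
    phrases: [],
    startEnabled: true,
    stopEnabled: false
};

function UpdateStatus(status) {
    uiState.status = status;
}

function UpdateRecognizedHypothesis(text) {
    // Treat each hypothesis as a partial result that replaces the previous one.
    uiState.hypothesis = text;
}

function UpdateRecognizedPhrase(json) {
    // Final results accumulate; the pending hypothesis is cleared.
    uiState.phrases.push(json);
    uiState.hypothesis = "";
}

function OnSpeechEndDetected() {
    // Assumption: once speech has ended there is nothing left to stop.
    uiState.stopEnabled = false;
}

function OnComplete() {
    // Assumption: a new recognition can be started once the previous one ends.
    uiState.startEnabled = true;
    uiState.stopEnabled = false;
}
```

With these in place, a call to RecognizerSetup(SDK, SDK.RecognitionMode.Interactive, "en-US", SDK.SpeechResultFormat.Simple, subscriptionKey) followed by RecognizerStart(SDK, recognizer) would drive uiState through the statuses listed in the switch statement. The "en-US" language tag is only an example value.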
Currently, the TypeScript code in this SDK is compiled using the default module system (CommonJS), which means that the compilation produces a number of distinct JS source files. To make the SDK usable in a browser, it first needs to be "browserified" (all the JavaScript sources need to be bundled into a single file). To do this:
- Add a require statement to your web app source file, for instance (take a look at sample_app.js): var SDK = require('<path_to_speech_SDK>/Speech.Browser.Sdk.js');
- Set up the recognizer, same as above.
- Run your web app through webpack (see the "bundle" task in gulpfile.js; to execute it, run npm run bundle).
- Add the generated bundle to your HTML page: <script src="../../distrib/speech.sdk.bundle.js"></script>
...in progress, will be available soon
To use token-based authentication, launch a local Node server, as described here.
The SDK is a reference implementation for the Speech WebSocket protocol. Check the API reference and WebSocket protocol reference for more details.
The SDK depends on WebRTC APIs to get access to the microphone and read the audio stream. Most of today's browsers (Edge, Chrome, Firefox) support this. For more details about supported browsers, refer to navigator.getUserMedia#BrowserCompatibility.
Note: The SDK currently depends on the navigator.getUserMedia API. However, this API is in the process of being dropped as browsers move to the newer MediaDevices.getUserMedia instead. The SDK will add support for the newer API soon.
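To illustrate the difference between the two APIs, here is a small feature-detection sketch. It is not part of the SDK and does not change how the SDK captures audio; getMicrophoneStream is a hypothetical helper that prefers the promise-based MediaDevices.getUserMedia and falls back to the legacy callback-based navigator.getUserMedia, including the vendor-prefixed variants some older browsers exposed.

```javascript
// Hypothetical helper (not part of the SDK): returns a Promise for an audio stream.
function getMicrophoneStream(nav) {
    var constraints = { audio: true, video: false };

    // Newer, promise-based API.
    if (nav.mediaDevices && typeof nav.mediaDevices.getUserMedia === "function") {
        return nav.mediaDevices.getUserMedia(constraints);
    }

    // Legacy, callback-based API and its vendor-prefixed variants.
    var legacy = nav.getUserMedia || nav.webkitGetUserMedia || nav.mozGetUserMedia;
    if (!legacy) {
        return Promise.reject(new Error("getUserMedia is not supported in this browser"));
    }

    // Wrap the callback style in a Promise so callers see one interface.
    return new Promise(function (resolve, reject) {
        legacy.call(nav, constraints, resolve, reject);
    });
}
```

In a page, getMicrophoneStream(navigator).then(...) would receive the MediaStream, and the rejection handler would receive either the browser's error or the "not supported" error above.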
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.