
SpeechRecognition Samples

This library is a SpeechRecognition API (Web Speech API) compatible implementation for various cloud-based speech recognition engines.

I plan to support the following real-time speech recognition engines.

Using APIs / Libraries

Features

  • Uses Google python-speech with the gRPC AsyncIO API
  • Flushes an Ogg page after every Opus packet for low latency (at the cost of more size overhead)
  • Uses Opus's integrated VAD implementation (with a patch to export the VAD probability)
  • Runs the WebAssembly code inside an AudioWorklet
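
The size overhead of flushing one Ogg page per Opus packet can be estimated with a small calculation. The sketch below is not taken from this repository; it only applies the Ogg page layout (a 27-byte fixed header plus one segment-table byte per lacing value) and assumes each packet fits in a single page. The 60-byte packet size in the usage note is a hypothetical figure chosen for illustration.

```python
OGG_PAGE_HEADER_BYTES = 27  # fixed part of every Ogg page header


def lacing_values(packet_len: int) -> int:
    """Number of segment-table bytes needed for one packet of packet_len bytes."""
    if packet_len < 0:
        raise ValueError("packet_len must be non-negative")
    # Each lacing value covers up to 255 bytes. A value below 255 ends the
    # packet, so a length that is a multiple of 255 needs a trailing 0 entry.
    return packet_len // 255 + 1


def page_overhead_one_packet(packet_len: int) -> int:
    """Header bytes of an Ogg page that carries exactly one packet."""
    return OGG_PAGE_HEADER_BYTES + lacing_values(packet_len)


def overhead_ratio(packet_len: int) -> float:
    """Header bytes as a fraction of the total page size."""
    overhead = page_overhead_one_packet(packet_len)
    return overhead / (overhead + packet_len)
```

For a hypothetical 60-byte Opus packet, `page_overhead_one_packet(60)` gives 28 header bytes, so roughly 32% of every page is framing. That is the price paid for being able to send each packet as soon as it is encoded.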

License

The WebAssembly file and the Opus patch file are under the 3-Clause BSD License.

Everything else is under the AGPL. If you need a different license, please contact me.

Client <--> Server Protocol

[CLIENT]       [SERVER]
   |               |
   |  <init msg>   |
   +-------------->|
   | <opus packet> |
   +-------------->|
   +-------------->|
   +-------------->|
   |  <resp msg>   |
   |<--------------|
   | <opus packet> |
   +-------------->|
   +-------------->|
   +-------------->|
   |  <resp msg>   |
   |<--------------|
   :               :

resp msg = result message | done message | error message (sent asynchronously)
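
The client side of the diagram can be sketched as a generator that yields the frames in the order they are sent. This is only an illustration: the README does not name the transport, so sending the initialize message as JSON text and each Opus packet as raw bytes is an assumption, and `outgoing_frames` is a hypothetical helper name.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Union

Frame = Union[str, bytes]


def outgoing_frames(init: Dict[str, Any], packets: Iterable[bytes]) -> Iterator[Frame]:
    """Yield client-to-server frames in protocol order."""
    # The initialize message always comes first (assumed to be JSON text).
    yield json.dumps(init)
    # After that, one frame per Opus packet (assumed to be raw bytes).
    for packet in packets:
        yield packet
```

Responses are not modeled here because they can arrive at any point between packets, as the diagram shows.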

initialize message

{
   "pre_skip": <number>,  # opus pre-skip value (ogg)
   "version": <string>,  # encoder version string (ogg)

   "engine": <string>,  # cloud ASR engine option (one of ["google-v1"])
   "engine-config": <object>,  # engine configuration
}
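
As a minimal sketch, the initialize message could be built and validated like this. The helper name `build_init_message`, the validation rules, and the example values are assumptions for illustration. In particular, the contents of `engine-config` are not specified here, so the `language_code` key in the usage note is hypothetical.

```python
import json
from typing import Any, Dict

SUPPORTED_ENGINES = ("google-v1",)  # the only engine value listed above


def build_init_message(pre_skip: int, version: str, engine: str,
                       engine_config: Dict[str, Any]) -> str:
    """Serialize the initialize message as a JSON string."""
    if pre_skip < 0:
        raise ValueError("pre_skip must be non-negative")
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"unsupported engine: {engine!r}")
    return json.dumps({
        "pre_skip": pre_skip,            # Opus pre-skip value (Ogg)
        "version": version,              # encoder version string (Ogg)
        "engine": engine,                # cloud ASR engine option
        "engine-config": engine_config,  # engine configuration (contents assumed)
    })
```

A call such as `build_init_message(312, "libopus 1.3.1", "google-v1", {"language_code": "en-US"})` returns a JSON string with the four fields above. All four argument values in that call are made up for the example.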

result message

{
  "is_final": <boolean>,
  "alternatives": [{
    "transcript": <string>,
    "confidence": <float>,
  }]
}
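
A consumer of result messages will often want the most likely transcript. The following sketch picks the alternative with the highest confidence. `best_alternative` is a hypothetical helper, and treating a missing `confidence` as 0.0 is an assumption rather than documented behavior.

```python
from typing import Any, Dict, Optional, Tuple


def best_alternative(result: Dict[str, Any]) -> Optional[Tuple[str, float]]:
    """Return (transcript, confidence) of the top alternative, or None if empty."""
    alternatives = result.get("alternatives") or []
    if not alternatives:
        return None
    # max() keeps the first item on ties, so the server's ordering is preserved.
    best = max(alternatives, key=lambda alt: alt.get("confidence", 0.0))
    return best["transcript"], best.get("confidence", 0.0)
```

A caller would typically show interim results (`"is_final": false`) as provisional text and commit the transcript only once `is_final` is true.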

done message

{
   "type": "done"
}

error message

{
  "type": "error",
  "error": <string>,
  "message": <string>,
}
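
Because result messages carry no `"type"` field while done and error messages do, a receiver has to tell the three apart by shape. One possible way to do so is sketched below. `classify_response` is a hypothetical helper, and the rule "no `type` plus `is_final` and `alternatives` present means result" is an assumption inferred from the message layouts above.

```python
import json
from typing import Any, Dict, Tuple


def classify_response(raw: str) -> Tuple[str, Dict[str, Any]]:
    """Return ("result" | "done" | "error", parsed message) for one response."""
    msg = json.loads(raw)
    if not isinstance(msg, dict):
        raise ValueError("response must be a JSON object")
    kind = msg.get("type")
    if kind in ("done", "error"):
        return kind, msg
    # Result messages have no "type" field; recognize them by their keys.
    if kind is None and "is_final" in msg and "alternatives" in msg:
        return "result", msg
    raise ValueError(f"unrecognized response message: {raw!r}")
```

Raising on unknown shapes, rather than ignoring them, makes protocol mismatches between client and server visible early.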