SpeechRecognition Samples

This library is a SpeechRecognition API (Web Speech API) compatible implementation for various cloud-based speech recognition engines.

I plan to support the following real-time speech recognition engines.

Using APIs / Libraries

MediaDevices.getUserMedia: To capture audio from a microphone
WebAudio API, AudioWorklet: To run VAD (voice activity detection) and encoding on a separate thread
WebAssembly: To use libopus in web browser
Opus: Low-latency, high-quality, royalty-free audio codec
SpeexDSP: Resampler
WebSocket: To communicate in real time between client and server
FastAPI: Server-side Python async web framework

Features

Using Google python-speech with gRPC AsyncIO API
Flushing an Ogg page after every Opus packet for low latency (at the cost of extra size overhead)
Using Opus's built-in VAD implementation (with a patch to export the VAD probability)
WebAssembly executes in AudioWorklet
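
To make the "one Ogg page per Opus packet" trade-off concrete, here is a minimal Python sketch of how a single packet could be wrapped in its own Ogg page. It is not taken from this repository: the names `ogg_crc` and `ogg_page` are hypothetical, and the sketch covers only audio data pages (a real Ogg Opus stream also starts with header pages). The page layout and checksum follow the Ogg container format.

```python
import struct

OGG_POLY = 0x04C11DB7  # CRC-32 polynomial used by Ogg (init 0, no reflection, no final XOR)


def _make_crc_table():
    table = []
    for i in range(256):
        r = i << 24
        for _ in range(8):
            if r & 0x80000000:
                r = ((r << 1) ^ OGG_POLY) & 0xFFFFFFFF
            else:
                r = (r << 1) & 0xFFFFFFFF
        table.append(r)
    return table


_CRC_TABLE = _make_crc_table()


def ogg_crc(data: bytes) -> int:
    """Checksum over a whole page whose CRC field is set to zero."""
    crc = 0
    for b in data:
        crc = ((crc << 8) & 0xFFFFFFFF) ^ _CRC_TABLE[((crc >> 24) ^ b) & 0xFF]
    return crc


def ogg_page(packet: bytes, granule: int, serial: int, seq: int, flags: int = 0) -> bytes:
    """Wrap exactly one packet in one Ogg page (hypothetical helper)."""
    # Lacing values: 255 per full segment, then a final value < 255 (possibly 0).
    lacing = [255] * (len(packet) // 255) + [len(packet) % 255]
    if len(lacing) > 255:
        raise ValueError("packet too large for a single page")
    # 27-byte header: magic, version, flags, granule, serial, sequence, CRC (zero for now), segment count.
    header = struct.pack("<4sBBqIIIB", b"OggS", 0, flags, granule, serial, seq, 0, len(lacing))
    page = bytearray(header + bytes(lacing) + packet)
    struct.pack_into("<I", page, 22, ogg_crc(bytes(page)))  # CRC field starts at byte offset 22
    return bytes(page)
```

For a packet shorter than 255 bytes, each page adds 28 bytes (27-byte header plus one lacing value), which is the size overhead paid for being able to send every packet as soon as it is encoded.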

License

The WebAssembly file and the Opus patch file are under the 3-Clause BSD License.

Everything else is under the AGPL. If you need a different license, please contact me.

Client <--> Server Protocol

[CLIENT]       [SERVER]
   |               |
   |  <init msg>   |
   +-------------->|
   | <opus packet> |
   +-------------->|
   +-------------->|
   +-------------->|
   |  <resp msg>   |
   |<--------------|
   | <opus packet> |
   +-------------->|
   +-------------->|
   +-------------->|
   |  <resp msg>   |
   |<--------------|
   :               :

resp msg = result message | done message | error message (sent asynchronously)
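
The ordering in the diagram (one init message, then a stream of Opus packets) can be sketched as a small receiver-side state machine. This is only an illustration, not the repository's server code: the `Session` and `ProtocolError` names are hypothetical, and it assumes the init message arrives as a text (JSON) frame while Opus packets arrive as binary frames, which the README does not state.

```python
import json
from typing import List, Optional, Union


class ProtocolError(Exception):
    """Raised when frames arrive in an order the protocol does not allow."""


class Session:
    """Tracks one connection: the first frame is the init message, the rest are Opus packets."""

    def __init__(self) -> None:
        self.init: Optional[dict] = None
        self.packets: List[bytes] = []

    def feed(self, frame: Union[str, bytes]) -> None:
        if self.init is None:
            # Assumption: the init message is a JSON text frame.
            if not isinstance(frame, str):
                raise ProtocolError("first frame must be the init message")
            self.init = json.loads(frame)
            return
        # Assumption: every later frame is one binary Opus packet.
        if not isinstance(frame, (bytes, bytearray)):
            raise ProtocolError("expected a binary Opus packet")
        self.packets.append(bytes(frame))
```

A WebSocket handler would call `feed()` for each received frame and forward the collected packets to the recognition engine, while response messages are written back independently of the incoming packet stream.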

initialize message

{
   "pre_skip": <number>,  # opus pre-skip value (ogg)
   "version": <string>,  # encoder version string (ogg)

   "engine": <string>,  # cloud ASR engine option (one of ["google-v1"])
   "engine-config": <object>,  # engine configuration
}
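
As an illustration of the schema above, the following Python sketch builds and parses an initialize message. The `InitMessage` class and its methods are hypothetical helpers, and the sample values (pre-skip, version string, the `language_code` key inside `engine-config`) are made up for the example; the README does not specify what `engine-config` contains.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict

SUPPORTED_ENGINES = ("google-v1",)  # the only engine value listed in the schema


@dataclass
class InitMessage:
    pre_skip: int                  # Opus pre-skip value (Ogg)
    version: str                   # encoder version string (Ogg)
    engine: str                    # cloud ASR engine option
    engine_config: Dict[str, Any] = field(default_factory=dict)

    def to_json(self) -> str:
        if self.engine not in SUPPORTED_ENGINES:
            raise ValueError(f"unsupported engine: {self.engine!r}")
        return json.dumps({
            "pre_skip": self.pre_skip,
            "version": self.version,
            "engine": self.engine,
            "engine-config": self.engine_config,  # note the hyphen in the wire format
        })

    @classmethod
    def from_json(cls, text: str) -> "InitMessage":
        obj = json.loads(text)
        return cls(
            pre_skip=int(obj["pre_skip"]),
            version=str(obj["version"]),
            engine=str(obj["engine"]),
            engine_config=dict(obj.get("engine-config", {})),
        )


# Example values are placeholders, not defaults used by this project.
msg = InitMessage(pre_skip=312, version="example-encoder", engine="google-v1",
                  engine_config={"language_code": "en-US"})
wire = msg.to_json()
```

Note that the `# ...` comments in the schema are annotations only; the message actually sent must be plain JSON, as produced by `to_json()`.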

result message

{
  "is_final": <boolean>,
  "alternatives": [{
    "transcript": <string>,
    "confidence": <float>,
  }]
}

done message

{
   "type": "done"
}

error message

{
  "type": "error",
  "error": <string>,
  "message": <string>,
}
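
Since only the done and error messages carry a "type" field, a receiver has to treat a message without one as a result message. A minimal Python sketch of that dispatch, with hypothetical class and function names, might look like this:

```python
import json
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Alternative:
    transcript: str
    confidence: float


@dataclass
class Result:
    is_final: bool
    alternatives: List[Alternative]


@dataclass
class Done:
    pass


@dataclass
class Error:
    error: str
    message: str


def parse_response(text: str) -> Union[Result, Done, Error]:
    """Classify one response message according to the three schemas above."""
    msg = json.loads(text)
    kind = msg.get("type")
    if kind == "done":
        return Done()
    if kind == "error":
        return Error(error=msg["error"], message=msg["message"])
    # No "type" field: assume a result message.
    return Result(
        is_final=bool(msg["is_final"]),
        alternatives=[
            Alternative(a["transcript"], float(a["confidence"]))
            for a in msg["alternatives"]
        ],
    )
```

A client would typically surface interim `Result` objects (`is_final` false) as partial transcripts and stop reading once it sees `Done` or `Error`.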

kazuki/SpeechRecognition-Sample