This library is a SpeechRecognition API (Web Speech API) compatible implementation for various cloud-based speech recognition engines.
I plan to support the following real-time speech recognition engines.
- MediaDevices.getUserMedia: To capture audio from a microphone
- WebAudio API, AudioWorklet: To run VAD (voice activity detection) and encoding on a separate thread
- WebAssembly: To use libopus in the web browser
- Opus: Low-latency, high-quality, royalty-free audio codec
- SpeexDSP: Resampler
- WebSocket: To communicate in real time between client and server
- FastAPI: Server-side Python async web framework
- Using Google's python-speech with the gRPC AsyncIO API
- Flushing an Ogg page after every Opus packet for low latency (at the cost of more size overhead)
- Using Opus's integrated VAD implementation (with a patch to export the VAD probability)
- WebAssembly runs inside the AudioWorklet
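The size cost of flushing an Ogg page after every Opus packet can be estimated from the Ogg page layout: each page carries a 27-byte fixed header plus one lacing byte per 255-byte segment. The following is a rough sketch, not code from this library. The function names and the 80-byte packet size are assumptions chosen for illustration, and the packed variant simplifies real muxers by never splitting a packet across pages.

```python
from typing import Iterable

OGG_FIXED_HEADER = 27   # fixed part of an Ogg page header, in bytes
MAX_SEGMENTS = 255      # a page's segment table holds at most 255 entries


def lacing_segments(packet_len: int) -> int:
    """Number of segment-table bytes needed for one packet."""
    return packet_len // 255 + 1


def bytes_flush_every_packet(packet_lens: Iterable[int]) -> int:
    """Total stream size when every packet gets its own page."""
    return sum(OGG_FIXED_HEADER + lacing_segments(n) + n for n in packet_lens)


def bytes_packed(packet_lens: Iterable[int]) -> int:
    """Total stream size when pages are filled before being flushed.

    Simplification (assumption): a packet that does not fit in the
    current page starts a new page instead of spanning two pages.
    """
    total = 0
    segments_in_page = 0
    for n in packet_lens:
        segs = lacing_segments(n)
        if segments_in_page == 0 or segments_in_page + segs > MAX_SEGMENTS:
            total += OGG_FIXED_HEADER  # open a new page
            segments_in_page = 0
        segments_in_page += segs
        total += segs + n
    return total


packets = [80] * 50  # hypothetical: one second of 20 ms packets at about 32 kbit/s
print(bytes_flush_every_packet(packets), bytes_packed(packets))
```

For these 50 hypothetical packets, per-packet flushing yields 5400 bytes against 4077 bytes when packed, so the low-latency choice costs roughly a third more bandwidth at this packet size.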
The WebAssembly file and the Opus patch file are under the 3-Clause BSD License.
Everything else is under the AGPL. If you need a different license, please contact me.
[CLIENT]        [SERVER]
    |               |
    |  <init msg>   |
    +-------------->|
    | <opus packet> |
    +-------------->|
    +-------------->|
    +-------------->|
    |  <resp msg>   |
    |<--------------|
    | <opus packet> |
    +-------------->|
    +-------------->|
    +-------------->|
    |  <resp msg>   |
    |<--------------|
    :               :
resp msg
    = result message
    | done message
    | error message
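The message order above (one init msg, then a stream of Opus packets, with resp msgs sent back as they become available) can be sketched as a small transport-independent state machine. This is only an illustration under assumptions: it treats the init msg as a JSON text frame and Opus packets as binary frames, the `Session` class and the `"protocol"` error code are hypothetical names, and `recognize` is a stand-in for the real cloud engine.

```python
import json
from typing import Callable, List, Optional, Union

Frame = Union[str, bytes]


class Session:
    """Tracks one connection: an init msg first, then Opus packets."""

    def __init__(self, recognize: Callable[[bytes], Optional[dict]]):
        self._recognize = recognize          # hypothetical ASR engine stand-in
        self.config: Optional[dict] = None   # set once the init msg arrives
        self.closed = False

    def feed(self, frame: Frame) -> List[str]:
        """Consume one client frame and return the resp msgs to send back."""
        if self.closed:
            return []
        if self.config is None:
            # The very first frame must be the init msg.
            if not isinstance(frame, str):
                return self._fail("expected init msg before audio")
            self.config = json.loads(frame)
            return []
        if not isinstance(frame, bytes):
            return self._fail("expected a binary Opus packet")
        result = self._recognize(frame)
        # Many packets produce no response; results arrive now and then.
        return [json.dumps(result)] if result is not None else []

    def _fail(self, message: str) -> List[str]:
        self.closed = True
        return [json.dumps({"type": "error", "error": "protocol", "message": message})]
```

A WebSocket handler would call `feed` for each received frame and send every returned string back to the client.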
async
init msg:
{
  "pre_skip": <number>,        # Opus pre-skip value (Ogg)
  "version": <string>,         # encoder version string (Ogg)
  "engine": <string>,          # cloud ASR engine option (one of ["google-v1"])
  "engine-config": <object>,   # engine configuration
}
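The "(Ogg)" annotations suggest that `pre_skip` and `version` end up in the Ogg Opus stream headers. Assuming that is the case, the sketch below validates an init msg and builds the 19-byte `OpusHead` identification packet defined by the Ogg Opus encapsulation (RFC 7845). The function names, the mono 48 kHz defaults, and the sample values in the usage lines are assumptions for illustration, not taken from this library.

```python
import json
import struct

SUPPORTED_ENGINES = ("google-v1",)


def parse_init_msg(text: str) -> dict:
    """Decode the init msg and check the fields the server relies on."""
    msg = json.loads(text)
    if msg["engine"] not in SUPPORTED_ENGINES:
        raise ValueError(f"unsupported engine: {msg['engine']!r}")
    if not 0 <= msg["pre_skip"] <= 0xFFFF:
        raise ValueError("pre_skip must fit in 16 bits")
    return msg


def build_opus_head(pre_skip: int, channels: int = 1, input_rate: int = 48000) -> bytes:
    """Build an OpusHead packet (little-endian, channel mapping family 0)."""
    return struct.pack(
        "<8sBBHIhB",
        b"OpusHead",  # magic signature
        1,            # encapsulation version
        channels,     # channel count (assumed mono)
        pre_skip,     # samples to discard at the start of decoding
        input_rate,   # original input sample rate, informational only
        0,            # output gain
        0,            # channel mapping family
    )


# Hypothetical init msg; the values are made up for this example.
init = parse_init_msg(json.dumps({
    "pre_skip": 312,
    "version": "libopus 1.3.1",
    "engine": "google-v1",
    "engine-config": {},
}))
head = build_opus_head(init["pre_skip"])
```

The encoder version string would typically go into the vendor field of the following `OpusTags` packet, which is omitted here.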
result message:
{
  "is_final": <boolean>,
  "alternatives": [{
    "transcript": <string>,
    "confidence": <float>,
  }]
}
done message:
{
  "type": "done"
}
error message:
{
  "type": "error",
  "error": <string>,
  "message": <string>,
}
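A receiver has to tell the three resp msg variants apart. The sketch below shows one way to do that, assuming (as the shapes above suggest) that done and error messages carry a `"type"` field while result messages do not. The class and function names are hypothetical, and treating a missing `confidence` as 0.0 is an assumption of this example.

```python
import json
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Alternative:
    transcript: str
    confidence: float


@dataclass
class Result:
    is_final: bool
    alternatives: List[Alternative]


@dataclass
class Done:
    pass


@dataclass
class Error:
    error: str
    message: str


def parse_resp_msg(text: str) -> Union[Result, Done, Error]:
    """Classify one resp msg as a result, done, or error message."""
    msg = json.loads(text)
    kind = msg.get("type")
    if kind == "done":
        return Done()
    if kind == "error":
        return Error(error=msg["error"], message=msg["message"])
    if kind is None and "is_final" in msg:
        alternatives = [
            # Assumption: a missing confidence is treated as 0.0.
            Alternative(a["transcript"], float(a.get("confidence", 0.0)))
            for a in msg.get("alternatives", [])
        ]
        return Result(is_final=bool(msg["is_final"]), alternatives=alternatives)
    raise ValueError(f"unrecognized resp msg: {text!r}")
```

On the client side, a `Result` would map onto a SpeechRecognition `result` event, `Done` onto `end`, and `Error` onto `error`; that mapping is a guess based on the Web Speech API rather than something this document specifies.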