Speech recognition | Speech synthesis | Speaker verification | Speaker identification |
---|---|---|---|
✔️ | ✔️ | ✔️ | ✔️ |

Spoken language identification | Audio tagging | Voice activity detection | Keyword spotting |
---|---|---|---|
✔️ | ✔️ | ✔️ | ✔️ |

Architecture | Android | iOS | Windows | macOS | Linux |
---|---|---|---|---|---|
x64 | ✔️ | | ✔️ | ✔️ | ✔️ |
x86 | ✔️ | | ✔️ | | |
arm64 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
arm32 | ✔️ | | | | ✔️ |
riscv64 | | | | | ✔️ |

C++ | C | Python | C# | Java | JavaScript | Kotlin | Swift | Go | Dart |
---|---|---|---|---|---|---|---|---|---|
✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |

It also supports WebAssembly.

This repository supports running the following functions locally:

- Speech-to-text (i.e., ASR); both streaming and non-streaming are supported
- Text-to-speech (i.e., TTS)
- Speaker identification
- Speaker verification
- Spoken language identification
- Audio tagging
- VAD (e.g., silero-vad)
- Keyword spotting

on the following platforms and operating systems:

- x86, `x86_64`, 32-bit ARM, 64-bit ARM (arm64, aarch64), RISC-V (riscv64)
- Linux, macOS, Windows, openKylin
- Android, WearOS
- iOS
- NodeJS
- WebAssembly
- Raspberry Pi
- RV1126
- LicheePi4A
- VisionFive 2
- Horizon Sunrise X3 Pi (旭日X3派)
- etc.

with the following APIs:

- C++, C, Python, Go, `C#`
- Java, Kotlin, JavaScript
- Swift
- Dart
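
The difference between streaming and non-streaming speech-to-text, mentioned in the list above, can be illustrated with a minimal sketch in plain Python. It does not use this repository's API. The helper `iter_chunks`, the 16 kHz sample rate, and the 100 ms chunk size are assumptions chosen for illustration. A streaming recognizer consumes audio in small chunks as they arrive, while a non-streaming recognizer receives the complete utterance in a single call.

```python
from typing import Iterator, List, Sequence


def iter_chunks(samples: Sequence[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield consecutive chunks of at most `chunk_size` samples (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(samples), chunk_size):
        yield list(samples[start:start + chunk_size])


sample_rate = 16000                  # Hz; assumed for this illustration
audio = [0.0] * (sample_rate // 2)   # 0.5 s of silence standing in for real audio
chunk_size = sample_rate // 10       # 100 ms of audio per chunk

# Streaming style: audio is handed over piece by piece, so a recognizer
# could update its partial result after every chunk.
num_chunks = 0
streamed = 0
for chunk in iter_chunks(audio, chunk_size):
    num_chunks += 1
    streamed += len(chunk)

# Non-streaming style: the whole buffer is handed over at once.
whole_utterance = list(audio)
```

Smaller chunks reduce latency at the cost of more calls into the recognizer; the right size depends on the model and the application.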
Description | URL | Users |
---|---|---|
Streaming speech recognition | Address | Click here |
Text-to-speech | Address | Click here |
Voice activity detection (VAD) | Address | Click here |
VAD + non-streaming speech recognition | Address | Click here |
Two-pass speech recognition | Address | Click here |
Audio tagging | Address | Click here |
Audio tagging (WearOS) | Address | Click here |
Speaker identification | Address | Click here |
Spoken language identification | Address | Click here |
Keyword spotting | Address | Click here |

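
The "VAD + non-streaming speech recognition" combination in the table above can be pictured as a two-step pipeline: a VAD cuts the audio into speech segments, and each segment is then decoded as a whole. The sketch below is a toy illustration under stated assumptions. It uses a simple energy threshold rather than a neural VAD such as silero-vad, and `detect_speech_segments`, the frame size, and the threshold are hypothetical names and values, not part of this repository's API.

```python
from typing import List, Sequence, Tuple


def detect_speech_segments(
    samples: Sequence[float],
    frame_size: int,
    threshold: float,
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of runs of high-energy frames.

    Toy energy-based detector; a stand-in for a real VAD model.
    """
    segments: List[Tuple[int, int]] = []
    seg_start = None
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        energy = sum(x * x for x in frame) / len(frame)  # mean squared amplitude
        if energy > threshold:
            if seg_start is None:
                seg_start = start          # a speech run begins here
        elif seg_start is not None:
            segments.append((seg_start, start))  # the run ended at this frame
            seg_start = None
    if seg_start is not None:
        segments.append((seg_start, len(samples)))  # run lasts until the end
    return segments


# Synthetic signal: silence, "speech", silence, "speech".
audio = [0.0] * 400 + [0.5] * 800 + [0.0] * 400 + [0.5] * 400
segments = detect_speech_segments(audio, frame_size=200, threshold=0.01)

# Each clip would be passed, in full, to a non-streaming recognizer.
clips = [audio[start:end] for start, end in segments]
```

Because only the detected segments reach the recognizer, silence is never decoded, which is the main reason to pair a VAD with a non-streaming model.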
Description | URL | Users |
---|---|---|
Streaming speech recognition | Address | Click here |

Description | URL |
---|---|
Speech recognition (speech-to-text, ASR) | Address |
Text-to-speech (TTS) | Address |
VAD | Address |
Keyword spotting | Address |
Audio tagging | Address |
Speaker identification (Speaker ID) | Address |
Spoken language identification (Language ID) | See the multilingual Whisper ASR models under Speech recognition |
Punctuation | Address |

- Documentation: https://k2-fsa.github.io/sherpa/onnx/
- Bilibili demo videos: https://search.bilibili.com/all?keyword=%E6%96%B0%E4%B8%80%E4%BB%A3Kaldi

Please see https://k2-fsa.github.io/sherpa/social-groups.html for the Next-gen Kaldi (新一代 Kaldi) WeChat and QQ discussion groups.