A service that exposes the speech recognition capabilities of Kaldi through a REST interface. It can be deployed and run in a couple of steps using Docker containers. It currently supports Russian; models for other languages can be added as needed.
Install:
- Git
- Git LFS
- Docker
git clone https://github.com/mvshyvk/KaldiService.git
cd KaldiService
docker build -t kaldi_service:1.0 ./
docker run -it --rm -p 8080:8080 kaldi_service:1.0
Project files are placed in the /speech_recognition folder:
- service/KaldiService - Java web application that provides a REST API to the speech recognition capabilities of Kaldi
- service/Tests - Postman REST collection for testing the KaldiService API
- service/tomcat - default Tomcat folder that holds the deployed KaldiService web application, its logs, and temporary files
- openapi/KaldiService.yaml - Open API specification of KaldiService REST API
- recognition_task.py - Python script for recognition of a single audio file
- /tools - set of tools for speech recognition:
  - data_preparator.py - script for data preparation before speech recognition
  - recognizer.py - script for performing speech recognition
  - segmenter.py - script for speech segmentation
  - transcriptins_parser.py - script for results parsing
- /model - files of the speech recognition model
- /examples - example audio files for testing purposes
The acoustic model is the speech recognition model provided by alphacep:
http://alphacephei.com/kaldi/kaldi-ru-0.6.tar.gz
To use a different acoustic model, place it in the ./model folder, replacing the existing files.
Attention! The HCLG.fst file is larger than 500 MB, so Git LFS must be installed before cloning; otherwise the repository will not be cloned correctly.
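If the repository was cloned without Git LFS, large files such as HCLG.fst are left as small text pointer files rather than real model data. The following is a minimal Python sketch for spotting that situation. It relies on the fact that Git LFS pointer files start with the line `version https://git-lfs.github.com/spec/v1`; the helper names `is_lfs_pointer` and `find_lfs_pointers` are hypothetical and not part of this project.

```python
from pathlib import Path
from typing import List

# First line of a Git LFS pointer file, as defined by the Git LFS pointer format
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """Return True if the file starts with the Git LFS pointer header."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def find_lfs_pointers(root: Path) -> List[Path]:
    """List files under root that are still LFS pointers instead of real content."""
    return sorted(p for p in root.rglob("*") if p.is_file() and is_lfs_pointer(p))


# Example (run from the repository root; assumes the model lives in ./model):
#   leftovers = find_lfs_pointers(Path("model"))
#   An empty list means no pointer files were found.
```

If the list is not empty, installing Git LFS and running `git lfs pull` in the repository should replace the pointers with the actual files.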
Get the SpeechRecognitionTest console client application: https://github.com/mvshyvk/SpeechRecognitionTest
git clone https://github.com/mvshyvk/SpeechRecognitionTest.git
Windows:
- install vcpkg package manager
- install cpprestsdk library using vcpkg
- open SpeechRecognitionTest folder in Visual Studio 2019
- set the path to the CMake toolchain file in CMakeSettings.json (vcpkg/scripts/buildsystems/vcpkg.cmake)
- build project
Linux:
cd SpeechRecognitionTest
sudo apt-get install libcpprest-dev
cmake .
make
./speechRecognitionTest http://localhost:8080 example.mp3
Example of output:
Import /service/Tests/SpeachRecognizer Tests.postman_collection.json into Postman:
- Use the "Add new task" endpoint to submit an audio file; the response contains the task id
- Use the "Get service status" endpoint to retrieve service status information
- Use the "Get task status" endpoint to get the speech recognition results
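The Postman steps above amount to submitting a file and then polling for its result. The polling half can be sketched in Python as follows. Only the `/task/{taskId}/status` path is taken from this README; the JSON response format, the `is_finished` predicate, and the helper names `task_status_url`, `fetch_json`, and `poll_task` are assumptions for illustration, so check openapi/KaldiService.yaml for the actual paths and response schema.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict


def task_status_url(base_url: str, task_id: str) -> str:
    """Build the status URL from the /task/{taskId}/status path."""
    return f"{base_url.rstrip('/')}/task/{task_id}/status"


def fetch_json(url: str) -> Dict[str, Any]:
    """GET a URL and decode the body as JSON (assumes the service replies with JSON)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def poll_task(
    base_url: str,
    task_id: str,
    is_finished: Callable[[Dict[str, Any]], bool],
    fetch: Callable[[str], Dict[str, Any]] = fetch_json,
    interval: float = 2.0,
    timeout: float = 600.0,
) -> Dict[str, Any]:
    """Poll the task status until is_finished(status) is true or the timeout expires."""
    deadline = time.monotonic() + timeout
    url = task_status_url(base_url, task_id)
    while True:
        status = fetch(url)
        if is_finished(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish within {timeout} s")
        time.sleep(interval)


# Example with a hypothetical "state" field; replace the predicate with one
# that matches the real response schema:
#   result = poll_task("http://localhost:8080", task_id,
#                      lambda s: s.get("state") == "done")
```

The completion check is passed in as a function because the README does not describe the response fields. The base URL may also need a context path prefix, depending on how the web application is deployed in Tomcat.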
https://app.swaggerhub.com/apis-docs/mvshyvk/Kaldi_Speech_Recognition/0.9.0
- Implement OAuth authentication
- Implement a WebSocket server connection that lets clients receive notifications about task completion without polling the /task/{taskId}/status endpoint
- Implement support for multiple language models and the ability to switch between them
- Replace the intermediate layer between Kaldi and the Segmentator, PyKaldi recognition (recognizer.py; NnetLatticeFasterRecognizer), with VOSK library recognition