
Kaldi-Serve


gRPC server component for Kaldi based ASR.

Key Features:

  • Multithreaded gRPC server.
  • Supports bi-directional streaming recognition.
  • Thread-safe concurrent queue to process each audio stream separately.
  • N-best alternatives with LM and AM costs.
  • Word-level timings and confidence scores.

Getting Started

Setup

Make sure you have the gRPC, protobuf, and Boost C++ libraries installed on your system. Kaldi also needs to be present and built. Let's build the server:

make KALDI_ROOT=/path/to/local/repo/for/kaldi/ -j8

Run make clean to clear old build files.

Running the server

To run the server, you first need to specify a model config in a TOML file which tells the program which models to load, where to look for them, etc. The structure of the model_spec_toml file is specified in a sample in resources.
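
Purely for illustration, a model spec might look something like the snippet below; the table and field names here are hypothetical assumptions, so follow the sample in resources for the actual schema:

# Hypothetical model spec -- all names below are illustrative assumptions,
# not the real schema; see the sample file in resources.
[[models]]
name = "general"
language_code = "hi"
path = "/path/to/model/directory"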

# Make sure to have kaldi and openfst library available using LD_LIBRARY_PATH or something
# e.g. env LD_LIBRARY_PATH=../../asr/kaldi/tools/openfst/lib/:../../asr/kaldi/src/lib/ ./build/kaldi_serve_app

# Alternatively, you can also put all the required .so files in the ./lib/ directory since
# that is added to the binary's rpath.

./build/kaldi_serve_app --help

Kaldi gRPC server
Usage: ./build/kaldi_serve_app [OPTIONS] model_spec_toml

Positionals:
  model_spec_toml TEXT:FILE REQUIRED
                              Path to toml specifying models to load

Options:
  -h,--help                   Print this help message and exit
  -v,--version                Show program version and exit

Clients

For simple microphone testing, you can do something like the following (needs evans installed):

audio_bytes=$(arecord -f S16_LE -d 5 -r 8000 -c 1 | base64 -w0) # Recording 5 seconds of audio
echo "{\"audio\": {\"content\": \"$audio_bytes\"}, \"config\": {\"max_alternatives\": 2, \"model\": \"general\", \"language_code\": \"hi\"} }" | evans --package kaldi_serve --service KaldiServe ./protos/kaldi_serve.proto  --call Recognize --port 5016 | jq

The output structure looks like the following:

{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "हेलो दुनिया",
          "confidence": 0.95897794,
          "amScore": -374.5963,
          "lmScore": 131.33058
        },
        {
          "transcript": "हैलो दुनिया",
          "confidence": 0.95882875,
          "amScore": -372.76187,
          "lmScore": 131.84035
        }
      ]
    }
  ]
}

A Python client is also present in the python directory, along with a few example scripts.
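
As a rough sketch of what a unary Recognize call looks like from Python: this assumes stubs generated from protos/kaldi_serve.proto with grpcio-tools, and message names (RecognizeRequest, RecognitionAudio, RecognitionConfig) that mirror the request and response JSON above; check the generated modules for the exact names.

import grpc

# Assumes the stub modules were generated with something like:
#   python -m grpc_tools.protoc -I protos --python_out=. --grpc_python_out=. protos/kaldi_serve.proto
import kaldi_serve_pb2
import kaldi_serve_pb2_grpc

def recognize(audio_path, host="0.0.0.0:5016"):
    with open(audio_path, "rb") as f:
        content = f.read()

    with grpc.insecure_channel(host) as channel:
        stub = kaldi_serve_pb2_grpc.KaldiServeStub(channel)
        # Message and field names are assumptions based on the JSON shown above.
        request = kaldi_serve_pb2.RecognizeRequest(
            audio=kaldi_serve_pb2.RecognitionAudio(content=content),
            config=kaldi_serve_pb2.RecognitionConfig(
                max_alternatives=2,
                model="general",
                language_code="hi",
            ),
        )
        response = stub.Recognize(request)

    for result in response.results:
        for alternative in result.alternatives:
            print(alternative.transcript, alternative.confidence)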

Load testing

We perform load testing using ghz, a gRPC benchmarking and load testing tool. You can use the following command template:

ghz \
--insecure \
--proto ./protos/kaldi_serve.proto \
--call kaldi_serve.KaldiServe.StreamingRecognize \
-n [NUM REQUESTS] -c [CONCURRENT REQUESTS] \
--cpus [NUM CORES] \
-d "[{\"audio\": {\"content\": \"$chunk1\"}, \"config\": {\"max_alternatives\": [N_BEST], \"language_code\": \"[LANGUUAGE]\", \"model\": \"[MODEL]\"}}, ...more chunks]" \
0.0.0.0:5016
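
If you'd rather exercise the streaming endpoint from Python, a minimal sketch might look like the following. It reuses the stub assumptions from the unary example above and assumes StreamingRecognize accepts a stream of the same request message that the ghz payload mimics:

import grpc

import kaldi_serve_pb2
import kaldi_serve_pb2_grpc

CHUNK_SIZE = 8000  # bytes per chunk; pick to match your audio framing

def chunked_requests(audio_path):
    # Reuse the same config for every chunk in the stream.
    config = kaldi_serve_pb2.RecognitionConfig(
        max_alternatives=2, model="general", language_code="hi"
    )
    with open(audio_path, "rb") as f:
        chunk = f.read(CHUNK_SIZE)
        while chunk:
            yield kaldi_serve_pb2.RecognizeRequest(
                audio=kaldi_serve_pb2.RecognitionAudio(content=chunk),
                config=config,
            )
            chunk = f.read(CHUNK_SIZE)

with grpc.insecure_channel("0.0.0.0:5016") as channel:
    stub = kaldi_serve_pb2_grpc.KaldiServeStub(channel)
    # For a bi-directional RPC the call returns an iterator of responses;
    # if the RPC is declared client-streaming, it returns a single response instead.
    for response in stub.StreamingRecognize(chunked_requests("sample.wav")):
        print(response)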