
DeepSpeech Server

This is an HTTP server that can be used to test the Mozilla DeepSpeech project. You need an environment with DeepSpeech and a trained model to run this server.

Installation

You first need to install deepspeech. Depending on your system, you can use the CPU package:

pip3 install deepspeech

Or the GPU package:

pip3 install deepspeech-gpu

Then you can install the deepspeech server:

python3 setup.py install

The server is also available on pypi, so you can install it with pip:

pip3 install deepspeech-server

Note that Python 3.5 is the minimum version required to run the server.

Starting the server

deepspeech-server --config config.json

You can use DeepSpeech without training a model yourself. Pre-trained models are provided by Mozilla on the releases page of the project (see the download section at the bottom of each release):

https://github.com/mozilla/DeepSpeech/releases

Server configuration

The configuration is done with a JSON file, provided via the "--config" argument. Its structure is as follows:

{
  "deepspeech": {
    "model": "model.pb",
    "alphabet": "alphabet.txt",
    "lm": "lm.binary",
    "trie": "trie"
  },
  "server": {
    "http": {
      "request_max_size": 1048576
    }
  }
}
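
The same file can also be generated from Python, which makes it easy to script deployments. This is a minimal sketch; the file names are placeholders for your own model artifacts:

```python
import json

# Build the configuration shown above programmatically; the file names
# are placeholders for the model artifacts you downloaded or trained.
config = {
    "deepspeech": {
        "model": "model.pb",
        "alphabet": "alphabet.txt",
        "lm": "lm.binary",
        "trie": "trie",
    },
    "server": {
        "http": {"request_max_size": 1048576},
    },
}

# Write it out so it can be passed to "deepspeech-server --config config.json".
with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```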

The configuration file contains several sections and sub-sections.

Section "deepspeech" contains configuration of the deepspeech engine:

model is the protobuf model that was generated by deepspeech

alphabet is the alphabet dictionary (as available in the "data" directory of the DeepSpeech sources).

lm is the language model.

trie is the trie file.

Section "server" contains configuration of the access part, with one subsection per protocol:

http configuration:

request_max_size (default value: 1048576, i.e. 1 MiB) is the maximum payload size allowed by the server. A payload larger than this threshold is rejected with a "413: Request Entity Too Large" error.

host (default value: "0.0.0.0") is the listening address of the http server.

port (default value: 8080) is the listening port of the http server.
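
For example, to listen on the loopback interface only and on a non-default port, the "http" subsection could look like this (the values here are illustrative):

```json
{
  "server": {
    "http": {
      "request_max_size": 1048576,
      "host": "127.0.0.1",
      "port": 8000
    }
  }
}
```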

Using the server

Inference on the model is done via HTTP POST requests, for example with the following curl command (replace myfile.wav with your audio file; the port is 8080 with the default configuration):

 curl -X POST --data-binary @myfile.wav http://localhost:8080/stt
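
The same request can be made from Python with the standard library. The sketch below is self-contained: a hypothetical stub handler stands in for the real /stt endpoint, so only the transcribe() helper is what you would reuse against an actual deepspeech-server instance (pointing the URL at its host and port, e.g. http://localhost:8080/stt with the default configuration).

```python
import http.server
import threading
import urllib.request

# Hypothetical stand-in for the /stt endpoint, used here only so the
# example runs without a real deepspeech-server behind it.
class SttStub(http.server.BaseHTTPRequestHandler):
    def do_POST(self):
        self.rfile.read(int(self.headers["Content-Length"]))
        transcript = b"hello world"  # a real server returns the decoded text
        self.send_response(200)
        self.send_header("Content-Length", str(len(transcript)))
        self.end_headers()
        self.wfile.write(transcript)

    def log_message(self, fmt, *args):  # keep the sketch quiet
        pass

def transcribe(url, wav_bytes):
    """POST raw WAV bytes to the /stt endpoint and return the transcript."""
    req = urllib.request.Request(url, data=wav_bytes, method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

# Start the stub on a random free port in a background thread.
server = http.server.HTTPServer(("127.0.0.1", 0), SttStub)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/stt"
result = transcribe(url, b"RIFF....WAVE-placeholder-bytes")
print(result)  # -> hello world
```

Note that the WAV bytes above are placeholder data accepted by the stub; a real server expects a valid WAV payload no larger than request_max_size.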