
A series of smaller projects connected to a theatre performance in June 2019 by @gnd.

Primary language: Python. License: MIT.


A set of AI/ML tools for @gnd's theatre performances, mostly in the NLP/speech domain.

Remarks: Most of it is very hacky, not fault-tolerant, badly architected, and potentially not cross-platform. Use with caution.


  • Run make prereq-apt to install the apt dependencies.
  • Run make env to set up a Python virtual environment, or pip3 install -r requirements.txt to install the dependencies globally.

Set up STT:

  • Set up authentication for Google Cloud, e.g. export GOOGLE_APPLICATION_CREDENTIALS=<path/to/credentials.json> (or any other supported method).
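If you use the environment-variable route, a quick sanity check before starting STT can save a confusing failure later. The sketch below is a hypothetical helper (check_google_credentials is not part of this repo); it only covers GOOGLE_APPLICATION_CREDENTIALS and ignores other Google Cloud authentication methods.

```python
import os


def check_google_credentials() -> str:
    """Hypothetical helper: verify GOOGLE_APPLICATION_CREDENTIALS points to an existing file.

    Only checks the environment-variable method; other auth methods are not covered.
    """
    path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise RuntimeError(f"credentials file not found: {path}")
    return path
```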


  • All modules either consume data, produce data, or produce data based on their input.
  • Producers (e.g. STT) are Python generators that yield data.
  • Consumers have a consume(data) method that can optionally return a result (a producer based on the input, e.g. char-rnn).
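To make the producer/consumer split concrete, here is a minimal sketch of chaining such modules. All class and function names are hypothetical stand-ins, not the repo's classes: ShoutConsumer plays the role of a consumer that returns a producer (like char-rnn), and CollectConsumer plays a terminal consumer (like a TTS voice).

```python
from typing import Iterable, Iterator, List, Optional


class ShoutConsumer:
    """Hypothetical consumer that returns a new producer built from its input
    (standing in for something like the char-rnn generator)."""

    def consume(self, data: str) -> Iterator[str]:
        return (word.upper() for word in data.split())


class CollectConsumer:
    """Hypothetical terminal consumer (standing in for e.g. a TTS voice); returns nothing."""

    def __init__(self) -> None:
        self.seen: List[str] = []

    def consume(self, data: str) -> None:
        self.seen.append(data)


def run_chain(producer: Iterable[str], middle, sink) -> None:
    """Push every item from a producer through `middle`; if `middle` returns
    a producer, drain it into `sink`."""
    for item in producer:
        result: Optional[Iterator[str]] = middle.consume(item)
        if result is not None:
            for out in result:
                sink.consume(out)
```

Any generator works as the producer here, e.g. one that yields final STT transcripts.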



  • Each module can be run as a CLI application for testing.


  • Producer Orthograph(lang, input_device, only_final).
  • produce() -> generator that yields (transcript, result.is_final)
  • --lang <l>: sets the STT language, defaults to cs-CZ (en-US works too).
  • --only_final False: controls whether non-final (interim) transcripts are yielded as well; pause for a few seconds to obtain a final transcript.
  • --exit_key <k>: exit-keyword, defaults to ananas (only for CLI testing).
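Flags such as --only_final False take an explicit boolean string. A common pitfall with Python's argparse is type=bool, which turns the string "False" into True. The sketch below shows one way to parse such flags; the parser, the str2bool helper, and the assumption that --only_final defaults to True are illustrative, not taken from the repo.

```python
import argparse


def str2bool(value) -> bool:
    """Parse 'True'/'False'-style strings; argparse's type=bool would treat 'False' as True."""
    if isinstance(value, bool):
        return value
    lowered = value.strip().lower()
    if lowered in ("true", "1", "yes", "y"):
        return True
    if lowered in ("false", "0", "no", "n"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the documented STT flags and defaults.
    parser = argparse.ArgumentParser(description="STT test CLI (sketch)")
    parser.add_argument("--lang", default="cs-CZ")
    parser.add_argument("--only_final", type=str2bool, default=True)  # default assumed
    parser.add_argument("--exit_key", default="ananas")
    return parser
```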


  • Consumer RnnGen(char_rnn_ckpt_dir, sample_length, remove_prime).
  • produce(input) -> generates and immediately returns text primed by input.
  • --char_rnn_ckpt_dir <d>: directory containing a pretrained char-rnn model, defaults to charrnn/save (which does not exist by default, so either provide it or specify a valid path).
  • --sample_length: length of the generated sequence.
  • --remove_prime: controls whether the input (the prime) is cut from the beginning of the generated result.
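The --remove_prime behaviour can be pictured as follows, assuming (as the flag's description implies) that the sampled text begins with the priming input. strip_prime is a hypothetical helper, not the repo's code.

```python
def strip_prime(generated: str, prime: str) -> str:
    """Hypothetical helper: drop the priming text from the start of a sample, if present."""
    if generated.startswith(prime):
        return generated[len(prime):]
    # If the sample does not start with the prime, leave it unchanged.
    return generated
```

Any leading whitespace of the continuation is kept, so the caller decides how to join or trim it.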


  • Consumer DummyVoice(lang)
  • consume(input) -> TTS plays audio of input string.
  • --lang <l>: sets the TTS language, defaults to cs-CZ (en-US works too).
  • --exit_key <k>: exit-keyword, defaults to ananas (only for CLI testing).
  • Notes: on Windows, run pip install pypiwin32; on Linux, just run make (or check what it installs).
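The CLI testing mode with an exit keyword can be sketched as a simple loop: each line typed is passed to consume() until the exit keyword appears. run_cli is hypothetical; in practice you would pass a voice object such as DummyVoice and sys.stdin as the line source.

```python
from typing import Iterable


def run_cli(consumer, lines: Iterable[str], exit_key: str = "ananas") -> None:
    """Hypothetical test loop: feed each input line to consumer.consume()
    until a line contains the exit keyword (compared case-insensitively)."""
    for line in lines:
        text = line.strip()
        if exit_key.lower() in text.lower().split():
            break  # stop without speaking the line that contains the keyword
        if text:
            consumer.consume(text)
```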


  • Consumer TacotronVoice()
  • consume(input) -> Tacotron-2 plays audio of input string.
  • --exit_key <k>: exit-keyword, defaults to ananas (only for CLI testing).
  • Some (possibly most) Tacotron-2 parameters work.
  • Prerequisites:
  • Notes: the first TTS call takes significantly longer to generate (pre-heating with an empty string causes most models to generate hell-noise).



  • Producer SocketReader(host, port, sep).
  • produce() -> yields messages read from the socket one by one, decoded as UTF-8 strings; blocks until it receives a full message terminated by the separator.
  • --host: host for the socket server; keep empty (in Python, an empty host string binds to all interfaces).
  • --port: port for the socket server.
  • --sep: message separator.
  • Notes: Acts as socket server.
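The "block until a full message ending with the separator" behaviour is a classic stream-framing problem: TCP delivers arbitrary chunks, so one recv() may hold several messages or only part of one. A minimal sketch (split_messages is hypothetical, not the repo's implementation) that buffers bytes and splits on the separator before decoding:

```python
from typing import Iterable, Iterator


def split_messages(chunks: Iterable[bytes], sep: str = "\n") -> Iterator[str]:
    """Hypothetical framing helper: yield complete UTF-8 messages from byte chunks."""
    sep_bytes = sep.encode("utf-8")
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # A single chunk may contain several messages, or only part of one.
        while sep_bytes in buffer:
            message, buffer = buffer.split(sep_bytes, 1)
            yield message.decode("utf-8")
    # Trailing bytes without a separator form an incomplete message and are dropped.
```

Splitting on bytes before decoding avoids breaking a multi-byte UTF-8 character (e.g. "ě") that straddles two chunks. With an accepted connection conn, the chunks can come from iter(lambda: conn.recv(4096), b""), which stops when the client closes the connection.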


  • Consumer SocketWriter(host, port, sep).
  • consume(msg) -> sends msg through the socket as a UTF-8 encoded string followed by the separator.
  • --host: host of the socket server to connect to.
  • --port: port of the socket server.
  • --sep: message separator.
  • Notes: Acts as socket client.
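On the client side, the described behaviour amounts to encoding each message and appending the separator. The class below is a hypothetical sketch using only the standard socket module. The "\n" default separator is an assumption, since the README does not state one.

```python
import socket


class SocketWriterSketch:
    """Hypothetical client mirroring the described SocketWriter behaviour."""

    def __init__(self, host: str, port: int, sep: str = "\n") -> None:  # default sep assumed
        self.sep = sep
        self.sock = socket.create_connection((host, port))

    def consume(self, msg: str) -> None:
        # sendall() keeps sending until the whole payload has been written.
        self.sock.sendall((msg + self.sep).encode("utf-8"))

    def close(self) -> None:
        self.sock.close()
```

The separator must never occur inside a message, otherwise the reader will split it into two.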



At the moment, poc.py consists of a loop: you speak, and what you say is transcribed into text through Google Cloud's Speech API until a keyword is recognized. At that point, the connection to Google Cloud is closed and the transcribed text (minus the keyword) is forwarded to the running char-rnn language model. The text generated by the language model is then spoken through a text-to-speech interface.

  • The args of stt.py, tts.py and rnnGen.py apply.
  • --next_key <k>: move-on keyword, defaults to figaro. If it is not set, the RNN runs immediately on the first final transcript.
  • --say_primed False: by default, the spoken text includes both the prime (the transcription) and the generated text. Set this to False to say only the generated text.
  • Run python3 poc.py <args>
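The keyword-driven part of the loop can be sketched as a function over the (transcript, is_final) tuples yielded by the STT producer: final transcripts are accumulated until one contains the move-on keyword, and the keyword itself is dropped from the prime. collect_prime is hypothetical, and the case-insensitive, punctuation-tolerant keyword match is an assumption, not necessarily what poc.py does.

```python
from typing import Iterable, Tuple


def _is_key(word: str, key: str) -> bool:
    # Compare case-insensitively and ignore trailing punctuation the STT may add.
    return word.strip(".,!?").lower() == key.lower()


def collect_prime(results: Iterable[Tuple[str, bool]], next_key: str = "figaro") -> str:
    """Hypothetical sketch: build the char-rnn prime from final STT transcripts."""
    parts = []
    for transcript, is_final in results:
        if not is_final:
            continue  # interim transcripts get revised later, so skip them
        words = transcript.split()
        if not next_key:
            # No move-on keyword: use the first final transcript as-is.
            return " ".join(words)
        parts.extend(w for w in words if not _is_key(w, next_key))
        if any(_is_key(w, next_key) for w in words):
            break
    return " ".join(parts)
```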

You may want to choose the keywords based on the language used.

Make sure that the speech transcriptions only use characters that belong to the char-rnn model's vocabulary.
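One way to enforce this is to map each transcribed character onto the model's vocabulary before priming. The sketch below assumes you can obtain the vocabulary as a Python set of characters; fit_to_vocab is hypothetical. It tries the character itself, its lowercase form, and its base letter without diacritics (via Unicode NFKD decomposition, e.g. "ř" to "r"), and drops anything that still does not fit.

```python
import unicodedata
from typing import Set


def fit_to_vocab(text: str, vocab: Set[str]) -> str:
    """Hypothetical helper: keep only characters the char-rnn vocabulary knows."""
    out = []
    for ch in text:
        # NFKD splits accented letters into base letter + combining mark.
        base = unicodedata.normalize("NFKD", ch)[0]
        for candidate in (ch, ch.lower(), base, base.lower()):
            if candidate in vocab:
                out.append(candidate)
                break
        # If no candidate matched, the character is dropped.
    return "".join(out)
```

Stripping diacritics changes Czech text noticeably, so it is only a fallback for models trained without them.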


  • A PoC that listens on a socket (see the SocketReader.py args) and sends the received messages to the tts.py service (see its args).
  • A showcase of how to use the modules and socketWrappers together.
