
MaryTTS text to speech server and a collection of voices for various languages

Primary language: Shell. License: MIT.

MaryTTS 5.2 with HSMM Voices

MaryTTS 5.2 text to speech server and a collection of hidden semi-Markov model (HSMM) voices for various languages in a multi-platform Docker image.

Also includes the txt2wav utility for command-line text to speech.

Supported Platforms:

  • amd64 - laptops, desktops, servers
  • arm/v7 - Raspberry Pi 2/3
  • arm64 - Raspberry Pi 3+/4

Running

To run a MaryTTS server:

$ docker run -it -p 59125:59125 synesthesiam/marytts:5.2

You should now be able to access the server at http://localhost:59125
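
Once the server is up, speech can also be requested over HTTP. The following is a minimal sketch, not part of this repository: mary_say and MARYTTS_URL are hypothetical names, and the query parameters (INPUT_TEXT, INPUT_TYPE, OUTPUT_TYPE, AUDIO, LOCALE, VOICE) are assumed to follow the standard MaryTTS 5.x /process interface. It also assumes curl is installed and the cmu-slt-hsmm voice is loaded.

```shell
# Base URL of the running server (hypothetical variable name).
MARYTTS_URL="${MARYTTS_URL:-http://localhost:59125}"

# mary_say TEXT OUTFILE [VOICE] [LOCALE]   (hypothetical helper)
# Sends TEXT to the server's /process endpoint and saves the WAV reply.
mary_say() {
    local text="$1" outfile="$2" voice="${3:-cmu-slt-hsmm}" locale="${4:-en_US}"
    # --get turns the --data values into a URL query string;
    # --data-urlencode escapes spaces and punctuation in the text.
    curl --silent --show-error --fail --get \
        --data-urlencode "INPUT_TEXT=${text}" \
        --data "INPUT_TYPE=TEXT" \
        --data "OUTPUT_TYPE=AUDIO" \
        --data "AUDIO=WAVE_FILE" \
        --data "LOCALE=${locale}" \
        --data "VOICE=${voice}" \
        --output "${outfile}" \
        "${MARYTTS_URL}/process"
}
```

For example, mary_say 'Welcome to the world of speech synthesis!' tts.wav should write tts.wav to the current directory if the request succeeds.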

Beware that running the server with every voice loaded may consume a lot of RAM on a Raspberry Pi!

Restricting Voices

You can control which voices are loaded with -v or --voice arguments:

$ docker run -it -p 59125:59125 synesthesiam/marytts:5.2 --voice cmu-slt-hsmm --voice cmu-rms-hsmm

This loads only the JARs needed for the specified voices, which may help conserve RAM on a Raspberry Pi.
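
If you switch voice sets often, the repeated --voice arguments can be generated from a plain list of names. This is only a sketch: run_marytts is a hypothetical helper, and it uses nothing beyond the docker run command and --voice flag shown above.

```shell
# run_marytts [VOICE...]   (hypothetical helper)
# Starts the server with one --voice flag per name given.
# With no arguments, no --voice flags are passed at all.
run_marytts() {
    local voice
    # Rewrite the argument list in place: each pass appends
    # "--voice NAME" and drops the original NAME from the front.
    for voice in "$@"; do
        set -- "$@" --voice "$voice"
        shift
    done
    docker run -it -p 59125:59125 synesthesiam/marytts:5.2 "$@"
}
```

Calling run_marytts cmu-slt-hsmm cmu-rms-hsmm is then equivalent to the command above.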

A list of voices can be obtained with:

$ docker run -it synesthesiam/marytts:5.2 --voices
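
A server that is already running can also be asked which voices it has loaded. The sketch below assumes the standard MaryTTS /voices endpoint, which is expected to return one voice per line in the form "name locale gender type", and that the locale appears in the en_US style. voices_for_locale is a hypothetical helper name.

```shell
# voices_for_locale LOCALE   (hypothetical helper)
# Prints the names of loaded voices whose locale column equals LOCALE.
voices_for_locale() {
    curl --silent --show-error --fail \
        "${MARYTTS_URL:-http://localhost:59125}/voices" |
        awk -v locale="$1" '$2 == locale { print $1 }'
}
```

For example, voices_for_locale en_US should print only the US English voices that the server loaded.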

Command-Line Utility

The txt2wav utility is included in the Docker image. A bash wrapper script allows you to list voices and includes only the necessary JARs to reduce start-up time.

Copy the included txt2wav script to somewhere in your $PATH and mark it executable. This script runs the Docker image as the current user, maps your $HOME directory, and sets the working directory to $PWD.
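
The copy step might look like the sketch below. install_txt2wav is a hypothetical helper, and the default destination of $HOME/.local/bin is an assumption; use whichever directory is actually on your $PATH.

```shell
# install_txt2wav [SCRIPT] [DEST_DIR]   (hypothetical helper)
# Copies the wrapper script into DEST_DIR and marks it executable.
install_txt2wav() {
    local src="${1:-./txt2wav}" dest_dir="${2:-$HOME/.local/bin}"
    mkdir -p "$dest_dir" &&
        cp "$src" "$dest_dir/txt2wav" &&
        chmod +x "$dest_dir/txt2wav"
}
```

After running it from a checkout of this repository, command -v txt2wav should print the installed path if the destination directory is on your $PATH.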

List available voices:

$ txt2wav

voice	language
...

Generate WAV file:

$ txt2wav --voice cmu-slt-hsmm -o /path/to/tts.wav 'Welcome to the world of speech synthesis!'
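
To synthesize several sentences in one go, the same command can be run in a loop. The following sketch uses a hypothetical helper, synth_lines, that writes one numbered WAV file per non-empty line of a text file. It relies only on the --voice and -o options shown above. Because the wrapper maps your $HOME directory, the output directory is assumed to be somewhere under $HOME.

```shell
# synth_lines INFILE OUTDIR [VOICE]   (hypothetical helper)
# Writes OUTDIR/001.wav, OUTDIR/002.wav, ... for each non-empty line.
synth_lines() {
    local infile="$1" outdir="$2" voice="${3:-cmu-slt-hsmm}"
    local n=0 line
    mkdir -p "$outdir"
    # "|| [ -n "$line" ]" keeps a final line that lacks a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        [ -z "$line" ] && continue
        n=$((n + 1))
        # Redirect stdin so txt2wav cannot swallow the rest of INFILE.
        txt2wav --voice "$voice" \
            -o "$(printf '%s/%03d.wav' "$outdir" "$n")" \
            "$line" < /dev/null
    done < "$infile"
}
```

Each sentence starts a new container, so this is slower per sentence than a long-running server, but it needs no server to be up.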

Play directly:

$ txt2wav 'Welcome to the world of speech synthesis!' | aplay

Online Mode

You can run txt2wav in an "online" mode, in which it continuously reads sentences from standard input, overwrites the output WAV file after each one, and repeats the sentence back on standard output.

$ txt2wav --online -o /path/to/tts.wav

Reading sentences from stdin
...

Typing a sentence and pressing <ENTER> will overwrite /path/to/tts.wav and print the sentence back on standard output. To end the session, press CTRL+D.
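
Because each sentence is echoed on standard output, that echo can serve as a signal for another program. The sketch below plays the WAV file each time a sentence comes back. It assumes the echo is printed only after the file has been written, and it skips the start-up banner in case that is also written to standard output. play_online is a hypothetical helper name.

```shell
# play_online WAVFILE   (hypothetical helper)
# Runs txt2wav in online mode and plays WAVFILE after every sentence.
play_online() {
    local wav="$1" sentence
    txt2wav --online -o "$wav" | while IFS= read -r sentence; do
        # Ignore the banner if it arrives on stdout (assumption).
        [ "$sentence" = "Reading sentences from stdin" ] && continue
        # Keep aplay away from the pipe that feeds this loop.
        aplay "$wav" < /dev/null
    done
}
```

As with the plain online mode, pressing CTRL+D ends the session.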

Voices

Voice                      Language         Gender  License    Samples
cmu-slt-hsmm               English (en-US)  Female  Arctic     Sample
cmu-bdl-hsmm               English (en-US)  Male    Arctic     Sample
cmu-rms-hsmm               English (en-US)  Male    Arctic     Sample
dfki-obadiah-hsmm          English (en-GB)  Male    By-ND-3.0  Sample
dfki-poppy-hsmm            English (en-GB)  Female  By-ND-3.0  Sample
dfki-prudence-hsmm         English (en-GB)  Female  By-ND-3.0  Sample
dfki-spike-hsmm            English (en-GB)  Male    By-ND-3.0  Sample
bits1-hsmm                 German (de)      Female  By-ND-3.0  Sample
bits3-hsmm                 German (de)      Male    By-ND-3.0  Sample
dfki-pavoque-neutral-hsmm  German (de)      Male    By-ND-3.0  Sample
enst-camille-hsmm          French (fr)      Female  By-SA-3.0  Sample
enst-dennys-hsmm           French (fr)      Male    By-SA-3.0  Sample
upmc-jessica-hsmm          French (fr)      Female  By-ND-3.0  Sample
upmc-pierre-hsmm           French (fr)      Male    By-ND-3.0  Sample
istc-lucia-hsmm            Italian (it)     Female  By-ND-3.0  Sample
ac-irina-hsmm              Russian (ru)     Female  By-SA-3.0  Sample
stts-sv-hb-hsmm            Swedish (sv)     Male    Unknown    Sample
cmu-nk-hsmm                Telugu (te)      Female  By-ND-3.0  Sample
dfki-ot-hsmm               Turkish (tr)     Male    By-ND-3.0  Sample