Open Text to Speech Server

Primary language: Python. License: MIT.

Unifies access to multiple open source text to speech systems and voices for many languages, including:

  • eSpeak
    • Supports a huge number of languages/locales, but sounds robotic
  • flite
    • English (19)
    • Hindi (1)
    • Bengali (1)
    • Gujarati (3)
    • Kannada (1)
    • Marathi (2)
    • Punjabi (1)
    • Tamil (1)
    • Telugu (3)
  • Festival
    • English (9), Spanish (1), Catalan (1), Czech (4)
  • nanoTTS
    • English (2), German (1), French (1), Italian (1), Spanish (1)
  • MaryTTS
    • English (7), German (3), French (4), Italian (1), Russian (1), Swedish (1), Telugu (1), Turkish (1)
    • Requires an external server (Docker image)
    • Add the --marytts-url command-line argument to point OpenTTS at that server
  • Mozilla TTS
    • English (1)
    • Requires an external server (Docker image, amd64 only)
    • Add the --mozillatts-url command-line argument to point OpenTTS at that server

[Screenshot: web interface]

Running

Basic OpenTTS server:

$ docker run -it -p 5500:5500 synesthesiam/opentts

Visit http://localhost:5500

For the HTTP API test page, visit http://localhost:5500/api/
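When the container is started from a script, it can help to wait until the server actually answers before sending requests. The following is a minimal sketch, not part of OpenTTS itself: the helper name `wait_for_server` and its parameters are hypothetical, and it assumes that GET /api/voices answers with HTTP 200 once the server is ready.

```python
import time
import urllib.request


def wait_for_server(url="http://localhost:5500/api/voices",
                    timeout=30.0, interval=1.0,
                    opener=urllib.request.urlopen):
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass.

    `opener` is injectable so the loop can be exercised without a network.
    Returns True if the server answered, False if the deadline passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Assumption: a 200 response from /api/voices means "ready".
            with opener(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Covers refused connections, timeouts, and HTTP error responses
            # (urllib.error.URLError is a subclass of OSError).
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_for_server()` right after `docker run` blocks until the web interface on port 5500 is reachable, or returns False after 30 seconds.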

Exclude eSpeak (robotic voices):

$ docker run -it -p 5500:5500 synesthesiam/opentts --no-espeak

Adding MaryTTS and Mozilla TTS

Run using docker compose with MaryTTS and Mozilla TTS:

version: '2'
services:
  opentts:
    image: synesthesiam/opentts
    ports:
      - 5500:5500
    command: --marytts-url http://marytts:59125 --mozillatts-url http://mozillatts:5002
    tty: true
  marytts:
    image: synesthesiam/marytts:5.2
    tty: true
  mozillatts:
    image: synesthesiam/mozilla-tts
    tty: true

Visit http://localhost:5500, choose the language en, then pick a voice whose name starts with marytts: or mozillatts:

NOTE: The Mozilla TTS Docker image only runs on amd64 platforms (no Raspberry Pi).
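To check from a script that the external servers were picked up, one option is to fetch GET /api/voices and group the returned voice ids by their tts: prefix. This is a sketch under assumptions: the function names `fetch_voices` and `voices_by_system` are hypothetical, and the only behavior relied on is that the endpoint returns a JSON object keyed by ids of the form tts:voice.

```python
import json
import urllib.request
from collections import defaultdict


def voices_by_system(voices):
    """Group voice ids of the form "tts:voice" by their "tts" prefix."""
    grouped = defaultdict(list)
    for voice_id in voices:
        # Split on the first colon only; the voice part is kept untouched.
        tts_name, _, _ = voice_id.partition(":")
        grouped[tts_name].append(voice_id)
    return dict(grouped)


def fetch_voices(base_url="http://localhost:5500"):
    """Return the JSON object served by GET /api/voices."""
    with urllib.request.urlopen(f"{base_url}/api/voices") as response:
        return json.load(response)
```

With the compose setup running, `voices_by_system(fetch_voices())` should contain the keys "marytts" and "mozillatts" if both servers were reached.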

HTTP Endpoints

See swagger.yaml

  • GET /api/tts
    • ?voice - voice in the form tts:voice (e.g., espeak:en)
    • ?text - text to speak
    • Returns audio/wav bytes
  • GET /api/voices
    • Returns JSON object
    • Keys are voice ids in the form tts:voice
    • Values are objects with:
      • id - voice identifier for TTS system (string)
      • name - friendly name of voice (string)
      • gender - M or F (string)
      • language - 2-character language code (e.g., "en")
      • locale - lower-case locale code (e.g., "en-gb")
      • tts_name - name of text to speech system
    • Filter voices using query parameters:
      • ?tts_name - only voices from the given text to speech system(s)
      • ?language - only voices for the given language(s)
      • ?locale - only voices for the given locale(s)
      • ?gender - only voices of the given gender(s)
  • GET /api/languages
    • Returns JSON list of supported languages
    • Filter languages using query parameters:
      • ?tts_name - only languages supported by the given text to speech system(s)
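The endpoints above can be called with nothing but the Python standard library. The sketch below is illustrative only: `build_url` and `speak` are hypothetical helper names, the base URL assumes the docker run command shown earlier, and each filter is passed as a single value because the format for multiple values is not described here.

```python
import urllib.parse
import urllib.request


def build_url(base_url, path, **params):
    """Return base_url + path with params URL-encoded; None values are skipped."""
    query = urllib.parse.urlencode(
        {key: value for key, value in params.items() if value is not None}
    )
    return f"{base_url}{path}?{query}" if query else f"{base_url}{path}"


def speak(text, voice="espeak:en", base_url="http://localhost:5500"):
    """Call GET /api/tts and return the audio/wav bytes from the response."""
    url = build_url(base_url, "/api/tts", voice=voice, text=text)
    with urllib.request.urlopen(url) as response:
        return response.read()
```

For example, `open("hello.wav", "wb").write(speak("Hello world"))` saves the synthesized audio to a file, and `build_url("http://localhost:5500", "/api/voices", language="en", gender="F")` produces a filtered voice query.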

Voice Samples

See the samples directory. eSpeak samples are not included because there are too many languages (and they all sound robotic).