I created this interface for a voice-based bot that I'm running on a Raspberry Pi 3B. I'm using the AIY Voice HAT, but I was displeased with the robotic voice supplied by Google. After studying a few other voice options, I decided on IBM's Watson because of its high-quality cadence and intonation. I added some features for my own purposes and decided that others might benefit from the effort. The package should work on other internet-connected devices capable of sound output.
pip3 install watson-text-talker --user
Get IBM credentials for Watson Text to Speech; the Lite plan is free.
from watson_text_talker import *
text_talker = TextTalker(api_key='your-watson-tts-api-key')
text_talker.say("Hello world!")
- Caching
  - lowers cloud round-trips
  - keeps cost down
- Phrase grouping
  - segments phrases/sentences
  - each segment can have its own importance factor
- Optional phrases
  - optional percentage chance that a phrase will be voiced
- Quiet level
  - optional quiet level factor can be applied to all optional phrases
  - increases or decreases the likelihood that an optional phrase will be voiced
- IBM Watson voice
  - very realistic sounding, with appropriate cadence and intonation
  - voice selection: see IBM's Text to Speech documentation for the available voices
  - free tier plan: no credit card required, 10,000 characters per month at no cost
The package always caches new phrases to a file. The cache directory defaults to ./voice_mp3s, but can also be set in TT_Config. To keep filenames regular I slugify the phrase, which has the advantage of making them human-readable. The only caveat is that a phrase MUST be limited to 255 characters.
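As a rough illustration of the caching scheme described above, the sketch below turns a phrase into a readable cache filename. It uses a simplified standard-library stand-in for python-slugify, so the package's real slugs may differ; the helper names `simple_slugify` and `cache_path_for` are hypothetical and not part of the package.

```python
import os
import re
import unicodedata

MAX_PHRASE_LENGTH = 255  # the limit stated above


def simple_slugify(text):
    """Simplified stand-in for python-slugify (an assumption, not the real library)."""
    # Drop accents, lowercase, then collapse runs of non-alphanumerics into '-'.
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def cache_path_for(phrase, cache_dir="voice_mp3s", extension="mp3"):
    """Return the path where a phrase's audio would be cached (hypothetical helper)."""
    if len(phrase) > MAX_PHRASE_LENGTH:
        raise ValueError("phrase must be 255 characters or fewer")
    return os.path.join(cache_dir, simple_slugify(phrase) + "." + extension)


print(cache_path_for("Hello world!"))
```

Because the filename is derived only from the phrase text, saying the same phrase twice maps to the same file, which is what lets a cached recording be reused instead of making another cloud request.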
Phrase grouping is based on a list of tuples.
from watson_text_talker import *
text_talker = TextTalker(api_key='your-api-key')
importance = TT_Importance()
grouping_example = [(importance.SAY_30_PERCENT, "I'm your assistant."), (importance.SAY_50_PERCENT, "How are you?"), (importance.SAY_ALWAYS, "Nice to meet you") ]
text_talker.say_group(grouping_example)
Each tuple is made up of an importance level and the text phrase.
# TT_Importance is a class of numeric constants
SAY_ALWAYS = 1
SAY_90_PERCENT = 2
SAY_80_PERCENT = 3
SAY_70_PERCENT = 4
SAY_60_PERCENT = 5
SAY_50_PERCENT = 6
SAY_40_PERCENT = 7
SAY_30_PERCENT = 8
SAY_20_PERCENT = 9
SAY_10_PERCENT = 10
SAY_NEVER = 11
The same grouping as above could just as easily be written with the numeric values directly:
from watson_text_talker import *
text_talker = TextTalker(api_key='your_api_key')
grouping_example = [(8, "I'm your assistant."), (6, "How are you?"), (1, "Nice to meet you") ]
text_talker.say_group(grouping_example)
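To make the importance levels more concrete, here is a minimal sketch of how a level could translate into a chance of being voiced. The mapping is inferred from the constant names (1 is always, 8 is 30%, 11 is never); the functions `speak_probability` and `choose_phrases` are hypothetical and are not the package's actual implementation.

```python
import random


def speak_probability(importance):
    """Map an importance level (1..11) to the chance that the phrase is voiced."""
    if not 1 <= importance <= 11:
        raise ValueError("importance must be between 1 and 11")
    return (11 - importance) / 10


def choose_phrases(group, rng=random):
    """Return the phrases in a group that pass their random draw (hypothetical helper)."""
    # rng.random() is in [0.0, 1.0), so a probability of 1.0 always passes
    # and a probability of 0.0 never does.
    return [text for importance, text in group
            if rng.random() < speak_probability(importance)]


group = [(8, "I'm your assistant."), (6, "How are you?"), (1, "Nice to meet you")]
print(choose_phrases(group))
```

Each run may print a different subset, but "Nice to meet you" is always included because its level is 1.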
The package includes a globally applied quiet level that increases or decreases the likelihood that an optional phrase will be voiced.
from watson_text_talker import *
text_talker = TextTalker(api_key='your_api_key')
importance = TT_Importance()
grouping_example = [(importance.SAY_30_PERCENT, "I'm your assistant."), (importance.SAY_ALWAYS, "Nice to meet you") ]
text_talker.quiet_level = +2
text_talker.say_group(grouping_example)
In the above example, the "I'm your assistant." phrase will only be said 10% of the time, because the +2 assigned to the quiet level moves it from SAY_30_PERCENT (8) to SAY_10_PERCENT (10). The "Nice to meet you" phrase is not affected.
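The arithmetic behind the quiet level can be sketched as follows. This is an illustration under assumptions, not the package's code: the clamping to the 1..11 range and the treatment of SAY_NEVER as fixed are guesses, and the function names `effective_importance` and `percent_chance` are hypothetical.

```python
SAY_ALWAYS = 1
SAY_NEVER = 11


def effective_importance(importance, quiet_level=0):
    """Shift an optional phrase's importance by the quiet level (assumed behavior)."""
    if importance in (SAY_ALWAYS, SAY_NEVER):
        # SAY_ALWAYS is unaffected according to the text above;
        # treating SAY_NEVER the same way is an assumption.
        return importance
    # Assumption: shifted levels are clamped to the defined 1..11 range.
    return max(SAY_ALWAYS, min(SAY_NEVER, importance + quiet_level))


def percent_chance(importance):
    """Percentage chance implied by the SAY_*_PERCENT constant names."""
    return (SAY_NEVER - importance) * 10


print(percent_chance(effective_importance(8, quiet_level=2)))  # 10
print(percent_chance(effective_importance(1, quiet_level=2)))  # 100
```

A negative quiet level works the other way: with `quiet_level=-2`, a SAY_30_PERCENT phrase would be treated as SAY_50_PERCENT under the same assumptions.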
Use the TT_Config class to override configuration defaults.
# TT_Config's standard defaults
API_KEY='--watson tts credentials api-key goes here--'
# TTS API URL
API_URL='https://gateway-lon.watsonplatform.net/text-to-speech/api'
TTS_VOICE = 'en-US_AllisonVoice'
TTS_ACCEPT = 'audio/mp3'
CACHE_DIRECTORY = 'voice_mp3s'
# when True, the cache directory is relative to the current working directory
# when False, the cache directory should be a full path
CACHE_DIRECTORY_IS_RELATIVE = True
VOICE_FILE_EXTENSION = 'mp3'
# some environments may require a delay if first speech is cut off
# generally 1, 2 or 3 seconds will work
INITIALIZATION_DELAY = 0
Use it like so:
from watson_text_talker import *
config = TT_Config()
config.API_KEY='your watson tts api-key'
config.API_URL='if you need to update API URL, update this'
config.TTS_VOICE = 'en-US_MichaelVoice'
config.CACHE_DIRECTORY = 'custom_cache'
config.INITIALIZATION_DELAY = 1
text_talker = TextTalker(config=config)
text_talker.say("Hello world!")
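The interaction between CACHE_DIRECTORY and CACHE_DIRECTORY_IS_RELATIVE can be pictured with a small sketch. It only mirrors the comments in the defaults listed earlier; `resolve_cache_directory` is a hypothetical helper, `/home/pi/voice_cache` is a made-up path, and the package may resolve the directory differently.

```python
import os


def resolve_cache_directory(cache_directory="voice_mp3s", is_relative=True):
    """Return the cache path implied by the two settings (hypothetical helper)."""
    if is_relative:
        # Relative: anchor the directory at the current working directory.
        return os.path.join(os.getcwd(), cache_directory)
    # Not relative: the setting is expected to already be a full path.
    return cache_directory


print(resolve_cache_directory("custom_cache"))
print(resolve_cache_directory("/home/pi/voice_cache", is_relative=False))
```

With the relative setting, the cache location depends on where the script is started, so a full path may be the safer choice when the bot runs as a service.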
- pygame is used to process the mp3 voice files
- python-slugify is used to create cache file names