A basic object-oriented text-to-speech program using diphone speech synthesis.
Requires PyAudio, NumPy, and NLTK.
Note for Windows users: PyAudio must be installed via pipwin (i.e. run pip install pipwin, followed by pipwin install pyaudio).
Important: unpack the diphones.7z archive so that the diphones folder sits in the same directory as the .py files!
Run diphone_synth.py from the command line with the following arguments:
- "Your phrase", the text you want to synthesise, in quotes. Default: None
- --play or -p, to play the generated waveform. Default: True
- --crossfade or -c, to enable crossfading (taper, overlap, add) of diphones for smoother-sounding output. Default: False
- --volume or -v, to specify the volume of the audio output (range 1 to 100). Default: 100
- --save or -s followed by filename.wav, to save the output to a new file (the path is treated as relative). Default: None
- --help or -h to open the help menu with these instructions.
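To illustrate what the --crossfade and --volume options describe, here is a minimal sketch of taper, overlap, add on NumPy arrays. It is not the code in diphone_synth.py: the function names crossfade_concat and scale_volume, the linear fade shape, and the overlap parameter are assumptions made for this example only.

```python
import numpy as np


def crossfade_concat(segments, overlap):
    """Join 1-D sample arrays, overlapping neighbours by up to `overlap` samples.

    Hypothetical helper: a linear taper is assumed here; the real program
    may use a different window or overlap length.
    """
    if not segments:
        return np.zeros(0, dtype=np.float64)
    out = np.asarray(segments[0], dtype=np.float64).copy()
    for seg in segments[1:]:
        seg = np.asarray(seg, dtype=np.float64)
        # Never overlap by more samples than either side actually has.
        n = min(overlap, len(out), len(seg))
        if n > 0:
            fade_in = np.linspace(0.0, 1.0, n, endpoint=False)
            fade_out = 1.0 - fade_in
            # Taper the tail of the output and the head of the next
            # segment, then add them sample by sample.
            mixed = out[-n:] * fade_out + seg[:n] * fade_in
            out = np.concatenate([out[:-n], mixed, seg[n:]])
        else:
            out = np.concatenate([out, seg])
    return out


def scale_volume(samples, volume):
    """Scale samples by a volume given on the 1 to 100 range."""
    volume = min(max(volume, 1), 100)
    return np.asarray(samples, dtype=np.float64) * (volume / 100.0)
```

Because the two fade weights always sum to one, a steady signal passes through the join unchanged, and each join shortens the output by the number of overlapped samples.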
In theory, you can use a different diphone database. You will have to update the global variable SAMPLE_RATE to match the sample rate of your .wav files, and the files must follow the same filename conventions as the bundled diphones.
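Before swapping in another database, it may help to check what sample rate its files actually use. The sketch below relies only on the standard-library wave module; the helper names wav_sample_rate and mismatched_files are hypothetical and not part of this program.

```python
import wave
from pathlib import Path


def wav_sample_rate(path):
    """Return the sample rate (frames per second) of one .wav file."""
    # str() keeps this working on Python versions where wave.open
    # does not accept Path objects.
    with wave.open(str(path), "rb") as wf:
        return wf.getframerate()


def mismatched_files(folder, expected_rate):
    """List (filename, rate) pairs for .wav files whose rate differs.

    Hypothetical helper: `folder` is the diphone directory and
    `expected_rate` is the value you intend to use for SAMPLE_RATE.
    """
    mismatches = []
    for wav_path in sorted(Path(folder).glob("*.wav")):
        rate = wav_sample_rate(wav_path)
        if rate != expected_rate:
            mismatches.append((wav_path.name, rate))
    return mismatches
```

An empty list from mismatched_files means every file in the folder matches the rate you passed in, so that value is a reasonable candidate for SAMPLE_RATE.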
Diphones courtesy of Alan W. Black and Kevin Lenzo.