/text2speech_coqui_ai

My implementation of the Coqui_AI TTS synthesizer

Primary Language: sed

Overview

  1. Run the Bash script text2speech.sh, which lets you choose a .txt file to convert and outputs a .wav file.

Setup

  1. This only works on Linux, or on Windows through WSL2 (Windows Subsystem for Linux).
  2. Create a Conda environment using Mamba (much faster than conda) and the YAML file I made. Depending on your GPU and machine, there can be compatibility issues between the PyTorch version, the cudatoolkit version, and the NVIDIA drivers. This should work for most newer gaming laptops.

mamba env create -n tts
mamba env update -n tts -f txt2wav_env.yaml

Pronunciation Updates

There are 3 files which should be updated continually as you come across words with strange pronunciations, new abbreviations which the model will not be aware of, or rare combinations of vowels which need tweaking. These files are sed scripts, made up of sed substitution commands, which match text using regular expressions.

fonetix.sed
abbreviations.sed
letters.sed
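
As a rough sketch of how rule files like these can be applied to a text before synthesis, sed accepts several -f options and runs the scripts in the order given. The rules, the input sentence, and the order below are hypothetical samples written to a temporary directory; they are not the contents of the real files, and text2speech.sh may apply them differently.

```shell
# Hypothetical demo: sample rule files are created in a temp directory,
# so the real fonetix.sed / abbreviations.sed are never touched.
workdir=$(mktemp -d)

# Sample rules (assumptions, not the author's actual rules).
printf '%s\n' 's/\<GPU\>/G P U/g' > "$workdir/abbreviations.sed"
printf '%s\n' 's/\<ear\>/eeerr/g' > "$workdir/fonetix.sed"

# Sample input text.
printf '%s\n' 'The GPU has an ear.' > "$workdir/input.txt"

# Apply both scripts in sequence; the order here is an assumption.
cleaned=$(sed -f "$workdir/abbreviations.sed" -f "$workdir/fonetix.sed" "$workdir/input.txt")
echo "$cleaned"

rm -rf "$workdir"
```

Keeping each kind of rule in its own file makes it easier to add a new abbreviation or spelling tweak without touching the others.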

Be aware of the difference between

s/ear/eeerr/g - Replaces every occurrence of "ear", even inside other words. For example, this would change the word linear to lineeerr
s/\<ear\>/eeerr/g - Replaces only the standalone word ear (\< and \> mark word boundaries in GNU sed)
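
The difference is easy to check from the command line before adding a rule to one of the files. This is a minimal sketch that assumes GNU sed (the default on Linux and WSL2), since \< and \> are GNU extensions; the sample phrase is made up for illustration.

```shell
# Without word boundaries: "ear" is replaced everywhere, including inside "linear".
substring=$(printf '%s\n' 'linear ear' | sed 's/ear/eeerr/g')

# With \< and \>: only the standalone word "ear" is replaced.
whole_word=$(printf '%s\n' 'linear ear' | sed 's/\<ear\>/eeerr/g')

echo "$substring"    # lineeerr eeerr
echo "$whole_word"   # linear eeerr
```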

Input

  1. Create a .txt file via any method. I used a Forbes article about GPUs as an example because it's filled with acronyms, abbreviations, dates, and other difficult words. It's amazing how good it sounds even without the pronunciation updates and tweaks.