TensorFlow implementation of Google's Tacotron-2 (without WaveNet), a deep neural network architecture described in the paper: Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
Tacotron-2
├── datasets
├── logs-Tacotron2 (2)
│ ├── eval-dir
│ │ ├── plots
│ │ └── wavs
│ ├── mel-spectrograms
│ ├── plots
│ ├── pretrained
│ ├── metas
│ └── wavs
├── papers
├─|
│ ├── models
│ └── utils
├── synth_output (3)
│ ├── eval
│ ├── gta
│ ├── logs-eval
│ │ ├── plots
│ │ └── wavs
│ └── natural
├── train_data (1)
│ ├── 1.npy
│ ├── 2.npy
│ ├── train.txt
The previous tree shows the current state of the repository (separate training, one step at a time).
- Step (0): Get your dataset and adapt the "build_from_path" function in datasets/preprocessor.py to it.
- Step (1): Preprocess your data. This will give you the train_data folder.
- Step (2): Train your Tacotron model. Yields the logs-Tacotron2 folder.
- Step (3): Synthesize/Evaluate the Tacotron model. Gives the synth_output folder.
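As a rough illustration of the steps above, the following sketch checks which of the output folders already exist and reports the next step to run. The folder names come from the tree; the helper names `pipeline_status` and `next_step` are hypothetical and not part of the repository.

```python
from pathlib import Path

# Folder names are taken from the tree above; keys match the step numbers.
STEP_OUTPUTS = {
    1: "train_data",      # produced by preprocessing
    2: "logs-Tacotron2",  # produced by training
    3: "synth_output",    # produced by synthesis/evaluation
}


def pipeline_status(repo_root="."):
    """Return {step: bool}, True when that step's output folder exists."""
    root = Path(repo_root)
    return {step: (root / name).is_dir() for step, name in STEP_OUTPUTS.items()}


def next_step(repo_root="."):
    """Return the first step whose output folder is missing, or None."""
    for step, done in sorted(pipeline_status(repo_root).items()):
        if not done:
            return step
    return None
```

Calling `next_step("Tacotron-2")` on a fresh clone would be expected to return 1, since no output folder exists yet. Note that a folder existing does not prove the step finished successfully.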
- Machine Setup:
First, you need to have Python 3 installed along with TensorFlow.
Next, you need to install some Linux dependencies to ensure audio libraries work properly:
apt-get install -y libasound-dev portaudio19-dev libportaudio2 libportaudiocpp0 ffmpeg libav-tools
Finally, install the requirements. The command below assumes an Anaconda environment; otherwise replace pip with pip3 and python with python3:
pip install -r requirements.txt
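To confirm the setup before going further, a small check like the one below can help. It is only a sketch: the function name `check_environment` is hypothetical, and it verifies that a module can be located, not that it imports cleanly or has a compatible version.

```python
import importlib.util
import sys


def check_environment(required=("tensorflow",)):
    """Report whether Python 3 is running and which modules can be located."""
    report = {"python3": sys.version_info.major == 3}
    for name in required:
        # find_spec returns None when a top-level module cannot be found.
        report[name] = importlib.util.find_spec(name) is not None
    return report


def print_report(report):
    """Print one line per checked item."""
    for item, ok in report.items():
        print(f"{item}: {'ok' if ok else 'MISSING'}")
```

For example, `print_report(check_environment())` prints one line for Python 3 and one for TensorFlow. Other packages from requirements.txt can be passed through the `required` argument.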
Before proceeding, pick the hyperparameters that best suit your needs. It is also possible to change the hyperparameters from the command line during preprocessing/training.
Before running the following steps, make sure you are inside the Tacotron-2 folder:
cd Tacotron-2
Preprocessing can then be started using:
python preprocess.py
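Once preprocessing finishes, a quick sanity check of the train_data folder might look like the sketch below. It counts the .npy files and the non-empty lines of train.txt. That train.txt holds one entry per line is an assumption, and the helper name `summarize_train_data` is hypothetical.

```python
from pathlib import Path


def summarize_train_data(train_dir="train_data"):
    """Return (number of .npy files, number of non-empty lines in train.txt)."""
    folder = Path(train_dir)
    npy_count = sum(1 for _ in folder.glob("*.npy"))

    meta = folder / "train.txt"
    line_count = 0
    if meta.is_file():
        with meta.open(encoding="utf-8") as handle:
            # Assumption: one metadata entry per line; blank lines are ignored.
            line_count = sum(1 for line in handle if line.strip())
    return npy_count, line_count
```

If the two numbers differ wildly or either is zero, preprocessing probably did not complete as intended.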
To train the Tacotron model:
python train.py
Checkpoints are saved every 5000 steps and stored under the logs-Tacotron2 folder.
To synthesize audio end-to-end (text to audio):
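To see how far training has progressed, the newest checkpoint step can be read from the file names. The sketch below assumes TensorFlow-style names of the form `<prefix>.ckpt-<step>.<suffix>` and leaves the directory to the caller. The tree shows a `pretrained` folder under logs-Tacotron2, but whether checkpoints land there is an assumption. The helper name `latest_checkpoint_step` is hypothetical; TensorFlow's own `tf.train.latest_checkpoint` is the usual way to locate the checkpoint path itself.

```python
import re
from pathlib import Path

# Assumption: checkpoint files contain "ckpt-<step>" in their names.
_STEP_PATTERN = re.compile(r"ckpt-(\d+)")


def latest_checkpoint_step(checkpoint_dir):
    """Return the highest step number found in checkpoint file names, or None."""
    steps = []
    for path in Path(checkpoint_dir).glob("*ckpt-*"):
        match = _STEP_PATTERN.search(path.name)
        if match:
            steps.append(int(match.group(1)))
    return max(steps) if steps else None
```

With checkpoints written every 5000 steps, the returned value would be expected to be a multiple of 5000.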
python synthesize.py
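The three commands above can also be chained from Python. The following is a minimal sketch, assuming the scripts run without extra arguments as shown. The helper names `build_commands` and `run_pipeline` are hypothetical, and the default is a dry run that only prints the commands.

```python
import subprocess
import sys

# Script names as used in the commands above, in pipeline order.
PIPELINE_SCRIPTS = ("preprocess.py", "train.py", "synthesize.py")


def build_commands(python_executable=sys.executable, scripts=PIPELINE_SCRIPTS):
    """Return one argument list per pipeline script."""
    return [[python_executable, script] for script in scripts]


def run_pipeline(repo_dir="Tacotron-2", dry_run=True):
    """Print each command and, unless dry_run is set, run it inside repo_dir."""
    for command in build_commands():
        print("$", " ".join(command))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(command, cwd=repo_dir, check=True)
```

Calling `run_pipeline(dry_run=False)` would run preprocessing, training, and synthesis back to back, so it only makes sense once the hyperparameters are settled.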
- Rayhane-mamah/Tacotron-2
- nii-yamagishilab/tacotron2
- Robust Sequence-to-Sequence Acoustic Modeling with Stepwise Monotonic Attention for Neural TTS
- Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
- Original Tacotron paper
- Attention-Based Models for Speech Recognition
- WaveNet: A Generative Model for Raw Audio
- Fast WaveNet
- r9y9/wavenet_vocoder
- keithito/tacotron