An implementation of a modified version of the Tacotron speech synthesis model in TensorFlow that generates bird-chirp audio samples from an image of a bird.
- Install Python 3.
- Install the latest version of TensorFlow for your platform. For better performance, install with GPU support if it's available. This code works with TensorFlow 1.3 and later.
- Install requirements:
pip install -r requirements.txt
Note: you need at least 40GB of free disk space to train a model.
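The disk-space requirement can be checked before starting a long training run. The following is a minimal sketch using only the standard library; the helper name `has_enough_space` is hypothetical and is not part of this repository.

```python
import shutil

REQUIRED_GB = 40  # figure taken from the note above


def has_enough_space(path=".", required_gb=REQUIRED_GB):
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024 ** 3


if __name__ == "__main__":
    print("Enough free space:", has_enough_space())
```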
- Download dataset.
Use this link to download the dataset.
- Unzip the downloaded file.
- Set up the training data in the following structure:
tacotron (project dir)
  |- training
      |- vgg19
      |   |- vgg19.npy
      |- bird-00001.npy
      |- bird-00002.npy
      |- ...
      |- chirp-mel-00001.npy
      |- chirp-mel-00002.npy
      |- ...
      |- chirp-spec-00001.npy
      |- chirp-spec-00002.npy
      |- ...
      |- train.txt
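Before training, it may help to confirm that the layout above is in place. The sketch below is not part of the repository: the function name `missing_training_files` is hypothetical, and it assumes that `vgg19/` sits inside `training/` and that the numbered files follow the `bird-`, `chirp-mel-` and `chirp-spec-` prefixes shown in the tree.

```python
from pathlib import Path


def missing_training_files(project_dir):
    """Return the expected paths that are absent under <project_dir>/training."""
    training = Path(project_dir) / "training"
    missing = []
    # Fixed files shown in the tree above (vgg19 location is an assumption).
    for rel in ("vgg19/vgg19.npy", "train.txt"):
        if not (training / rel).is_file():
            missing.append((Path("training") / rel).as_posix())
    # Each numbered family should contain at least one .npy file.
    for prefix in ("bird", "chirp-mel", "chirp-spec"):
        if not any(training.glob(prefix + "-[0-9]*.npy")):
            missing.append("training/" + prefix + "-*.npy")
    return missing


if __name__ == "__main__":
    problems = missing_training_files(".")
    print("Layout looks complete" if not problems else "Missing: " + ", ".join(problems))
```

Run it from the project directory; an empty result means every expected entry was found.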
- Train model
python3 train.py
- Monitor with TensorBoard (optional)
tensorboard --logdir ~/tacotron/logs-tacotron
The trainer dumps audio and alignments every 1000 steps. You can find these in ~/tacotron/logs-tacotron.
- Test your model
python3 test.py --checkpoint ~/tacotron/logs-tacotron/model.ckpt-185000 --image_path ~/path/to/input/image
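The step number in the checkpoint name depends on how long training ran, so the most recent checkpoint has to be looked up. The sketch below does this with the standard library only. It assumes checkpoints are written as `model.ckpt-<step>.index` alongside their data files, matching the `model.ckpt-185000` prefix in the command above; the function name `latest_checkpoint_prefix` is hypothetical. TensorFlow's own `tf.train.latest_checkpoint` is an alternative when TensorFlow is importable.

```python
import re
from pathlib import Path

# Assumption: each checkpoint has an index file named model.ckpt-<step>.index.
_CKPT_RE = re.compile(r"^(model\.ckpt-(\d+))\.index$")


def latest_checkpoint_prefix(log_dir):
    """Return the checkpoint prefix with the highest step in log_dir, or None."""
    best_step, best_prefix = -1, None
    for path in Path(log_dir).expanduser().glob("model.ckpt-*.index"):
        match = _CKPT_RE.match(path.name)
        if match and int(match.group(2)) > best_step:
            best_step = int(match.group(2))
            # Strip the .index suffix to get the prefix passed to --checkpoint.
            best_prefix = str(path.parent / match.group(1))
    return best_prefix


if __name__ == "__main__":
    print(latest_checkpoint_prefix("~/tacotron/logs-tacotron"))
```

The returned prefix can be passed as the `--checkpoint` argument.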
- Set up the following directory structure for the raw images and audio:
├── raw_data
    ├── imgs       # Folder contains all the images of the birds from 6 different breeds
    │   ├── 0      # Duck
    │   ├── 1      # Hawk
    │   ├── 2      # Owl
    │   ├── 3      # Seagull
    │   ├── 4      # Macaw
    │   └── 5      # Rooster
    └── wavs       # Folder contains all the sounds of the birds from 6 different breeds
        ├── 0      # Duck
        ├── 1      # Hawk
        ├── 2      # Owl
        ├── 3      # Seagull
        ├── 4      # Macaw
        └── 5      # Rooster
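A quick count of files per class can catch an empty or misnamed folder before preprocessing. This is a minimal sketch, not code from the repository: `count_raw_files` is a hypothetical helper, the class IDs and names are copied from the tree above, and it assumes the audio files use the `.wav` extension. Every file in an image folder is counted, since the image format is not stated.

```python
from pathlib import Path

# Folder IDs and names as listed in the tree above.
CLASS_NAMES = {0: "Duck", 1: "Hawk", 2: "Owl", 3: "Seagull", 4: "Macaw", 5: "Rooster"}


def count_raw_files(raw_dir):
    """Return {class name: (image count, wav count)} for a raw_data-style layout."""
    raw = Path(raw_dir)
    counts = {}
    for class_id, name in CLASS_NAMES.items():
        img_dir = raw / "imgs" / str(class_id)
        wav_dir = raw / "wavs" / str(class_id)
        # Image extension is unknown, so count every regular file.
        n_imgs = sum(1 for p in img_dir.glob("*") if p.is_file())
        # Assumption: audio clips are stored as .wav files.
        n_wavs = sum(1 for p in wav_dir.glob("*.wav"))
        counts[name] = (n_imgs, n_wavs)
    return counts


if __name__ == "__main__":
    for name, (n_imgs, n_wavs) in count_raw_files("raw_data").items():
        print(name, "images:", n_imgs, "wavs:", n_wavs)
```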
- Data generation
- Download the pretrained VGG19 NPY file to your project root directory.
- Preprocess the data and generate the dataset.
python3 preprocess.py
-