Flowtron

Flowtron: an Autoregressive Flow-based Network for Text-to-Mel-spectrogram Synthesis

Pre-requisites

  1. NVIDIA GPU + CUDA + cuDNN

Setup

  1. Clone this repo: git clone https://github.com/NVIDIA/flowtron.git
  2. cd into this repo: cd flowtron
  3. Initialize submodules (including tacotron2's): git submodule update --init; cd tacotron2; git submodule update --init; cd .. (the trailing cd .. returns to the repo root, where the remaining steps run)
  4. Install PyTorch
  5. Install the Python requirements or build a Docker image
    • Install the Python requirements: pip install -r requirements.txt
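
To confirm the GPU setup before training, a quick check with standard PyTorch calls (nothing Flowtron-specific):

    import torch

    # All three should succeed on a correctly configured NVIDIA GPU + CUDA + cuDNN install
    print(torch.cuda.is_available())            # True if CUDA is usable
    print(torch.cuda.device_count())            # number of visible GPUs
    print(torch.backends.cudnn.is_available())  # True if cuDNN was found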

Training

  1. Update the filelists inside the filelists folder to point to your data (the expected line format is sketched after this list)
  2. python train.py -c config.json -p train_config.output_directory=outdir
  3. (OPTIONAL) tensorboard --logdir=outdir/logdir
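
Each filelist is a plain-text file with one utterance per line. The pipe-separated audiopath|text|speaker_id layout below is an assumption based on the bundled LJS filelists; match whatever format those files actually use:

    /data/LJSpeech-1.1/wavs/LJ001-0001.wav|Printing, in the only sense with which we are at present concerned...|0
    /data/LJSpeech-1.1/wavs/LJ001-0002.wav|in being comparatively modern.|0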

Training using a pre-trained model

Training using a pre-trained model can lead to faster convergence. Dataset-dependent layers such as the speaker embedding can be ignored when loading the checkpoint (see the sketch after the steps below).

  1. Download our published Flowtron LJS or Flowtron LibriTTS model
  2. python train.py -c config.json -p train_config.ignore_layers=["speaker_embedding.weight"] train_config.checkpoint_path="models/flowtron_ljs.pt"
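
What "ignoring" a layer amounts to: the checkpoint's weights are loaded except for the dataset-dependent entries, which keep their fresh initialization. train.py implements this itself; the warm_start helper below is a hypothetical sketch of the idea:

    import torch

    def warm_start(model, checkpoint_path, ignore_layers):
        # Load pre-trained weights, drop the dataset-dependent entries
        # (e.g. "speaker_embedding.weight"), and merge the rest into the model.
        ckpt = torch.load(checkpoint_path, map_location="cpu")
        state = ckpt["state_dict"] if "state_dict" in ckpt else ckpt
        pruned = {k: v for k, v in state.items() if k not in ignore_layers}
        merged = model.state_dict()
        merged.update(pruned)
        model.load_state_dict(merged)
        return model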

Multi-GPU (distributed) and Automatic Mixed Precision Training (AMP)

  1. python -m torch.distributed.launch --use_env --nproc_per_node=NUM_GPUS_YOU_HAVE train.py -c config.json -p train_config.output_directory=outdir train_config.fp16=true
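
The -p flag takes dotted-path key=value overrides that are applied on top of config.json. train.py has its own parser; the apply_override sketch below (hypothetical) just illustrates the semantics:

    import json

    def apply_override(config, assignment):
        # "train_config.fp16=true" -> config["train_config"]["fp16"] = True
        path, raw = assignment.split("=", 1)
        keys = path.split(".")
        node = config
        for key in keys[:-1]:
            node = node[key]
        try:
            node[keys[-1]] = json.loads(raw)  # parses true/false, numbers, lists
        except json.JSONDecodeError:
            node[keys[-1]] = raw              # bare strings like outdir pass through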

Inference demo

  1. python inference.py -c config.json -f models/flowtron_ljs.pt -w models/waveglow_256channels_v4.pt -t "TEXT_TO_SYNTHESIZE" -i 0
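
Here -f points at the Flowtron checkpoint, -w at the WaveGlow vocoder, -t takes the text to synthesize, and -i appears to select the speaker id (0 for the single-speaker LJS model). Under the hood, inference samples a Gaussian latent, inverts the flow into a mel-spectrogram conditioned on text and speaker, and vocodes with WaveGlow. A rough sketch; the flowtron.infer/waveglow.infer calls are assumptions modeled on inference.py and should be checked against the script:

    import torch

    sigma = 0.5       # scale of the Gaussian latent; lower values give less variation
    n_frames = 400    # number of mel frames to generate
    # z ~ N(0, sigma^2) with 80 mel channels; the flow inverts z into mels
    residual = torch.randn(1, 80, n_frames) * sigma

    # Assumed API, mirroring inference.py -- verify the exact signatures there:
    # mels, attentions = flowtron.infer(residual, speaker_ids, text)
    # audio = waveglow.infer(mels, sigma=0.8)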