PyTorch implementation of NVIDIA WaveGlow with constant memory cost.

Constant Memory WaveGlow

A PyTorch implementation of WaveGlow: A Flow-based Generative Network for Speech Synthesis, using the constant-memory method described in Training Glow with Constant Memory Cost.

The model implementation details differ slightly from the official implementation, based on personal preference, and the project structure is borrowed from pytorch-template.
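As a rough illustration of why constant memory is possible in a flow model: every layer is invertible, so a layer's input can be recomputed from its output during the backward pass instead of being stored. The sketch below is a pure-Python toy of one affine coupling layer, not the code in this repository; `toy_net`, `coupling_forward`, and `coupling_inverse` are hypothetical names, and `toy_net` is only a stand-in for the real conditioning network.

```python
import math

def toy_net(xa):
    # Hypothetical stand-in for the network inside a coupling layer.
    # It never has to be inverted, so any function of xa works here.
    log_s = [math.tanh(v) for v in xa]
    t = [0.5 * v * v for v in xa]
    return log_s, t

def coupling_forward(x):
    # Split the input in half; the first half passes through unchanged
    # and parameterizes an affine transform of the second half.
    half = len(x) // 2
    xa, xb = x[:half], x[half:]
    log_s, t = toy_net(xa)
    yb = [b * math.exp(s) + shift for b, s, shift in zip(xb, log_s, t)]
    log_det = sum(log_s)  # log-determinant of the Jacobian
    return xa + yb, log_det

def coupling_inverse(y):
    # ya equals xa, so the same scale and shift can be recomputed
    # and the affine transform undone exactly.
    half = len(y) // 2
    ya, yb = y[:half], y[half:]
    log_s, t = toy_net(ya)
    xb = [(b - shift) * math.exp(-s) for b, s, shift in zip(yb, log_s, t)]
    return ya + xb

x = [0.3, -1.2, 0.8, 2.0]
y, log_det = coupling_forward(x)
x_rec = coupling_inverse(y)
max_err = max(abs(a - b) for a, b in zip(x, x_rec))
print("max reconstruction error:", max_err)
```

Because `x` can be rebuilt from `y` up to floating-point error, a training loop only needs to keep the output of the last layer and can reconstruct earlier activations layer by layer while computing gradients.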

Quick Start

Set data_dir in the JSON config file to a directory containing wave files that all share the same sampling rate, and you are good to go. The mel-spectrogram is computed on the fly.

{
  "data_loader": {
    "type": "RandomWaveFileLoader",
    "args": {
      "data_dir": "/your/data/wave/files",
      "batch_size": 8,
      "num_workers": 2,
      "segment": 16000
    }
  }
}

Then start training:

python train.py -c config.json
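Since the loader expects every file to have the same sampling rate, it may help to check the directory before training. The following is a minimal sketch using only the standard library; `check_sampling_rates` is a hypothetical helper that is not part of this project, and it assumes the files use the `.wav` extension and are PCM files readable by Python's `wave` module. It searches subdirectories as well, which may or may not match what the loader does.

```python
import json
import wave
from pathlib import Path

def check_sampling_rates(config_path):
    """Return {sampling_rate: [file names]} for the .wav files under data_dir."""
    with open(config_path) as f:
        config = json.load(f)
    # Same key path as in the config snippet above.
    data_dir = Path(config["data_loader"]["args"]["data_dir"])
    rates = {}
    for path in sorted(data_dir.rglob("*.wav")):
        with wave.open(str(path), "rb") as w:
            rates.setdefault(w.getframerate(), []).append(path.name)
    return rates
```

Calling `check_sampling_rates("config.json")` returns a dictionary keyed by sampling rate. If it has more than one key, the files listed under the minority rates would need to be resampled or removed first.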

Memory Usage Comparison

Coming soon.

Result

I trained the model on some cello pieces from MusicNet using musicnet_config.json. The clips in the samples folder are what I got. Although the audio quality is not very good, the result shows that WaveGlow can be used for music generation as well. The generation speed is around 470 kHz (about 470,000 audio samples per second) on a GTX 1080 Ti.
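To put the generation speed in perspective, it can be divided by the audio sampling rate to get a real-time factor. The sampling rate used for these samples is not stated here, so the rates in the sketch below are assumptions for illustration only, and `real_time_factor` is a hypothetical helper.

```python
def real_time_factor(samples_per_second, sampling_rate):
    # How many seconds of audio are generated per second of wall-clock time.
    return samples_per_second / sampling_rate

generation_rate = 470_000  # samples per second, from the figure reported above

# Assumed sampling rates; the one actually used is not stated.
for sr in (16_000, 22_050, 44_100):
    print(f"{sr} Hz: {real_time_factor(generation_rate, sr):.1f}x real time")
```

Under any of these assumed rates the model would generate audio at least ten times faster than playback.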