# WaveGrad

A MindSpore implementation of WaveGrad, a diffusion-based vocoder model for text-to-speech systems.
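At its core, WaveGrad starts from Gaussian noise and iteratively denoises it into a waveform, conditioned on a mel spectrogram. Below is a minimal numpy sketch of one DDPM-style reverse step of the kind such a model iterates; the function and parameter names are illustrative, not this repo's API.

```python
import numpy as np

def reverse_step(y_n, eps_pred, alpha_n, alpha_bar_n, last_step=False):
    """One DDPM-style denoising step of the kind a diffusion vocoder
    iterates to turn noise into audio. Illustrative sketch only; the
    noise-std choice below is a common one, not necessarily this repo's."""
    # Remove the predicted noise component and rescale.
    coef = (1.0 - alpha_n) / np.sqrt(1.0 - alpha_bar_n)
    mean = (y_n - coef * eps_pred) / np.sqrt(alpha_n)
    if last_step:
        return mean
    # Re-inject a small amount of noise (sigma = sqrt(beta_n) is one common choice).
    sigma = np.sqrt(1.0 - alpha_n)
    return mean + sigma * np.random.randn(*y_n.shape)
```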
## Demo
- Sample (transcript: "Be this as it may, the weapon used was only an ordinary axe, which rather indicates that force, not skill, was employed.")
- Sample (transcript: "This is a MindSpore implementation of the WaveGrad model, a diffusion-based vocoder model for text-to-speech systems. Many thanks to OpenI for computational resources!")
## Dependencies

```bash
pip install -r requirements.txt
```
- Install MindSpore.
- (Optional) Install `mpirun` for distributed training.
## Generate from your data
From wav files:

```bash
python reverse.py --restore model_1000000.ckpt --wav LJ010-0142.wav --save results --device_target Ascend --device_id 0 --plot
```
From mel spectrograms:

```bash
python reverse.py --restore model_1000000.ckpt --mel fs.npy --save results --device_target Ascend --device_id 0 --plot
```
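If you need to produce a mel-spectrogram `.npy` yourself, a hedged librosa sketch follows. The analysis parameters (`n_fft`, `hop_length`, sample rate) and the log compression are assumptions and must match the repo's preprocessing in `base.yaml`; only `n_mels=128` is confirmed by the pretrained-model table below.

```python
import librosa
import numpy as np

# Assumed analysis parameters; these must match base.yaml, otherwise
# reverse.py will condition on the wrong features.
y, sr = librosa.load("LJ010-0142.wav", sr=22050)
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                     hop_length=256, n_mels=128)
mel = np.log(np.clip(mel, 1e-5, None))  # log compression (assumed)
np.save("fs.npy", mel)
```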
## Pretrained Models
| Model | Dataset | Checkpoint | Total Batch Size | Num Frames | Num Mels | Hardware | MindSpore Version |
|---|---|---|---|---|---|---|---|
| WaveGrad (base) | LJSpeech-1.1 | 1M steps | 256 | 30 | 128 | 8 | 1.9.0 |
| WaveGrad (base) | AiShell | TODO | 256 | 30 | 128 | 8 | 1.9.0 |
| FastSpeech2 | LJSpeech-1.1 | TODO | 64 | - | 128 | 8 | 1.9.0 |
For the FastSpeech2 model, we skipped the audio preprocessing step and directly used this repo's preprocessed mel spectrograms.
## Train your own model
### Step 0 (Data)
0.0 Download LJSpeech-1.1 to `./data/`.
0.1 Preprocess the data to get a `_wav.npy` and a `_feature.npy` for each `.wav` file in your dataset folder. Set your `data_path` and `manifest_path` in `base.yaml`. You can then run the following command:

```bash
python preprocess.py --device_target CPU --device_id 0
```
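To sanity-check the output, you can load the generated arrays with numpy. The paths below are illustrative (they follow the standard LJSpeech layout and the `_wav.npy` / `_feature.npy` naming described above); the shapes depend on your `base.yaml` settings.

```python
import numpy as np

# Illustrative paths following the naming scheme described above.
wav = np.load("./data/LJSpeech-1.1/wavs/LJ010-0142_wav.npy")
feat = np.load("./data/LJSpeech-1.1/wavs/LJ010-0142_feature.npy")
print("waveform:", wav.shape)   # raw audio samples
print("features:", feat.shape)  # mel-spectrogram frames
```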
### Step 1 (Train)
#### 1.1 Train on a local server

Set up device information:

```bash
export MY_DEVICE=Ascend  # options: [Ascend, GPU]
export MY_DEVICE_NUM=8
```
Other training and model parameters can be set in `base.yaml`.

Train on multiple cards (each card gets a batch size of `hparams.batch_size // MY_DEVICE_NUM`, e.g. 256 // 8 = 32 with the defaults above):

```bash
nohup mpirun --allow-run-as-root -n $MY_DEVICE_NUM python train.py --device_target $MY_DEVICE --is_distributed True --context_mode graph > train_distributed.log &
```
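For reference, MindSpore data-parallel training launched through `mpirun` typically follows the standard setup pattern below; this is a sketch of the usual API calls, not necessarily what `--is_distributed True` does in this repo.

```python
from mindspore import context
from mindspore.communication import init, get_rank, get_group_size

# Standard MindSpore data-parallel setup for processes launched by mpirun.
context.set_context(mode=context.GRAPH_MODE, device_target="Ascend")
init()  # initialise the collective-communication backend
context.set_auto_parallel_context(
    parallel_mode=context.ParallelMode.DATA_PARALLEL,
    gradients_mean=True,           # average gradients across cards
    device_num=get_group_size(),
)
print(f"rank {get_rank()} of {get_group_size()}")
```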
Train on 1 card:

```bash
export MY_DEVICE_ID=0
nohup python train.py --device_target $MY_DEVICE --device_id $MY_DEVICE_ID --context_mode graph > train_single.log &
```
#### 1.2 Train on 8 Ascend cards on OpenI

A quick guide on how to use OpenI:
- git clone a repo
- create a train task
- locally preprocess the data, zip it, and upload to your job's dataset
- set task options as follows:
  - Start file: `train.py`
  - Run parameters: `is_openi=True`, `is_distributed=True`, `device_target=Ascend`, `context_mode=graph`
## Implementation details
- The interpolation operator in both the downsample and upsample blocks is replaced by a simple repeat operator whose output is divided by the repeat factor (see the sketch below).
- Some additions in UBlock are divided by a constant.
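As an illustration of the first point, a repeat-based upsample scaled by the repeat factor could look like the numpy sketch below; the exact placement of the scaling in this repo may differ.

```python
import numpy as np

def repeat_upsample(x: np.ndarray, factor: int) -> np.ndarray:
    # Repeat each time step `factor` times along the time axis, then
    # divide by the repeat factor, as described above. Sketch only.
    return np.repeat(x, factor, axis=-1) / factor

x = np.arange(4.0)            # [0., 1., 2., 3.]
print(repeat_upsample(x, 2))  # [0., 0., 0.5, 0.5, 1., 1., 1.5, 1.5]
```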
## Acknowledgements
Some materials helpful for understanding diffusion models:
Some repositories that inspired this implementation:
Computational Resources:
## License

GNU General Public License v2.0