Train HiFi-GAN using 44.1 kHz audio files
gandolfxu opened this issue · 1 comment
gandolfxu commented
The log message and config file are shown below:
/home/notebook/code/personal/ParallelWaveGAN/parallel_wavegan/losses/mel_loss.py:164: UserWarning: Using a target size (torch.Size([16, 80, 39])) that is different to the input size (torch.Size([16, 80, 45])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
mel_loss = F.l1_loss(mel_hat, mel)
Traceback (most recent call last):
File "/opt/conda/bin/parallel-wavegan-train", line 33, in <module>
sys.exit(load_entry_point('parallel-wavegan', 'console_scripts', 'parallel-wavegan-train')())
File "/home/notebook/code/personal/ParallelWaveGAN/parallel_wavegan/bin/train.py", line 1083, in main
trainer.run()
File "/home/notebook/code/personal/ParallelWaveGAN/parallel_wavegan/bin/train.py", line 96, in run
self._train_epoch()
File "/home/notebook/code/personal/ParallelWaveGAN/parallel_wavegan/bin/train.py", line 302, in _train_epoch
self._train_step(batch)
File "/home/notebook/code/personal/ParallelWaveGAN/parallel_wavegan/bin/train.py", line 223, in _train_step
mel_loss = self.criterion["mel"](y_, y)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/notebook/code/personal/ParallelWaveGAN/parallel_wavegan/losses/mel_loss.py", line 164, in forward
mel_loss = F.l1_loss(mel_hat, mel)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/functional.py", line 2616, in l1_loss
expanded_input, expanded_target = torch.broadcast_tensors(input, target)
File "/opt/conda/lib/python3.7/site-packages/torch/functional.py", line 65, in broadcast_tensors
return _VF.broadcast_tensors(tensors)
RuntimeError: The size of tensor a (45) must match the size of tensor b (39) at non-singleton dimension 2
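The 45-vs-39 mismatch is consistent with the generator upsampling its mel input by `prod(upsample_scales)` while the features were extracted with `hop_size: 220`. A rough sketch of the arithmetic, where `upsample_scales` is an assumed default (not shown in this log) and the frame counts use a simple floor convention:

```python
import math

hop_size = 220                  # from the config in this issue
upsample_scales = [8, 8, 2, 2]  # assumed default; math.prod(...) == 256, not 220
aux_frames = 39                 # mel frames in one training segment (from the warning)

# The generator turns 39 mel frames into 39 * 256 samples ...
gen_samples = aux_frames * math.prod(upsample_scales)

# ... and re-framing that waveform with hop_size 220 yields more frames,
# reproducing the 45-vs-39 mismatch reported in the traceback.
gen_frames = gen_samples // hop_size
print(gen_frames)  # 45
```

This is only a plausibility check under assumed defaults; the fix is to make the two factors agree, as noted in the reply below.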
sampling_rate: 44100 # Sampling rate.
fft_size: 1024 # FFT size.
hop_size: 220 # Hop size.
win_length: 882 # Window length. If set to null, it will be the same as fft_size.
window: "hann" # Window function.
num_mels: 80 # Number of mel basis.
fmin: 30 # Minimum freq in mel basis calculation.
fmax: 22050 # Maximum frequency in mel basis calculation.
global_gain_scale: 1.0 # Will be multiplied to all of waveform.
trim_silence: false # Whether to trim the start and end of silence.
trim_threshold_in_db: 20 # Need to tune carefully if the recording is not good.
trim_frame_size: 1024 # Frame size in trimming.
trim_hop_size: 256 # Hop size in trimming.
format: "hdf5" # Feature file format. "npy" or "hdf5" is supported.
kan-bayashi commented
Please make sure prod(upsample_scales) == hop_size and batch_max_steps % hop_size == 0.