kan-bayashi/ParallelWaveGAN

Train HiFi-GAN using 44.1 kHz audio files

gandolfxu opened this issue · 1 comment

The log message and config file are shown below:

/home/notebook/code/personal/ParallelWaveGAN/parallel_wavegan/losses/mel_loss.py:164: UserWarning: Using a target size (torch.Size([16, 80, 39])) that is different to the input size (torch.Size([16, 80, 45])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
  mel_loss = F.l1_loss(mel_hat, mel)
Traceback (most recent call last):
  File "/opt/conda/bin/parallel-wavegan-train", line 33, in <module>
    sys.exit(load_entry_point('parallel-wavegan', 'console_scripts', 'parallel-wavegan-train')())
  File "/home/notebook/code/personal/ParallelWaveGAN/parallel_wavegan/bin/train.py", line 1083, in main
    trainer.run()
  File "/home/notebook/code/personal/ParallelWaveGAN/parallel_wavegan/bin/train.py", line 96, in run
    self._train_epoch()
  File "/home/notebook/code/personal/ParallelWaveGAN/parallel_wavegan/bin/train.py", line 302, in _train_epoch
    self._train_step(batch)
  File "/home/notebook/code/personal/ParallelWaveGAN/parallel_wavegan/bin/train.py", line 223, in _train_step
    mel_loss = self.criterion["mel"](y_, y)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/notebook/code/personal/ParallelWaveGAN/parallel_wavegan/losses/mel_loss.py", line 164, in forward
    mel_loss = F.l1_loss(mel_hat, mel)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/functional.py", line 2616, in l1_loss
    expanded_input, expanded_target = torch.broadcast_tensors(input, target)
  File "/opt/conda/lib/python3.7/site-packages/torch/functional.py", line 65, in broadcast_tensors
    return _VF.broadcast_tensors(tensors)
RuntimeError: The size of tensor a (45) must match the size of tensor b (39) at non-singleton dimension 2
sampling_rate: 44100     # Sampling rate.
fft_size: 1024           # FFT size.
hop_size: 220            # Hop size.
win_length: 882          # Window length.
                         # If set to null, it will be the same as fft_size.
window: "hann"           # Window function.
num_mels: 80             # Number of mel basis.
fmin: 30                 # Minimum frequency in mel basis calculation.
fmax: 22050              # Maximum frequency in mel basis calculation.
global_gain_scale: 1.0   # Gain multiplied with the whole waveform.
trim_silence: false      # Whether to trim leading and trailing silence.
trim_threshold_in_db: 20 # Needs careful tuning if the recording quality is poor.
trim_frame_size: 1024    # Frame size in trimming.
trim_hop_size: 256       # Hop size in trimming.
format: "hdf5"           # Feature file format. "npy" or "hdf5" is supported.

Please make sure prod(upsample_scales) == hop_size and batch_max_steps % hop_size == 0.

upsample_scales: [8, 8, 2, 2] # Upsampling scales.
batch_max_steps: 8192         # Length of each audio clip in a batch. Must be divisible by hop_size.
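The two constraints quoted above can be checked directly against the posted values. A minimal sanity-check sketch (a hypothetical standalone snippet, not part of the repo):

```python
hop_size = 220
upsample_scales = [8, 8, 2, 2]
batch_max_steps = 8192

# The product of the upsampling scales must equal hop_size.
prod = 1
for s in upsample_scales:
    prod *= s

print(prod, prod == hop_size)      # 256 False
print(batch_max_steps % hop_size)  # 52, i.e. not divisible

assert prod == hop_size, f"prod(upsample_scales)={prod} != hop_size={hop_size}"
assert batch_max_steps % hop_size == 0, "batch_max_steps not divisible by hop_size"
```

With this config, 8 * 8 * 2 * 2 = 256 rather than 220, and 8192 % 220 = 52, so both constraints fail, which is consistent with the tensor-size mismatch in the traceback above.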