Error when fine tuning using new dataset.

Question

lethanhson9901 opened this issue 3 years ago · 2 comments

Hi, I'm new to ML. I trained a HiFi-GAN vocoder for about 100K steps (your pre-trained model is at 800K) and the sound is acceptable. Now I want to train the model on a different dataset. Should I train a new HiFi-GAN model or continue training the pre-trained one? I'm not sure. When I chose the fine-tuning option with the VIVOS dataset:
%cd '/content/drive/MyDrive/vietTTS/hifi-gan'
!python3 train.py --fine_tuning True --config ../assets/hifigan/config.json --input_wavs_dir=data --input_training_file=train_files.txt --input_validation_file=val_files.txt

And I got this error:
checkpoints directory : cp_hifigan
Loading 'cp_hifigan/g_00105000'
Complete.
Loading 'cp_hifigan/do_00110000'
Complete.
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:477: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
cpuset_checked))
2021-06-15 03:07:39.878886: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
Epoch: 119
Traceback (most recent call last):
  File "train.py", line 271, in <module>
    main()
  File "train.py", line 267, in main
    train(0, a, h)
  File "train.py", line 113, in train
    for i, batch in enumerate(train_loader):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 517, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.7/dist-packages/torch/_utils.py", line 429, in reraise
    raise self.exc_type(msg)
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/content/drive/My Drive/vietTTS/hifi-gan/meldataset.py", line 144, in __getitem__
    os.path.join(self.base_mels_path, os.path.splitext(os.path.split(filename)[-1])[0] + '.npy'))
  File "/usr/local/lib/python3.7/dist-packages/numpy/lib/npyio.py", line 416, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: 'ft_dataset/VIVOSSPK46_184.npy'

I think the dataset I used when I trained the vocoder before did not include these new files. How can I fix this?
My second question: what happens if I continue training my pre-trained vocoder model on a different dataset?
Thanks!

Answer 1 · 2021-06-15T11:13:27.000Z

Hi, I'm new to ML. I trained a HiFi-GAN vocoder for about 100K steps (your pre-trained model is at 800K) and the sound is acceptable. Now I want to train the model on a different dataset. Should I train a new HiFi-GAN model or continue training the pre-trained one?

@Lethanhson9901, I would give the pretrained model a try and check the synthesized sound after a few training steps.

I'm not sure. When I chose the fine-tuning option with the VIVOS dataset:
...
Loading 'cp_hifigan/g_00105000'
Complete.
Loading 'cp_hifigan/do_00110000'
...
FileNotFoundError: [Errno 2] No such file or directory: 'ft_dataset/VIVOSSPK46_184.npy'

Your train/val file list includes the clip VIVOSSPK46_184, but the Montreal Forced Aligner failed to align this clip, so no ft_dataset/VIVOSSPK46_184.npy was generated for it. The fix is to remove VIVOSSPK46_184 from your train/val file lists (train_files.txt and val_files.txt).
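
If more than one clip failed alignment, removing entries by hand gets tedious. The following is a minimal sketch of how the file lists could be filtered automatically. It assumes each line of the list starts with the clip name (optionally followed by `|` and a transcript, and optionally with a path or extension), and that the mel files live in `ft_dataset` as in the error message. The function names and the `.filtered` output suffix are hypothetical and not part of the repository.

```python
import os
from typing import List, Tuple


def clip_id(line: str) -> str:
    """Return the clip name for one file-list line, e.g. 'VIVOSSPK46_184'."""
    # Assumption: the clip name is the first '|'-separated field,
    # possibly with a directory prefix or a file extension.
    first_field = line.split("|")[0].strip()
    return os.path.splitext(os.path.basename(first_field))[0]


def split_by_mel(lines: List[str], mels_dir: str) -> Tuple[List[str], List[str]]:
    """Split lines into (kept, dropped) by whether <mels_dir>/<clip>.npy exists."""
    kept, dropped = [], []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        mel_path = os.path.join(mels_dir, clip_id(line) + ".npy")
        if os.path.isfile(mel_path):
            kept.append(line)
        else:
            dropped.append(line)
    return kept, dropped


def filter_file_list(list_path: str, mels_dir: str = "ft_dataset") -> List[str]:
    """Write '<list_path>.filtered' with only usable clips; return dropped lines."""
    with open(list_path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    kept, dropped = split_by_mel(lines, mels_dir)
    # The original list is left untouched; the result goes to a new file.
    with open(list_path + ".filtered", "w", encoding="utf-8") as f:
        f.write("\n".join(kept) + ("\n" if kept else ""))
    return dropped


if __name__ == "__main__":
    for name in ("train_files.txt", "val_files.txt"):
        if os.path.isfile(name):
            missing = filter_file_list(name)
            print(f"{name}: dropped {len(missing)} clip(s) without a mel file")
```

The filtered lists can then be passed to `--input_training_file` and `--input_validation_file` in place of the originals.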

Answer 2 · 2021-06-15T14:57:30.000Z

Many thanks!