OlaWod/FreeVC

Error while finetuning

Closed this issue · 10 comments

Hi, I am trying to finetune the FreeVC-s model with a small dataset formatted in the VCTK layout. I have made the necessary changes in the config file, and I am not using SR-based augmentation. But when I run train.py it throws this error:

Traceback (most recent call last):
  File "train.py", line 284, in <module>
    main()
  File "train.py", line 49, in main
    mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
  File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
    while not context.join():
  File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/content/FreeVC/train.py", line 115, in run
    train_and_evaluate(rank, epoch, hps, [net_g, net_d], [optim_g, optim_d], [scheduler_g, scheduler_d], scaler, [train_loader, eval_loader], logger, [writer, writer_eval])
  File "/content/FreeVC/train.py", line 136, in train_and_evaluate
    for batch_idx, items in enumerate(train_loader):
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 628, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 1333, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 1359, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.8/dist-packages/torch/_utils.py", line 543, in reraise
    raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 3.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/fetch.py", line 61, in fetch
    return self.collate_fn(data)
  File "/content/FreeVC/data_utils.py", line 164, in __call__
    c_padded[i, :, :c.size(1)] = c
RuntimeError: The expanded size of the tensor (399) must match the existing size (487) at non-singleton dimension 1.  Target sizes: [1024, 399].  Tensor sizes: [1024, 487]

PS: In my dataset a few speakers have wav files at a 22050 Hz sampling rate, while the other speakers' wav files are at 32000 Hz. But I have used downsample.py to ensure that all of them are downsampled to 16000 Hz.
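
For reference, a quick sanity check along these lines (a minimal sketch; "DUMMY" is the wav-dir name used in the FreeVC configs, adjust to your own path, and soundfile must be installed) confirms whether every file really ended up at 16 kHz:

```python
# Sketch: verify every wav in the downsampled training dir is 16 kHz.
from pathlib import Path
import soundfile as sf  # pip install soundfile

wav_dir = Path("DUMMY")  # the 16 kHz wav dir from the config; adjust as needed
bad = 0
for wav_path in sorted(wav_dir.rglob("*.wav")):
    sr = sf.info(str(wav_path)).samplerate
    if sr != 16000:
        print(f"{wav_path}: {sr} Hz")
        bad += 1
print(f"{bad} file(s) are not at 16 kHz")
```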

Seems that the WavLM feature length is not consistent with the spectrogram length.
Delete all '.spec.pt' files, make sure the wavs in the wav dir (named 'DUMMY') are all 16 kHz, and run again?
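
Something like the sketch below clears the cache; it assumes the '.spec.pt' files are stored next to the wavs under 'DUMMY', which is where data_utils.py appears to cache them (adjust the path if your layout differs):

```python
# Sketch: delete cached spectrograms so they are rebuilt from the 16 kHz wavs.
from pathlib import Path

wav_dir = Path("DUMMY")
stale = list(wav_dir.rglob("*.spec.pt"))
for spec_path in stale:
    spec_path.unlink()
print(f"removed {len(stale)} cached .spec.pt file(s)")
```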

Well, I have .pt files in the dataset/wavlm folder (not .spec.pt), which were generated using preprocess_ssl.py (I also confirmed that I am passing the downsampled 16 kHz wav folder in the --in_dir argument). I'll try excluding the folders with a different sampling rate to see if that solves the issue.
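
Before re-running, a rough consistency check like the one below can also point at the offending utterances. It is only a sketch: the hop of 320 is taken from the default FreeVC config, and the assumption that the .pt files under dataset/wavlm mirror the wav directory layout may need adjusting to how preprocess_ssl.py laid out your features.

```python
# Sketch: flag utterances whose cached WavLM feature length disagrees with the
# frame count implied by the 16 kHz wav (which is what the collate error is about).
from pathlib import Path
import soundfile as sf
import torch

HOP = 320                         # hop_length from the default FreeVC config (assumed)
wav_dir = Path("DUMMY")           # 16 kHz wavs
feat_dir = Path("dataset/wavlm")  # WavLM .pt features from preprocess_ssl.py

for feat_path in sorted(feat_dir.rglob("*.pt")):
    wav_path = wav_dir / feat_path.relative_to(feat_dir).with_suffix(".wav")
    if not wav_path.exists():
        continue
    wav_frames = sf.info(str(wav_path)).frames // HOP            # ~spectrogram length
    feat_frames = torch.load(feat_path, map_location="cpu").size(-1)  # WavLM length
    if abs(wav_frames - feat_frames) > 3:                         # tolerate padding slack
        print(f"{feat_path.name}: wavlm={feat_frames}, wav~{wav_frames}")
```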

I had the same problem too. Using the former versions of data_utils.py and train.py, which didn't contain TextAudioSpeakerCollate(), will temporarily solve the problem.

Thank you, it worked. @OlaWod, does using the older version of these files affect performance?

Yes, it has more distortions according to my small-scale test.

The model I fine-tuned is messing up the content and pronunciation, but the voices are really accurate. Do you think using the newer version of data_utils would help, or is it simply a case of overfitting?

I think it is more likely because of overfitting.

#57 (comment)

Try this in the release version.

Hey, after fine-tuning, the voices are more accurate, but the content is kind of messed up. When I run the same conversion without fine-tuning (on the provided pre-trained models), the content is preserved while the voices are inaccurate.
Do you think more training (fine-tuning for a little longer) would help?
My dataset contains sound files from 10 different speakers with a total duration of 2 hours; is that too little? If not, how many epochs of training would be ideal?

Hello @MaN0bhiR, how did you end up improving the quality of the fine-tuned converted voices? Are 2 hours of audio enough for fine-tuning?