Error while finetuning
Closed this issue · 10 comments
Hi, I am trying to finetune the FreeVC-s model with a small dataset formatted in the VCTK layout. I have made the necessary changes in the config file, and I am not using SR-based augmentation. But when I run train.py it throws this error.
Traceback (most recent call last):
File "train.py", line 284, in <module>
main()
File "train.py", line 49, in main
mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 240, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
while not context.join():
File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 160, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File "/content/FreeVC/train.py", line 115, in run
train_and_evaluate(rank, epoch, hps, [net_g, net_d], [optim_g, optim_d], [scheduler_g, scheduler_d], scaler, [train_loader, eval_loader], logger, [writer, writer_eval])
File "/content/FreeVC/train.py", line 136, in train_and_evaluate
for batch_idx, items in enumerate(train_loader):
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 628, in __next__
data = self._next_data()
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 1333, in _next_data
return self._process_data(data)
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 1359, in _process_data
data.reraise()
File "/usr/local/lib/python3.8/dist-packages/torch/_utils.py", line 543, in reraise
raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 3.
Original Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
data = fetcher.fetch(index)
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/fetch.py", line 61, in fetch
return self.collate_fn(data)
File "/content/FreeVC/data_utils.py", line 164, in __call__
c_padded[i, :, :c.size(1)] = c
RuntimeError: The expanded size of the tensor (399) must match the existing size (487) at non-singleton dimension 1. Target sizes: [1024, 399]. Tensor sizes: [1024, 487]
PS: In my dataset a few speakers have wav files at a 22050 Hz sampling rate, while the other speakers' wav files are at 32000 Hz. But I have used downsample.py to ensure that all of them are downsampled to 16000 Hz.
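For reference, a quick way to double-check that every file really ended up at 16 kHz is a small script like the following. This is only a sketch: it assumes the soundfile package is installed and that the downsampled wavs live under a hypothetical dataset/vctk-16k directory (adjust the path to your setup).

# check_sample_rates.py -- sanity-check sketch, not part of the FreeVC repo
import glob
import os

import soundfile as sf  # pip install soundfile

WAV_DIR = "dataset/vctk-16k"  # hypothetical path to the downsampled wavs
EXPECTED_SR = 16000

bad = []
for path in glob.glob(os.path.join(WAV_DIR, "**", "*.wav"), recursive=True):
    info = sf.info(path)  # reads only the header, no audio decode
    if info.samplerate != EXPECTED_SR:
        bad.append((path, info.samplerate))

print(f"{len(bad)} file(s) with an unexpected sample rate")
for path, sr in bad:
    print(f"  {sr} Hz: {path}")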
Seems that the WavLM feature length is not consistent with the spectrogram length.
Delete all '.spec.pt' files, make sure the wavs in the wav dir (named 'DUMMY') are all 16 kHz, and run again?
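A rough way to find the offending utterances is to compare, per file, the number of WavLM frames saved by preprocess_ssl.py with the number of spectrogram frames the loader will derive from the 16 kHz wav. The sketch below is my own check, not code from the repo; it assumes the usual dataset/vctk-16k and dataset/wavlm layout and the hop_length of 320 used by the 16 kHz configs.

# length_check.py -- rough consistency check, assumed layout:
#   dataset/vctk-16k/<spk>/<utt>.wav   (16 kHz wavs)
#   dataset/wavlm/<spk>/<utt>.pt       (WavLM features from preprocess_ssl.py)
import glob
import os

import soundfile as sf
import torch

WAV_DIR = "dataset/vctk-16k"   # hypothetical paths, adjust to your setup
SSL_DIR = "dataset/wavlm"
HOP_SIZE = 320                 # hop_length from the 16 kHz FreeVC config

for wav_path in glob.glob(os.path.join(WAV_DIR, "**", "*.wav"), recursive=True):
    rel = os.path.relpath(wav_path, WAV_DIR)
    pt_path = os.path.join(SSL_DIR, os.path.splitext(rel)[0] + ".pt")
    if not os.path.exists(pt_path):
        print("missing WavLM feature:", pt_path)
        continue
    n_samples = sf.info(wav_path).frames
    spec_frames = n_samples // HOP_SIZE          # roughly what the spectrogram loader produces
    c = torch.load(pt_path, map_location="cpu")  # feature tensor, last dim = time frames
    ssl_frames = c.shape[-1]
    if abs(ssl_frames - spec_frames) > 2:        # tolerate off-by-one framing differences
        print(f"mismatch {rel}: wavlm={ssl_frames} frames, spec~={spec_frames} frames")

Any file this flags was probably generated from a wav at the wrong sampling rate (or from an older copy of the wav), so regenerating its WavLM feature from the 16 kHz file should fix it.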
Well, I have .pt files in the dataset/wavlm folders (not .spec.pt), which are generated using preprocess_ssl.py (I also confirmed that I am passing the downsampled 16 kHz wav folder in the --in_dir argument). I'll try excluding the folders with a different sampling rate to see if that solves the issue.
I had the same problem. Using the former version of data_utils.py and train.py, which didn't contain TextAudioSpeakerCollate(), will temporarily solve the problem.
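For context, the traceback suggests the padded content tensor in the collate function is allocated from the spectrogram length, and the copy then assumes each WavLM feature has exactly that many frames. Below is only a toy illustration of that assumption and of one common band-aid (clamping both sides to the shorter length); it is not the actual data_utils.py code.

import torch

# toy stand-ins: c is a WavLM feature (1024 x T_c), the spectrogram has T_spec frames
c = torch.randn(1024, 487)
spec_frames = 399

# what the collate effectively does: allocate the buffer from the spectrogram length ...
c_padded = torch.zeros(1, 1024, spec_frames)
# ... and copy the whole feature in, which fails whenever T_c != T_spec:
#   c_padded[0, :, :c.size(1)] = c   ->  "expanded size (399) must match existing size (487)"

# band-aid: clamp both sides to the common length before copying
length = min(c.size(1), spec_frames)
c_padded[0, :, :length] = c[:, :length]

The cleaner fix is still to make the WavLM features and spectrograms consistent, as suggested above, since silently truncating hides the mismatch.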
Thank you, it worked. @OlaWod, does using the older version of these files affect performance?
Yes, it has more distortions according to my small-scale test.
The model I fine-tuned is messing up the content and pronunciation, but the voices are really accurate. Do you think using the newer version of data_utils would help? Or is it simply a case of overfitting?
I think it is more likely because of overfitting.
Try this in the release version.
Hey, after finetuning, though the voices are more accurate, the content is kind of messed up. But when I run the same conversion without finetuning (on the provided pre-trained models), the content is preserved while the voices are inaccurate.
Do you think more training (finetuning for a little longer) would help?
My dataset contains sound files from 10 different speakers with 2 hours of total duration; is that too little? If not, how many epochs of training are ideal for that?