Custom dataset fails assertion mel.shape[-1] * self.hop_len == audio.shape[-1]
C00reNUT opened this issue · 1 comment
C00reNUT commented
Hello, when I run the code on the LJSpeech dataset everything works fine and I am able to train the model. But with my own dataset, which is in a language other than English, I can generate the latents without any problem, yet when I run train.py I get the following error. I have been trying to fix it, but I have no idea why this is happening; the npy files have the same size as in the LJSpeech case.
Do I need to change anything else to make it work for a language other than English?
```
! Run is removed from outputs/run-October-02-2024_05+59PM-04202fc
Traceback (most recent call last):
File "/mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/MAMBA_CACHE_DIR/envs/XTTS_HIFIGAN/lib/python3.11/site-packages/trainer/trainer.py", line 1833, in fit
self._fit()
File "/mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/MAMBA_CACHE_DIR/envs/XTTS_HIFIGAN/lib/python3.11/site-packages/trainer/trainer.py", line 1785, in _fit
self.train_epoch()
File "/mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/MAMBA_CACHE_DIR/envs/XTTS_HIFIGAN/lib/python3.11/site-packages/trainer/trainer.py", line 1503, in train_epoch
for cur_step, batch in enumerate(self.train_loader):
File "/mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/MAMBA_CACHE_DIR/envs/XTTS_HIFIGAN/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
data = self._next_data()
^^^^^^^^^^^^^^^^^
File "/mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/MAMBA_CACHE_DIR/envs/XTTS_HIFIGAN/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 1344, in _next_data
return self._process_data(data)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/MAMBA_CACHE_DIR/envs/XTTS_HIFIGAN/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 1370, in _process_data
data.reraise()
File "/mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/MAMBA_CACHE_DIR/envs/XTTS_HIFIGAN/lib/python3.11/site-packages/torch/_utils.py", line 706, in reraise
raise exception
AssertionError: Caught AssertionError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/MAMBA_CACHE_DIR/envs/XTTS_HIFIGAN/lib/python3.11/site-packages/torch/utils/data/_utils/worker.py", line 309, in _worker_loop
data = fetcher.fetch(index) # type: ignore[possibly-undefined]
^^^^^^^^^^^^^^^^^^^^
File "/mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/MAMBA_CACHE_DIR/envs/XTTS_HIFIGAN/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/MAMBA_CACHE_DIR/envs/XTTS_HIFIGAN/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 52, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
~~~~~~~~~~~~^^^^^
File "/mnt/5bf09f3e-bdd1-4c93-99bb-72c72bfda11b/Train_Hifigan_XTTS/datasets/dataset_gan.py", line 90, in __getitem__
item1 = self.load_item(idx)
^^^^^^^^^^^^^^^^^^^
File "/mnt/5bf09f3e-bdd1-4c93-99bb-72c72bfda11b/Train_Hifigan_XTTS/datasets/dataset_gan.py", line 168, in load_item
mel.shape[-1] * self.hop_len == audio.shape[-1]
AssertionError: [!] 202752 vs 1115
```
C00reNUT commented
It was due to some stereo wav files in my dataset; after converting them to mono, it works.
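For anyone else who hits this: the dataset loader apparently assumes mono audio, so with a stereo file the last dimension no longer matches the number of samples that mel frames × hop length expects, and the assertion fails. A minimal conversion sketch, assuming plain wav clips and that librosa and soundfile are installed (the directory names below are placeholders, adjust them to your dataset layout):

```python
import os

import librosa
import soundfile as sf

# Hypothetical paths -- point these at your own dataset.
wav_dir = "my_dataset/wavs"
out_dir = "my_dataset/wavs_mono"
os.makedirs(out_dir, exist_ok=True)

for name in os.listdir(wav_dir):
    if not name.lower().endswith(".wav"):
        continue
    # mono=True averages the channels; sr=None keeps the original sample rate.
    audio, sr = librosa.load(os.path.join(wav_dir, name), sr=None, mono=True)
    sf.write(os.path.join(out_dir, name), audio, sr)
```

Doing the same thing per file with ffmpeg (`ffmpeg -i in.wav -ac 1 out.wav`) also works.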