tuanh123789/Train_Hifigan_XTTS

Custom dataset results in AssertionError: mel.shape[-1] * self.hop_len == audio.shape[-1]

C00reNUT opened this issue · 1 comment

Hello, when I run the code on the LJSpeech dataset everything works fine and I am able to train the model. When I switch to my own dataset, in a language other than English, I can generate the latents without any problem, but running train.py fails with the error below. I have been trying to fix it, but I have no idea why this happens: the .npy files have the same size as the ones in the LJSpeech case.

Do I need to change anything else to make it work for a language other than English?

 ! Run is removed from outputs/run-October-02-2024_05+59PM-04202fc
Traceback (most recent call last):
  File "/mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/MAMBA_CACHE_DIR/envs/XTTS_HIFIGAN/lib/python3.11/site-packages/trainer/trainer.py", line 1833, in fit
    self._fit()
  File "/mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/MAMBA_CACHE_DIR/envs/XTTS_HIFIGAN/lib/python3.11/site-packages/trainer/trainer.py", line 1785, in _fit
    self.train_epoch()
  File "/mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/MAMBA_CACHE_DIR/envs/XTTS_HIFIGAN/lib/python3.11/site-packages/trainer/trainer.py", line 1503, in train_epoch
    for cur_step, batch in enumerate(self.train_loader):
  File "/mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/MAMBA_CACHE_DIR/envs/XTTS_HIFIGAN/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
    data = self._next_data()
           ^^^^^^^^^^^^^^^^^
  File "/mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/MAMBA_CACHE_DIR/envs/XTTS_HIFIGAN/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 1344, in _next_data
    return self._process_data(data)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/MAMBA_CACHE_DIR/envs/XTTS_HIFIGAN/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 1370, in _process_data
    data.reraise()
  File "/mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/MAMBA_CACHE_DIR/envs/XTTS_HIFIGAN/lib/python3.11/site-packages/torch/_utils.py", line 706, in reraise
    raise exception
AssertionError: Caught AssertionError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/MAMBA_CACHE_DIR/envs/XTTS_HIFIGAN/lib/python3.11/site-packages/torch/utils/data/_utils/worker.py", line 309, in _worker_loop
    data = fetcher.fetch(index)  # type: ignore[possibly-undefined]
           ^^^^^^^^^^^^^^^^^^^^
  File "/mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/MAMBA_CACHE_DIR/envs/XTTS_HIFIGAN/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/MAMBA_CACHE_DIR/envs/XTTS_HIFIGAN/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 52, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
            ~~~~~~~~~~~~^^^^^
  File "/mnt/5bf09f3e-bdd1-4c93-99bb-72c72bfda11b/Train_Hifigan_XTTS/datasets/dataset_gan.py", line 90, in __getitem__
    item1 = self.load_item(idx)
            ^^^^^^^^^^^^^^^^^^^
  File "/mnt/5bf09f3e-bdd1-4c93-99bb-72c72bfda11b/Train_Hifigan_XTTS/datasets/dataset_gan.py", line 168, in load_item
    mel.shape[-1] * self.hop_len == audio.shape[-1]
AssertionError:  [!] 202752 vs 1115

It was caused by some stereo WAV files in my dataset; after converting them to mono, training works.
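A minimal sketch of that fix, using only the Python standard library (`wave`, `array`) and assuming the dataset consists of 16-bit PCM WAV files. `find_non_mono` and `downmix_to_mono` are hypothetical helper names, not part of Train_Hifigan_XTTS: the first lists multi-channel files under a directory, the second writes a mono copy by averaging the channels of each frame. The sample rate is copied unchanged; nothing is resampled.

```python
import array
import sys
import wave
from pathlib import Path


def find_non_mono(wav_dir: Path) -> list[Path]:
    """Return every .wav file under wav_dir that has more than one channel."""
    offenders = []
    for path in sorted(wav_dir.rglob("*.wav")):
        with wave.open(str(path), "rb") as reader:
            if reader.getnchannels() != 1:
                offenders.append(path)
    return offenders


def downmix_to_mono(src: Path, dst: Path) -> bool:
    """Write a mono copy of a 16-bit PCM WAV by averaging its channels.

    Returns True if src had more than one channel, False if it was already mono.
    The sample rate is copied unchanged (no resampling).
    """
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        raw = reader.readframes(params.nframes)

    if params.sampwidth != 2:
        raise ValueError(f"{src}: this sketch only handles 16-bit PCM")

    # WAV stores little-endian, interleaved samples: ch0, ch1, ch0, ch1, ...
    samples = array.array("h", raw)
    if sys.byteorder == "big":
        samples.byteswap()

    n_ch = params.nchannels
    if n_ch == 1:
        mono = samples
    else:
        # Average the channels of each frame; the mean of int16 values fits in int16.
        mono = array.array(
            "h",
            (int(sum(samples[i:i + n_ch]) / n_ch) for i in range(0, len(samples), n_ch)),
        )

    if sys.byteorder == "big":
        mono.byteswap()  # back to little-endian before writing
    dst.parent.mkdir(parents=True, exist_ok=True)
    with wave.open(str(dst), "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)
        writer.setframerate(params.framerate)
        writer.writeframes(mono.tobytes())
    return n_ch > 1
```

For example, `downmix_to_mono(p, out_dir / p.name)` for each `p` in `find_non_mono(wav_dir)`. If the latents were generated from the stereo files, it is probably safest to regenerate them from the mono copies before running train.py again.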