facebookresearch/svoice

Cross validation fails with error during training

vineet-joshi opened this issue · 0 comments

Hello, this implementation does (should do) exactly what I need for a project I am working on.

However, I could not get the older versions of the torch+cuda and numpy modules to work on the NVIDIA L4 GPU I am using for the project. I upgraded torch to 1.13.1; the GPU has CUDA 12.4 installed. I also had to upgrade numpy to 1.21.6, without which I get the following error:

  File "train.py", line 120, in main
    _main(args)
  File "train.py", line 114, in _main
    run(args)
  File "train.py", line 32, in run
    from svoice.solver import Solver
  File "/home/vineet/svoice/svoice/solver.py", line 23, in <module>
    from .evaluate import evaluate
  File "/home/vineet/svoice/svoice/evaluate.py", line 16, in <module>
    from pesq import pesq
  File "/home/vineet/svoice/.testing/lib/python3.7/site-packages/pesq/__init__.py", line 6, in <module>
    from .cypesq import cypesq
  File "pesq/cypesq.pyx", line 1, in init cypesq
ImportError: numpy.core.multiarray failed to import (auto-generated because you didn't call 'numpy.import_array()' after cimporting numpy; use '<void>numpy._import_array' to disable if you are certain you don't need it).

After updating these, I was able to get the training script, train.py, to start without interpreter errors, but the script fails during the cross validation step with the following error:

[2024-06-01 16:03:45,776][__main__][INFO] - For logs, checkpoints and samples check /home/vineet/svoice/outputs/exp_
[2024-06-01 16:03:56,183][__main__][INFO] - Running on host training-l4-2-vcpus-24-ram-96-ubuntu
[2024-06-01 16:03:58,471][svoice.solver][DEBUG] - Checkpoint will be saved to /home/vineet/svoice/outputs/debug/model.th
[2024-06-01 16:03:58,472][svoice.solver][INFO] - ----------------------------------------------------------------------
[2024-06-01 16:03:58,472][svoice.solver][INFO] - Training...
[2024-06-01 16:03:59,818][svoice.solver][INFO] - Train | Epoch 1 | 3/15 | 3.5 it/sec | Loss 21.13142
[2024-06-01 16:04:00,384][svoice.solver][INFO] - Train | Epoch 1 | 6/15 | 4.1 it/sec | Loss 21.46726
[2024-06-01 16:04:00,954][svoice.solver][INFO] - Train | Epoch 1 | 9/15 | 4.4 it/sec | Loss 21.30898
[2024-06-01 16:04:01,521][svoice.solver][INFO] - Train | Epoch 1 | 12/15 | 4.6 it/sec | Loss 21.40352
[2024-06-01 16:04:02,067][svoice.solver][INFO] - Train | Epoch 1 | 15/15 | 4.7 it/sec | Loss 21.39990
[2024-06-01 16:04:02,070][svoice.solver][INFO] - Train Summary | End of Epoch 1 | Time 3.60s | Train Loss 21.39990
[2024-06-01 16:04:02,070][svoice.solver][INFO] - ----------------------------------------------------------------------
[2024-06-01 16:04:02,070][svoice.solver][INFO] - Cross validation...
[2024-06-01 16:04:02,330][__main__][ERROR] - Some error happened
Traceback (most recent call last):
  File "train.py", line 120, in main
    _main(args)
  File "train.py", line 114, in _main
    run(args)
  File "train.py", line 95, in run
    solver.train()
  File "/home/vineet/svoice/svoice/solver.py", line 133, in train
    valid_loss = self._run_one_epoch(epoch, cross_valid=True)
  File "/home/vineet/svoice/svoice/solver.py", line 213, in _run_one_epoch
    estimate_source = self.dmodel(mixture)
  File "/home/vineet/svoice/.testing/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/vineet/svoice/svoice/models/swave.py", line 256, in forward
    mixture_w = self.encoder(mixture)
  File "/home/vineet/svoice/.testing/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/vineet/svoice/svoice/models/swave.py", line 284, in forward
    mixture_w = F.relu(self.conv(mixture))
  File "/home/vineet/svoice/.testing/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/vineet/svoice/.testing/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 313, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/vineet/svoice/.testing/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 310, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: Calculated padded input size per channel: (0). Kernel size: (8). Kernel size can't be greater than actual input size
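For reference, here is a minimal sketch of the size check behind this message, using the output-length formula from the PyTorch Conv1d documentation. The function name is hypothetical, and the stride of 4 in the usage note is only an illustrative assumption, not a value taken from swave.py. It shows that the error occurs whenever the padded input is shorter than the kernel, which is the case here because the mixture reaching the encoder has length 0.

```python
def conv1d_output_length(input_length, kernel_size, stride=1, padding=0, dilation=1):
    """Return the number of output frames of a 1-D convolution.

    Hypothetical helper that mirrors the documented Conv1d formula.
    Raises ValueError when the kernel does not fit the padded input.
    """
    padded = input_length + 2 * padding
    # Span covered by the kernel once dilation is taken into account.
    effective_kernel = dilation * (kernel_size - 1) + 1
    if padded < effective_kernel:
        raise ValueError(
            f"padded input size ({padded}) is smaller than "
            f"the effective kernel size ({effective_kernel})"
        )
    return (padded - effective_kernel) // stride + 1
```

For example, `conv1d_output_length(16, 8, stride=4)` returns 3, while `conv1d_output_length(0, 8, stride=4)` raises, matching the "(0)" and "(8)" values in the traceback.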

After some searching, it appears that this could be caused by the training input .wav files. However, I am using the training dataset provided with the repo, so I would have expected it to work out of the box.
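To rule out the input files, a check along the following lines could be used. This is a minimal sketch using only the Python standard library; the function name and the default threshold of 8 frames (taken from the kernel size in the error message) are my own assumptions, and the `wave` module only reads PCM-encoded .wav files.

```python
import wave
from pathlib import Path


def find_short_wavs(root, min_frames=8):
    """Return (path, n_frames) pairs for .wav files under root that are
    shorter than min_frames. Hypothetical helper, not part of svoice."""
    short = []
    for path in sorted(Path(root).rglob("*.wav")):
        # wave.open raises wave.Error for non-PCM files (e.g. float encoding).
        with wave.open(str(path), "rb") as wav:
            n_frames = wav.getnframes()
        if n_frames < min_frames:
            short.append((path, n_frames))
    return short
```

Calling `find_short_wavs` on the dataset directory returns an empty list if every file is long enough, which would suggest the empty tensor is produced somewhere in the loading code rather than by the files themselves.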

If I skip the cross validation step by setting the cross_valid parameter to False in the solver.py script, the training progresses, but I encounter errors in the forward() method of the SWave model's Encoder, where the Conv1d call fails. I also tried upgrading to Python 3.12, with corresponding updates to the dependencies, but ran into the same issues.

When I skip steps such as cross validation, or work around the Conv1d issues by providing default or empty tensors, I am able to get the training and evaluation to run, but the output speaker files have a monotone, continuous beeping sound overlaid on the speaker's voice. I assume this is a result of skipping cross validation or the convolution calls.

Any help in this regard is much appreciated. If I can get this implementation working, it is an ideal fit for a social project I am working on. Please let me know if you need additional information. Thanks.