eloimoliner/bwe_historical_recordings

Having Trouble in training:

sukidasss opened this issue · 13 comments

Hello, I'm trying to run your code.
I just ran train_bwe.sh, but faced this error.

Traceback (most recent call last):
  File "train_bwe.py", line 307, in main
    _main(args)
  File "train_bwe.py", line 301, in _main
    run(args)
  File "train_bwe.py", line 41, in run
    dataset_val=dataset_loader.ValDataset(args.dset.path_music_validation, None, args.fs,args.seg_len_s_val)
  File "/home/featurize/work/bwe_historical_recordings/utils/dataset_loader.py", line 458, in __init__
    self.segments_clean[i]=10.0**(scales[i]/10.0) *self.segments_clean[i]
UnboundLocalError: local variable 'i' referenced before assignment

Hi,

Thanks for pointing out this error. It was a minor issue in the validation code and is now corrected. However, I suspect you hit it because your "args.dset.path_music_validation" is probably wrong. Keep in mind that if you want to retrain the model, you will need to download some (music) training data.

If you just want to try it with your recordings, I encourage you to try the Colab notebook:
https://colab.research.google.com/github/eloimoliner/bwe_historical_recordings/blob/main/colab/demo.ipynb

Otherwise, I have added a few instructions for testing the trained models locally.
I will update the readme with more information soon.
I hope this helps.
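
A minimal sketch of how a wrong validation path can produce the error above, assuming the loop variable is used after a loop over the loaded segments (the actual code in dataset_loader.py is not shown in this thread). The helper names `count_audio_files` and `last_index`, and the ".wav" extension, are hypothetical and only for illustration.

```python
from pathlib import Path


def count_audio_files(path, extensions=(".wav",)):
    """Count files under `path` whose suffix is in `extensions` (assumed: .wav)."""
    root = Path(path)
    if not root.is_dir():
        return 0  # a wrong path simply yields no files
    return sum(1 for f in root.rglob("*") if f.suffix.lower() in extensions)


def last_index(items):
    """Illustrates the failure mode: `i` is only bound if the loop body runs."""
    for i in range(len(items)):
        pass
    return i  # raises UnboundLocalError when `items` is empty
```

Checking that `count_audio_files(...)` is greater than zero for the validation path before starting training would catch this case early with a clearer message.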

Thank you for the quick reply! This is fantastic work! I am attempting to retrain the model. Where can I download the dataset you are using?

Hi,

For the BWE, I used this dataset:
https://zenodo.org/record/5120004#.YlaVT6IzbmE

But I separated the piano and string pieces.

Is the code ready for training?
If yes, where should we enter the path to the training set?

Hello,

The code works, although I have not documented it for training yet.

I'm using Hydra for all the parameters and paths. You will need to add the paths in the dataset configuration file, such as "conf/dset/pianos.yaml". You can edit this one, or create your own and refer to it in "conf/conf.yaml" by changing the dset field.

Please feel free to let me know if you have any problems making this work in your environment. This will help me when writing the documentation.

Traceback (most recent call last):
  File "train_bwe.py", line 307, in main
    _main(args)
  File "train_bwe.py", line 301, in _main
    run(args)
  File "train_bwe.py", line 41, in run
    dataset_val=dataset_loader.ValDataset(args.dset.path_music_validation, None, args.fs,args.seg_len_s_val)
  File "/home/featurize/work/bwe_historical_recordings/utils/dataset_loader.py", line 458, in __init__
    self.segments_clean=10.0**(scales/10.0) *self.segments_clean
ValueError: operands could not be broadcast together with shapes (121,) (121,44100) 

The shapes of "segments_clean" and "scales" are different.
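
A minimal NumPy sketch of one way to make these two shapes compatible, assuming "scales" holds one gain per segment. The shapes come from the traceback; the values in `scales` are made up, and this is not necessarily how the repository fixed it.

```python
import numpy as np

# Shapes taken from the traceback: 121 segments of 44100 samples each.
num_segments, seg_len = 121, 44100
segments_clean = np.ones((num_segments, seg_len), dtype=np.float32)
scales = np.linspace(-6.0, 6.0, num_segments)  # hypothetical per-segment values

gains = 10.0 ** (scales / 10.0)  # shape (121,), same expression as in the traceback

# (121,) * (121, 44100) fails because NumPy aligns trailing axes (121 vs 44100).
# A trailing axis gives shape (121, 1), which broadcasts along the sample axis.
scaled = gains[:, np.newaxis] * segments_clean
```

Each row of `scaled` is the corresponding segment multiplied by its own gain.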

Apologies, I have corrected that; see the latest commit.

For this question eloimoliner/denoising-historical-recordings#5, I guess the answer is also similar for this project?

Hi,

If you want to train the denoiser (using train_denoiser.py), it is the same procedure but in PyTorch. For the BWE, I simulated the bandwidth limitation using lowpass filters, so there is only one dataset of clean broadband music to use. The trained model can then generalize to historical recordings. You can also tweak the parameters of the lowpass filters to adapt the training to your final goal. If you want to retrain the model, I recommend taking a look at the preprint of the paper: https://arxiv.org/abs/2204.06478
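
A minimal sketch of the idea of simulating the bandwidth limitation with a lowpass filter, so that one clean dataset yields both the input and the target. The filter type (Hamming-windowed sinc), the number of taps, and the function names are assumptions for illustration; the filters actually used in the repository are described in the paper, not here.

```python
import numpy as np


def lowpass_fir(cutoff_hz, fs, num_taps=101):
    """Design a Hamming-windowed sinc lowpass FIR filter (linear phase)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    fc = cutoff_hz / fs  # cutoff in cycles per sample
    h = 2 * fc * np.sinc(2 * fc * n)  # ideal lowpass impulse response
    h *= np.hamming(num_taps)  # window to limit ripple
    return h / h.sum()  # unity gain at DC


def make_training_pair(clean, fs, cutoff_hz):
    """Return (bandlimited input, broadband target) for one clean segment."""
    h = lowpass_fir(cutoff_hz, fs)
    narrow = np.convolve(clean, h, mode="same")
    return narrow, clean
```

Drawing `cutoff_hz` at random for each segment would be one way to vary the filter parameters during training.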

Thanks, I managed to start training mechanically.
Just trying out train_bwe.py.
I see that the training is slow: 1.36 it/s on a V100. Is it expected?

Sorry, I didn't read the paper fully - I thought I understood it (2 weeks ago) by looking at the figures :-)
Anyway, I will take a print out of the paper now.

BTW: what's the motivation for using complex STFT domain? Why not time-domain?

Hi,

The training speed you obtained is normal. It is quite slow because of the filtering part and the data pipeline, which I did not really optimize for speed. Also the size of the spectrograms is quite large.

The motivation for using the STFT is basically empirical evidence. The idea is to use computer-vision-like models to capture 2D features in the spectrogram instead of working directly in the time domain. This usually works better than waveform-based models, although it also has its limitations. At least for this particular task, we obtained better results using the STFT, even though we also tried several time-domain models.
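
To make the "2D features" idea concrete, here is a minimal NumPy sketch that turns a waveform into an image-like array by stacking the real and imaginary parts of a complex STFT as two channels. The window, FFT size, hop, and channel layout are assumptions for illustration and not the settings used in the repository.

```python
import numpy as np


def stft(x, n_fft=1024, hop=256):
    """Complex STFT with a Hann window; returns shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    num_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] for i in range(num_frames)])
    return np.fft.rfft(frames * window, axis=-1)


def to_image(spec):
    """Stack real and imaginary parts as two channels: (frames, bins, 2)."""
    return np.stack([spec.real, spec.imag], axis=-1)
```

A 2D convolutional model can then treat the time and frequency axes like the height and width of an image.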

Thanks!
This is getting interesting: "This usually works better than waveform-based models, although it also has its limitations."

May I ask what pros and cons you observed with complex STFTs? Did you try waveform-domain directly for the same task?

Hi,

The problem with waveform-domain convolution-based models (which are probably the state of the art at the moment in related tasks) is that they are sometimes unable to capture long-range dependencies due to their limited receptive field. This issue is often tackled with some kind of multi-scale architecture or with dilated convolutions, but with only relative success. I did try some waveform-domain models as the generator for this task, but the STFT model worked significantly better.
As I said, this problem is considerably relieved by using the spectrogram and applying techniques inherited from the image processing world (where 2D convolutional models are highly successful). This often works, but applying local convolutions to the spectrogram seems sub-optimal, as the inherent dependencies in the frequency domain are global. Studying more suitable architectures to better capture the information in spectrograms is something I'm working on at the moment.
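
The receptive-field limitation mentioned above can be quantified with a short calculation. This sketch uses the standard formula for stacked stride-1 1-D convolutions; the kernel size, layer count, and 44.1 kHz sample rate are illustrative assumptions, not the configuration of any model discussed here.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of stacked stride-1 dilated 1-D convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


fs = 44100  # assumed sample rate in Hz

plain = receptive_field(3, [1] * 10)  # ten undilated layers: 21 samples
dilated = receptive_field(3, [2 ** i for i in range(10)])  # dilations 1..512: 2047 samples

dilated_ms = 1000 * dilated / fs  # about 46 ms of audio
```

Even with exponentially growing dilations, ten layers cover only a few tens of milliseconds at this sample rate, which illustrates why long-range structure is hard to capture directly on the waveform.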