Data generation pipeline

Question


csteinmetz1 opened this issue 2 years ago · 2 comments

After the experiments with guitars, we will want to develop a dataloader that generates data on the fly by applying audio effects to clean audio. We can reuse the effect implementations we developed for the SDX Challenge but, as before, focus only on compression, reverberation, and distortion to start. It is still an open question which dataset we should use to source the clean audio. Candidates:

Vocals: VocalSet, VCTK (speech dataset)
Drums: ENST-drums dataset
Other: GuitarSet, IDMT-PIANO-MM Dataset
Bass: IDMT-SMT-Bass
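Below is a minimal NumPy sketch of on-the-fly effect processing of a clean clip, assuming mono floating-point audio. These are simplified stand-ins, not the SDX Challenge effect implementations: a tanh soft-clipper for distortion, a feed-forward compressor with a one-pole attack/release smoother, and a synthetic reverb that convolves the signal with exponentially decaying noise. All function names and parameter ranges are hypothetical.

```python
import numpy as np


def distortion(x: np.ndarray, drive_db: float) -> np.ndarray:
    """Soft-clipping distortion: apply drive_db of gain, then tanh."""
    gain = 10.0 ** (drive_db / 20.0)
    return np.tanh(gain * x)


def compressor(x: np.ndarray, sr: int, threshold_db: float, ratio: float,
               attack_ms: float = 5.0, release_ms: float = 50.0) -> np.ndarray:
    """Feed-forward compressor for a mono signal (gain computed per sample)."""
    level_db = 20.0 * np.log10(np.abs(x) + 1e-8)
    # Static curve: each dB above the threshold is reduced to 1/ratio dB.
    target_gr_db = np.maximum(level_db - threshold_db, 0.0) * (1.0 - 1.0 / ratio)
    # One-pole smoothing of the gain reduction with separate attack/release.
    a_att = np.exp(-1.0 / (sr * attack_ms / 1000.0))
    a_rel = np.exp(-1.0 / (sr * release_ms / 1000.0))
    gr_db = np.empty_like(x)
    state = 0.0
    for n, target in enumerate(target_gr_db):
        coeff = a_att if target > state else a_rel
        state = coeff * state + (1.0 - coeff) * target
        gr_db[n] = state
    return x * 10.0 ** (-gr_db / 20.0)


def fft_convolve(x: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Full linear convolution via zero-padded real FFTs."""
    n = len(x) + len(h) - 1
    nfft = 1 << (n - 1).bit_length()
    return np.fft.irfft(np.fft.rfft(x, nfft) * np.fft.rfft(h, nfft), nfft)[:n]


def reverb(x: np.ndarray, sr: int, rng: np.random.Generator,
           rt60: float = 0.5, wet: float = 0.3) -> np.ndarray:
    """Synthetic reverb: decaying noise IR that reaches -60 dB after rt60 seconds."""
    t = np.arange(int(rt60 * sr)) / sr
    ir = rng.standard_normal(len(t)) * np.exp(-6.908 * t / rt60)  # ln(1000) ~= 6.908
    ir /= np.sqrt(np.sum(ir ** 2)) + 1e-8  # unit-energy impulse response
    wet_sig = fft_convolve(x, ir)[: len(x)]  # trim the tail to the input length
    return (1.0 - wet) * x + wet * wet_sig


def apply_random_effects(x: np.ndarray, sr: int, rng: np.random.Generator) -> np.ndarray:
    """Apply each effect with probability 0.5, in a fixed order, with random parameters."""
    y = x.astype(np.float64, copy=True)
    if rng.random() < 0.5:
        y = compressor(y, sr, threshold_db=rng.uniform(-30.0, -10.0),
                       ratio=rng.uniform(2.0, 8.0))
    if rng.random() < 0.5:
        y = distortion(y, drive_db=rng.uniform(0.0, 24.0))
    if rng.random() < 0.5:
        y = reverb(y, sr, rng, rt60=rng.uniform(0.2, 1.5), wet=rng.uniform(0.1, 0.5))
    return y


# Usage: process one second of a 220 Hz test tone.
sr = 16000
clean = 0.5 * np.sin(2 * np.pi * 220.0 * np.arange(sr) / sr)
processed = apply_random_effects(clean, sr, np.random.default_rng(0))
```

The fixed effect order (compression, distortion, reverb), the 0.5 application probability, and the parameter ranges are illustrative choices that would need tuning per instrument; passing an explicit `np.random.Generator` makes each processed example reproducible from its seed.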

Answer 1 · 2023-03-07T22:08:34.000Z

Can we adjust the dataset so that we can make multiple passes over the audio files in the train/valid/test splits? That would let us enlarge the dataset by applying more random effects. I am concerned that the current dataset is a bit small (~3000 examples in the train set).
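One way to support multiple passes is sketched below, under assumptions: the clean clips are already loaded into memory as NumPy arrays, and the class name `MultiPassEffectsDataset`, its `num_passes` and `seed` parameters, and the placeholder `random_drive` effect are all hypothetical, not the project's code. The class follows the map-style `__len__`/`__getitem__` protocol, so it could be wrapped by a PyTorch `DataLoader`, but nothing here depends on PyTorch.

```python
from typing import Callable, Optional, Sequence, Tuple

import numpy as np

Effect = Callable[[np.ndarray, np.random.Generator], np.ndarray]


def random_drive(x: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Placeholder effect: tanh distortion with a random drive of 0 to 24 dB."""
    gain = 10.0 ** (rng.uniform(0.0, 24.0) / 20.0)
    return np.tanh(gain * x)


class MultiPassEffectsDataset:
    """Map-style dataset that revisits each clean clip num_passes times.

    Index idx maps to clip idx % len(clips); every access samples new effect
    parameters, so each pass yields a different processed version of a clip.
    """

    def __init__(self, clips: Sequence[np.ndarray], num_passes: int = 1,
                 effect: Effect = random_drive, seed: Optional[int] = None):
        if num_passes < 1:
            raise ValueError("num_passes must be >= 1")
        self.clips = list(clips)
        self.num_passes = num_passes
        self.effect = effect
        # None: fresh randomness on every access (train).
        # int: parameters fixed per index, identical across epochs (valid/test).
        self.seed = seed

    def __len__(self) -> int:
        return len(self.clips) * self.num_passes

    def __getitem__(self, idx: int) -> Tuple[np.ndarray, np.ndarray]:
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        clean = self.clips[idx % len(self.clips)]
        if self.seed is None:
            rng = np.random.default_rng()  # unseeded: draws fresh OS entropy
        else:
            rng = np.random.default_rng([self.seed, idx])  # deterministic per index
        processed = self.effect(clean, rng)
        return processed, clean  # (network input, clean target)
```

With ~3000 training clips and `num_passes=4`, one epoch would cover 3000 × 4 = 12000 processed examples. Leaving `seed=None` gives new effect parameters every time an index is read, which suits training, while a fixed integer seed derives each example's parameters from `(seed, idx)` so that validation and test examples stay the same across epochs.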

Answer 2 · 2023-03-07T22:28:14.000Z

Sure, will do