andrewowens/multisensory

Questions about VoxCeleb2 dataset


Hello, thanks for your great work!
I have been working with this model for a while, but I have not been able to match the results reported in your paper. After checking the videos in the VoxCeleb2 dataset, I found that some of them contain audible background noise and are of low quality, whereas computing SDR requires clean reference speech segments.
Did you select only high-quality videos for the training and test phases, and if so, how?
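
To illustrate the concern, here is a minimal sketch of how a noisy reference lowers the measured score. It uses a simplified SDR definition (reference energy over error energy), which is an assumption on my part and not necessarily the metric or toolkit used in the paper. The name `simple_sdr` and the synthetic sine and Gaussian signals are hypothetical stand-ins for speech and background noise.

```python
import math
import random


def simple_sdr(reference, estimate, eps=1e-12):
    """Simplified SDR in dB: 10 * log10(||ref||^2 / ||ref - est||^2).

    This is NOT the BSS Eval decomposition; it is only meant to show
    how the choice of reference signal affects the score.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


random.seed(0)
sample_rate = 8000  # one second of synthetic audio

# Stand-in for a clean speech segment.
clean = [math.sin(2 * math.pi * 220.0 * n / sample_rate) for n in range(sample_rate)]
# Stand-in for background noise already present in the dataset video.
background = [0.3 * random.gauss(0.0, 1.0) for _ in clean]
# What the dataset actually provides as the "reference" in the noisy case.
noisy_reference = [c + b for c, b in zip(clean, background)]

# A hypothetical ideal separator that recovers the clean speech exactly.
estimate = list(clean)

sdr_vs_clean = simple_sdr(clean, estimate)
sdr_vs_noisy = simple_sdr(noisy_reference, estimate)

print(f"SDR against clean reference: {sdr_vs_clean:.1f} dB")
print(f"SDR against noisy reference: {sdr_vs_noisy:.1f} dB")
```

Even with an ideal estimate, the score against the noisy reference is far lower, because the background noise in the reference is counted as distortion.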

Hello, do you know how to use the pre-trained source separation model for evaluation and testing?