jasongief/PSP_CVPR_2021

Some questions about the paper

laohuijiadezhu opened this issue · 1 comment

I also read the original 'AVE' paper. I don't understand why you say that it tries to automatically filter out unpaired samples. What does 'filter out' mean here?
[image]

Hi, the sentence as a whole means that some methods [5, 12] try to use the noise as supervision, while AVEL [28] aims to find the (audio-visual) paired video samples. In that sense, 'filtering out unpaired samples' means identifying the paired ones. For AVE localization, an audio-visual pair depicting the same event (i.e., a paired sample) can be utilized in a self-supervised manner.