ControlNet/AV-Deepfake1M

Question about the source of proposed dataset

XuecWu opened this issue · 4 comments

Thank you for your exciting work!
As described above, I would like to know what the source of the proposed dataset is.
I have read the full paper carefully, but I did not find any information about the source of AV-Deepfake1M.

Looking forward to your reply!
Best regards,

Hi,

From Section 3.1,

The three-stage pipeline for generating content-driven deepfakes is illustrated in Figure 2. A subset of real videos from the Voxceleb2 [14] dataset is pre-processed to extract the audio using FFmpeg [47], followed by Whisper-based [41] real transcript generation.

Got it.
Could you tell me the specific number of videos sampled from VoxCeleb2?
This is important to my work.
Thank you!

All of the real samples come from VoxCeleb2, 286,721 in total.

Got it.
Thank you!