Question about the source of proposed dataset

Question

XuecWu opened this issue 3 months ago · 4 comments

Thank you for your exciting work!
As described above, I would like to know the source of the proposed dataset.
I have read the full paper carefully, but I could not find any information about where the AV-Deepfake1M videos come from.

Looking forward to your reply!
Best regards,

Answer 1 · 2024-04-16T13:01:02.000Z

Hi,

From Section 3.1,

The three-stage pipeline for generating content-driven deepfakes is illustrated in Figure 2. A subset of real videos from the Voxceleb2 [14] dataset is pre-processed to extract the audio using FFmpeg [47], followed by Whisper-based [41] real transcript generation.
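For anyone who wants to reproduce the pre-processing step quoted above on their own VoxCeleb2 clips, here is a rough sketch of "extract audio with FFmpeg, then transcribe with Whisper". It is not the authors' code: the function names, the 16 kHz mono WAV output, and the "base" model size are assumptions made for illustration, and the transcription part assumes the openai-whisper Python package is installed.

```python
import subprocess
from pathlib import Path
from typing import List


def build_ffmpeg_cmd(video_path: str, wav_path: str, sample_rate: int = 16000) -> List[str]:
    """Build an FFmpeg command that drops the video stream and writes mono audio.

    The 16 kHz mono output is an assumption for this sketch, not a setting
    taken from the paper.
    """
    return [
        "ffmpeg", "-y",           # overwrite the output file if it exists
        "-i", video_path,         # input video
        "-vn",                    # disable video, keep audio only
        "-ac", "1",               # one audio channel (mono)
        "-ar", str(sample_rate),  # audio sample rate in Hz
        wav_path,
    ]


def extract_audio(video_path: str, sample_rate: int = 16000) -> str:
    """Run FFmpeg and return the path of the extracted WAV file."""
    wav_path = str(Path(video_path).with_suffix(".wav"))
    subprocess.run(build_ffmpeg_cmd(video_path, wav_path, sample_rate), check=True)
    return wav_path


def transcribe(wav_path: str, model_name: str = "base") -> str:
    """Transcribe an audio file with Whisper (model size is an assumption)."""
    import whisper  # imported lazily so the helpers above work without it

    model = whisper.load_model(model_name)
    result = model.transcribe(wav_path)
    return result["text"].strip()
```

Usage would look like `transcribe(extract_audio("some_clip.mp4"))`, where `some_clip.mp4` is a placeholder path. This needs the `ffmpeg` binary on your PATH.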

Answer 2 · 2024-04-16T14:03:50.000Z

Got it.
Could you tell me the exact number of videos sampled from VoxCeleb2?
This is important to my work.
Thank you!

Answer 3 · 2024-04-16T14:11:29.000Z

All the real samples are from VoxCeleb2, 286,721 in total.

Answer 4 · 2024-04-17T01:38:52.000Z

Got it.
Thank you!