PolyGlotFake Dataset

Overview

PolyGlotFake is a multilingual, multimodal deepfake dataset built to support research on deepfake detection. It consists of videos with manipulated audio and visual components in seven languages, produced with Text-to-Speech, voice cloning, and lip-sync techniques.

Download Dataset

Please fill out this form to request access to the PolyGlotFake Dataset. We will review your request and respond as soon as possible.

Google form

Quantitative Comparison

Dataset	Release Date	Manipulated Modality	Multilingual	Real Videos	Fake Videos	Total Videos	Manipulation Methods	Technique Labeling	Attribute Labeling
UADFV	2018	V	No	49	49	98	1	No	No
DeepfakeTIMIT	2018	V	No	320	640	960	2	No	No
FF++	2019	V	No	1,000	4,000	5,000	4	No	No
DFD	2019	V	No	363	3,068	3,431	5	No	No
DFDC	2020	A/V	No	23,654	104,500	128,154	8	No	No
DeeperForensics	2020	V	No	50,000	10,000	60,000	1	No	No
Celeb-DF	2020	V	No	590	5,639	6,229	1	No	No
FFIW	2020	V	No	10,000	10,000	20,000	1	No	No
KoDF	2021	V	No	62,166	175,776	237,942	5	No	No
FakeAVCeleb	2021	A/V	No	500	19,500	20,000	4	No	Yes
DF-Platter	2023	V	No	133,260	132,496	265,756	3	No	Yes
PolyGlotFake	2023	A/V	Yes	766	14,472	15,238	10	Yes	Yes

Dataset Details

Composition

Total Videos: 15,238
- Real Videos: 766
- Fake Videos: 14,472
Resolution: 1280x720
Average Video Duration: 11.79 seconds
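
The composition figures can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts listed above; the variable names are illustrative and not part of any released tooling.

```python
# Counts copied from the composition list above.
real_videos = 766
fake_videos = 14_472
total_videos = 15_238

# The real and fake counts should add up to the reported total.
assert real_videos + fake_videos == total_videos

# Class balance: share of fake videos and fake-to-real ratio.
fake_share = fake_videos / total_videos
fake_to_real = fake_videos / real_videos

print(f"fake share: {fake_share:.1%}")
print(f"fake-to-real ratio: {fake_to_real:.1f}:1")
```

There are roughly 19 fake videos for every real one, so plain accuracy can be misleading on this dataset; balanced sampling or a threshold-independent metric may be worth considering.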

Languages and Synthesis Methods Distribution

Languages: English, French, Spanish, Russian, Chinese, Arabic, Japanese
Synthesis methods:
- Audio manipulation: Bark+FreeVC, MicroTTS+FreeVC, XTTS, Tacotron+FreeVC, Vall-E-X
- Video manipulation: VideoRetalking, Wav2Lip
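
The comparison table lists 10 manipulation methods for PolyGlotFake, which equals the number of audio and video method pairs. The sketch below enumerates those pairs. It assumes every audio method is combined with every video method; that pairing is an inference from the count, not something stated here.

```python
from itertools import product

# Method names copied from the list above.
audio_methods = ["Bark+FreeVC", "MicroTTS+FreeVC", "XTTS", "Tacotron+FreeVC", "Vall-E-X"]
video_methods = ["VideoRetalking", "Wav2Lip"]

# Assumption: each audio method is paired with each video method.
combinations = [f"{audio} -> {video}" for audio, video in product(audio_methods, video_methods)]

for combo in combinations:
    print(combo)
print(f"{len(combinations)} audio/video combinations")
```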

Generation Pipeline

Deepfake Detection Benchmark

Evaluation Results and Comparisons

Type	Detector	Backbone	FakeAVCeleb	DFDC	PolyGlotFake
Naive	MesoNet	Designed	0.7332	0.5906	0.5672
Naive	MesoInception	Designed	0.7945	0.6344	0.5831
Naive	Xception	Xception	0.9169	0.6530	0.6052
Naive	EfficientNet-B4	EfficientNet	0.9023	0.6020	0.5769
Spatial	Capsule	Capsule	0.8663	0.6146	0.6068
Spatial	FFD	Xception	0.9285	0.6583	0.5960
Spatial	CORE	Xception	0.9345	0.6625	0.6220
Spatial	RECCE	Designed	0.9396	0.6884	0.6596
Spatial	DSP-FWA	Xception	0.9115	0.6929	0.6658
Frequency	F3Net	Xception	0.9416	0.6452	0.6439
Frequency	SRM	Xception	0.9043	0.6346	0.6143
Ensemble	XRes	Designed	0.9556	0.7042	0.6835
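
To compare the three benchmarks at a glance, the scores can be summarized per dataset. The sketch below copies the values from the table above, in whatever metric the table reports. The helper names are illustrative.

```python
# (detector, FakeAVCeleb, DFDC, PolyGlotFake), copied from the table above.
results = [
    ("MesoNet", 0.7332, 0.5906, 0.5672),
    ("MesoInception", 0.7945, 0.6344, 0.5831),
    ("Xception", 0.9169, 0.6530, 0.6052),
    ("EfficientNet-B4", 0.9023, 0.6020, 0.5769),
    ("Capsule", 0.8663, 0.6146, 0.6068),
    ("FFD", 0.9285, 0.6583, 0.5960),
    ("CORE", 0.9345, 0.6625, 0.6220),
    ("RECCE", 0.9396, 0.6884, 0.6596),
    ("DSP-FWA", 0.9115, 0.6929, 0.6658),
    ("F3Net", 0.9416, 0.6452, 0.6439),
    ("SRM", 0.9043, 0.6346, 0.6143),
    ("XRes", 0.9556, 0.7042, 0.6835),
]
datasets = ["FakeAVCeleb", "DFDC", "PolyGlotFake"]

# Mean score of all detectors on each dataset.
means = {
    name: sum(row[i] for row in results) / len(results)
    for i, name in enumerate(datasets, start=1)
}

# Dataset with the lowest mean score, and the best detector on PolyGlotFake.
hardest = min(means, key=means.get)
best_on_polyglotfake = max(results, key=lambda row: row[3])

# True if every detector scores lower on PolyGlotFake than on DFDC,
# and lower on DFDC than on FakeAVCeleb.
consistent_ordering = all(row[3] < row[2] < row[1] for row in results)

for name, mean in means.items():
    print(f"{name}: mean score {mean:.4f}")
print(f"lowest mean score: {hardest}")
print(f"best detector on PolyGlotFake: {best_on_polyglotfake[0]}")
print(f"same ordering for every detector: {consistent_ordering}")
```

Every detector in the table scores lowest on PolyGlotFake, so the ordering check holds for all twelve rows.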

Visualization

Ethics Statement

Access to the dataset is restricted to academic institutions and is intended solely for research use. It complies with YouTube's fair use policy through its transformative, non-commercial use, by including only brief excerpts (approximately 20 seconds) from each YouTube video, and ensuring that these excerpts do not adversely affect the copyright owners' ability to earn revenue from their original content. Should any copyright owner feel their rights have been infringed, we are committed to promptly removing the contested material from our dataset.

Citation

@misc{hou2024polyglotfake,
      title={PolyGlotFake: A Novel Multilingual and Multimodal DeepFake Dataset}, 
      author={Yang Hou and Haitao Fu and Chuankai Chen and Zida Li and Haoyu Zhang and Jianjun Zhao},
      year={2024},
      eprint={2405.08838},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}
