This repository provides the official Python implementation of Unmasking DeepFakes with simple Features (paper: https://arxiv.org/abs/1911.00686).
Overview of the pipeline used in our approach. It contains two main blocks: a pre-processing block, where the input is transformed into a more convenient domain, and a training block, where a classifier uses the transformed features to determine whether a face is real or fake. Note that input images are converted to greyscale before the DFT (discrete Fourier transform).
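The pre-processing block can be illustrated with a minimal NumPy-only sketch; it is not the repository's code. It assumes the image arrives as an H x W x 3 RGB NumPy array, converts it to greyscale with the BT.601 luma weights (0.299, 0.587, 0.114), computes the 2D DFT and then, as an assumption about the reduction step, averages the log-magnitude spectrum over rings of equal radius (azimuthal averaging) to obtain a 1D feature vector. The names `to_grey`, `azimuthal_average` and `spectral_features` are hypothetical.

```python
import numpy as np


def to_grey(rgb: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 RGB array to greyscale using BT.601 luma weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb[..., :3].astype(np.float64) @ weights


def azimuthal_average(spectrum: np.ndarray) -> np.ndarray:
    """Average a centred 2D spectrum over rings of equal integer radius."""
    h, w = spectrum.shape
    cy, cx = h // 2, w // 2
    y, x = np.indices((h, w))
    radius = np.hypot(y - cy, x - cx).astype(int)
    # Sum of values and number of pixels falling on each integer radius.
    sums = np.bincount(radius.ravel(), weights=spectrum.ravel())
    counts = np.bincount(radius.ravel())
    return sums / np.maximum(counts, 1)


def spectral_features(rgb: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Greyscale -> 2D DFT -> log magnitude -> 1D azimuthal profile in [0, 1]."""
    grey = to_grey(rgb)
    # Shift the zero-frequency component to the centre of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(grey))
    magnitude = 20 * np.log(np.abs(spectrum) + eps)
    profile = azimuthal_average(magnitude)
    # Min-max normalise so profiles from different images are comparable.
    return (profile - profile.min()) / (profile.max() - profile.min() + eps)
```

For example, `spectral_features(img)` on a 64 x 64 image yields a 46-element vector, one entry per integer radius from the centre of the shifted spectrum to its farthest corner.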
Tested on Python 3.6.x with the following dependencies:
- NumPy (1.16.2)
- OpenCV (4.0.0)
- Matplotlib (3.1.1)
To the best of our knowledge, no public dataset gathers images containing both artificial and real faces; therefore, we have created our own, called Faces-HQ. To obtain a sufficient variety of faces, we downloaded and labelled images from the CelebA-HQ dataset, the Flickr-Faces-HQ dataset, the 100K Faces project and www.thispersondoesnotexist.com. In total, we collected 40K high-quality images, half of them real faces and the other half fake, which yields a balanced dataset.
Click here to go to the experiments on Faces-HQ.
Faces-HQ dataset. Test accuracy using SVM, logistic regression and k-means classifiers under different data settings.
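The classification block can be sketched with NumPy alone, under stated assumptions: each image has already been reduced to a 1D spectral feature vector, labels are 1 for real and 0 for fake, and only logistic regression (trained by batch gradient descent) and k-means are shown; the SVM is omitted. This is not the repository's code, and the function names, learning rate and iteration counts are hypothetical. Because k-means is unsupervised, the sketch scores it against the better of the two possible cluster-to-label mappings.

```python
import numpy as np


def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Fit weights w and bias b by batch gradient descent on the log-loss."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        z = np.clip(X @ w + b, -30.0, 30.0)  # clip to avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-z))  # predicted probability of "real"
        grad = p - y
        w -= lr * (X.T @ grad) / n
        b -= lr * grad.mean()
    return w, b


def predict_logistic(X, w, b):
    """Return 1 (real) where the decision function is positive, else 0 (fake)."""
    return (X @ w + b > 0).astype(int)


def kmeans_two_clusters(X, iters=50, seed=0):
    """Plain k-means with k=2; returns a cluster index (0 or 1) per sample."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=2, replace=False)].astype(float)
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        for k in range(2):
            if np.any(assign == k):  # keep the old centre if a cluster empties
                centres[k] = X[assign == k].mean(axis=0)
    return assign


def cluster_accuracy(assign, y):
    """Accuracy under the better of the two cluster-to-label mappings."""
    acc = np.mean(assign == y)
    return max(acc, 1.0 - acc)
```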
CelebA
The CelebFaces Attributes Dataset (CelebA) is a large-scale face attributes dataset with more than 200K celebrity images, each with 40 attribute annotations. The images in this dataset cover large pose variations and background clutter. CelebA offers high diversity, a large number of images and rich annotations.
Click here to go to the experiments on CelebA.
FaceForensics++ is a forensics dataset consisting of video sequences that have been modified with different automated face manipulation methods. Additionally, it hosts the DeepFakeDetection Dataset. In particular, this dataset contains 363 original sequences from 28 paid actors in 16 different scenes, as well as over 3000 videos manipulated with DeepFakes and their corresponding binary masks. All videos contain a trackable, mostly frontal face without occlusions, which enables automated tampering methods to generate realistic forgeries.
Click here to go to the experiments on DeepFakeDetection.
DeepFakeDetection dataset.
Results based on frames.
Test accuracy using SVM and logistic regression classifiers under different data settings.
Results based on videos (we apply a simple majority vote over the single-frame classifications).
Test accuracy using SVM and logistic regression classifiers.
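The video-level decision described above amounts to a majority vote over per-frame predictions. A minimal sketch, assuming per-frame labels of 1 (real) and 0 (fake); the function name `video_label` and the tie-breaking rule (ties count as fake) are assumptions, since the text does not specify how ties are handled.

```python
from collections import Counter
from typing import Sequence


def video_label(frame_predictions: Sequence[int]) -> int:
    """Classify a video by majority vote over its per-frame predictions.

    Labels: 1 = real, 0 = fake. Ties are resolved as fake (an assumption).
    """
    if not frame_predictions:
        raise ValueError("a video needs at least one classified frame")
    counts = Counter(frame_predictions)
    return 1 if counts[1] > counts[0] else 0
```

For example, `video_label([1, 0, 1, 1])` returns 1, while `video_label([1, 0])` returns 0 under the assumed tie rule.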
This repo uses and combines several datasets to form Faces-HQ:
- We take 10K samples from the CelebA-HQ dataset.
- We take 10K samples from the Flickr-Faces-HQ dataset and convert them to JPEG format.
- We take 10K samples from www.thispersondoesnotexist.com using this script.
- We take 10K samples from the 100K Faces project.
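Assembling the balanced Faces-HQ index can be sketched as below, assuming each source has been downloaded into its own sub-folder of a common root. The two real-photo sources (CelebA-HQ, Flickr-Faces-HQ) are labelled 1 and the two generated sources (thispersondoesnotexist.com, 100K Faces) are labelled 0, so taking 10K images from each gives the 20K real / 20K fake split. The folder names, the `build_index` helper and the label encoding are hypothetical.

```python
from pathlib import Path

# Hypothetical folder names; label 1 = real face, 0 = generated (fake) face.
SOURCES = {
    "celeba_hq": 1,
    "ffhq": 1,
    "thispersondoesnotexist": 0,
    "100k_faces": 0,
}


def build_index(root, per_source=10_000, exts=(".jpg", ".jpeg", ".png")):
    """Return (path, label) pairs, taking at most `per_source` images per source."""
    index = []
    for folder, label in SOURCES.items():
        # Sort for a deterministic selection; a missing folder yields no files.
        files = sorted(
            p for p in (Path(root) / folder).glob("*") if p.suffix.lower() in exts
        )
        index.extend((p, label) for p in files[:per_source])
    return index
```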
Download the full Faces-HQ dataset (19 GB): https://cutt.ly/6enDLYG
If this work is useful for your research, please cite our paper:
@misc{durall2019unmasking,
title={Unmasking DeepFakes with simple Features},
author={Ricard Durall and Margret Keuper and Franz-Josef Pfreundt and Janis Keuper},
year={2019},
eprint={1911.00686},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
Following this pre-print, we published a CVPR 2020 paper that looks into the theory behind the spectral distortions introduced by GANs and proposes a way to fix them.