/DNA-Det

This is the official repository for the code and datasets in the paper "Deepfake Network Architecture Attribution", AAAI 2022.

Primary LanguagePython

DNA-Det

Deepfake Network Architecture Attribution

Tianyun Yang, Ziyao Huang, Juan Cao, Lei Li, Xirong Li
AAAI 2022 Arxiv

  • [Mar 3, 2022] Note: The resource links are placeholders for now, we will update them and open source code soon!
  • [Mar 10, 2022] The source code, dataset and models are all released.

Abstract

With the rapid progress of generation technology, it has become necessary to attribute the origin of fake images. Existing works on fake image attribution perform multi-class classification on several Generative Adversarial Network (GAN) models and obtain high accuracies. While encouraging, these works are restricted to model-level attribution, only capable of handling images generated by seen models with a specific seed, loss and dataset, which is limited in real-world scenarios. In this work:

  • We present the first study on Deepfake Network Architecture Attribution to attribute fake images on architecture-level.
  • We develop a simple yet effective approach named DNA-Det to extract architecture traces, which adopts pre-training on image transformation classification and patchwise contrastive learning to capture globally consistent features that are invariant to semantics.
  • The evaluations on multiple cross-test setups and a large-scale dataset verify the effectiveness of DNA-Det. DNA-Det maintains a significantly higher accuracy than existing methods in cross-seed, cross-loss, cross-finetune and cross-dataset settings.

Prerequisites

  • Linux
  • NVIDIA GPU + CUDA 11.1
  • Python 3.7.10
  • pytorch 1.9.0

Datasets

You can download the dataset using the following links:

Option1: Baiduyun Link(Passwd:3cop)

Option2: Google Drive Links including part1 part2 part3 part4

Resolution Real, GAN Content Source
128x128 Real CelebA, LSUN-bedroom CelebA, LSUN
128x128 ProGAN CelebA, LSUN-bedroom GANFingerprint
128x128 MMDGAN CelebA, LSUN-bedroom GANFingerprint
128x128 SNGAN CelebA, LSUN-bedroom GANFingerprint
128x128 CramerGAN CelebA, LSUN-bedroom GANFingerprint
128x128 InfoMaxGAN CelebA, LSUN-bedroom mimicry
128x128 SSGAN CelebA, LSUN-bedroom mimicry
256x256 Real cat, airplane, boat, horse, sofa, cow, dog, train, bicycle, bottle, diningtable, motorbike, sheep, tvmonitor, bird, bus, chair, person, pottedplant, car CNNDetection
256x256 ProGAN cat, airplane, boat, horse, sofa, cow, dog, train, bicycle, bottle, diningtable, motorbike, sheep, tvmonitor, bird, bus, chair, person, pottedplant, car CNNDetection
256x256 StackGAN2 cat, church, bird, bedroom, dog StackGAN-v2
256x256 CycleGAN winter, orange apple, horse, summer, zebra CNNDetection
256x256 StyleGAN2 cat, church horse StyleGAN2
1024x1024 Real FFHQ, CeleA-HQ FFHQ, CeleA-HQ
1024x1024 StyleGAN FFHQ, celebA-HQ, Yellow, Model, Asian Star, kid, elder, adult, glass, male, female, smile StyleGAN, seeprettyface
1024x1024 StyleGAN2 FFHQ, Yellow, Wanghong, Asian Star, kid StyleGAN2, seeprettyface

Data preparation

  • Download dataset, and put it into the directory ./dataset.
  • Download the train part of CNNDetection dataset, and put it into the directory ./dataset/GANs.
  • Prepare annotation files for training, validation, closed-set testing and cross testings.
    • Prepare dataset for the celebA experiment.
    python generate_data_multiple_cross.py --mode celeba
    
    • Prepare dataset for the LSUN-bedroom experiment.
    python generate_data_multiple_cross.py --mode lsun
    
    • Prepare dataset for the in the wild experiment.
    python generate_data_in_the_wild.py --mode in_the_wild
    
    After generation, the folder should be like this:
    dataset
    ├── ${mode}_test
    │   └── annotations
    │       ├── ${mode}_test.txt
    │       ├── ${mode}_test_cross_seed.txt
    │       ├── ${mode}_test_cross_loss.txt
    │       ├── ${mode}_test_cross_finetune.txt
    │       └── ${mode}_test_cross_dataset.txt
    ├── ${mode}_train
    │   └── annotations
    │       └── ${mode}_train.txt
    └── ${mode}_val
        └── annotations
            └── ${mode}_val.txt
    
    where {mode}_train.txt, {mode}_val.txt, {mode}_test.txt, {mode}_test_cross_*.txt are the annotation files for training, validation, closed-set testing and cross testing spilts.
  • Quick implement: We provide generated txt files, they can be downloaded from Baiduyun(passwd:olci) or GoogleDrive. Put them into the directory ./dataset.

Pretrain on Image Transformations

sh ./script/run_train_pretrain.sh

After training, the model and logs are saved in ./dataset/pretrain_train/models/pretrain_val/pretrain/run_${run_id}/.

Training

  1. Specify training configurations in ./configs/${config_name}.py
  2. Specify settings including config_name, data_path and so on in ./script/run_train.sh and run:
sh ./script/run_train.sh
  • Following is an example for celebA experiment:
data_path=./dataset/
train_collection=celeba_train
val_collection=celeba_val
config_name=celeba
run_id=1
pretrain_model_path=./dataset/pretrain_train/models/model.pth
python main.py  --data_path $data_path --train_collection $train_collection --val_collection $val_collection \
--config_name $config_name --run_id $run_id \
--pretrain_model_path $pretrain_model_path 

where

  • data_path: The dataset path
  • train_collection: The training split directory
  • val_collection: The validation split directory
  • config_name: The config file
  • run_id: The running id for numbering this training
  • pretrain_model_path: The pretrain model on image transformation classification

Similarly, for the LSUN-bedroom experiment:

data_path=./dataset/
train_collection=lsun_train
val_collection=lsun_val
config_name=lsun
run_id=1
pretrain_model_path=./datasets/pretrain_train/models/pretrain_val/pretrain/run_0/model.pth
python main.py  --data_path $data_path --train_collection $train_collection --val_collection $val_collection \
--config_name $config_name --run_id $run_id \
--pretrain_model_path $pretrain_model_path 

For the in the wild experiment:

data_path=./dataset/
train_collection=in_the_wild_train
val_collection=in_the_wild_val
config_name=in_the_wild
run_id=1
pretrain_model_path=./datasets/pretrain_train/models/pretrain_val/pretrain/run_0/model.pth
python main.py  --data_path $data_path --train_collection $train_collection --val_collection $val_collection \
--config_name $config_name --run_id $run_id \
--pretrain_model_path $pretrain_model_path 
  1. After training, the models and logs are saved in ./${data_path}/${train_collection}$/models/${val_collection}/${config_name}/run_${run_id}/.

Pre-trained models

We provide pre-trained models: Baiduyun(passwd:olci) or GoogleDrive. They have been put into the right path along with the annotation files. See model_path in ./script/run_test.sh.

Inference

To evaluate the trained model on multiple cross-test setups. Specify settings in ./script/run_test.sh and run:

sh ./script/run_test.sh
  • Following is an example for the celebA experiment:
config_name=celeba
model_path=./dataset/celeba_train/models/celeba_val/celeba/run_0/model.pth
python3 pred_eval.py --model_path $model_path --config_name $config_name \
--test_data_paths \  
./dataset/celeba_test/annotations/celeba_test.txt \
./dataset/celeba_test/annotations/celeba_test_cross_seed.txt \
./dataset/celeba_test/annotations/celeba_test_cross_loss.txt \
./dataset/celeba_test/annotations/celeba_test_cross_finetune.txt \
./dataset/celeba_test/annotations/celeba_test_cross_dataset.txt  
  • Following is an example for the LSUN-bedroom experiment:
config_name=lsun
model_path=./dataset/lsun_train/models/lsun_val/lsun/run_0/model.pth
python3 pred_eval.py --model_path $model_path --config_name $config_name \
--test_data_paths \  
./dataset/lsun_test/annotations/lsun_test.txt \
./dataset/lsun_test/annotations/lsun_test_cross_seed.txt \
./dataset/lsun_test/annotations/lsun_test_cross_loss.txt \
./dataset/lsun_test/annotations/lsun_test_cross_finetune.txt \
./dataset/lsun_test/annotations/lsun_test_cross_dataset.txt  
  • Following is an example for the in the wild experiment:
config_name=in_the_wild
model_path=./dataset/in_the_wild_train/models/in_the_wild_val/in_the_wild/run_0/model.pth
python3 pred_eval.py --model_path $model_path --config_name $config_name \
--test_data_paths \  
./dataset/in_the_wild_test/annotations/in_the_wild_test.txt \
./dataset/in_the_wild_test/annotations/in_the_wild_test_cross_dataset.txt  

After running, the result will be saved in ./${test_data_path}/pred/result.txt

Visualization

To visualize feature space on cross data by t-SNE (Figure 1 in our paper),

sh ./script/do_tsne.sh

To visualize what regions the network focuses on by GradCAM (Figure 6 in our paper),

sh ./script/do_gradcam.sh

Citation

If you find our model/method/dataset useful, please cite our work

@inproceedings{yang2022deepfake,
  title={Deepfake Network Architecture Attribution},
  author={Yang, Tianyun and Huang, Ziyao and Cao, Juan and Li, Lei and Li, Xirong},
  booktitle={Proceedings of the 36th AAAI Conference on Artificial Intelligence (AAAI 2022)},
  year={2022}
}

Acknowledgement

This work was supported by the Project of Chinese Academy of Sciences (E141020), the Project of Institute of Computing Technology, Chinese Academy of Sciences (E161020), Zhejiang Provincial Key Research and Development Program of China (No. 2021C01164), and the National Natural Science Foundation of China (No. 62172420).