kaggle-dfdc

2nd place solution for Kaggle Deepfake Detection Challenge


This repository contains Team \WM/'s code to reproduce our solution for the Deepfake Detection Challenge (DFDC).

Please refer to Model_Summary.pdf for a descriptive summary of our method.

Members (alphabetical order):

Content

  • make_dataset.sh and make_dataset.py: Scripts to extract faces from videos, used for dataset processing.
  • train-wsdan.py: Script to train WS-DAN models.
  • train-xception.py: Script to train the Xception model.
  • best-submission.py: Our best submission on Kaggle.

Environment

We trained models on our lab's Linux cluster. The environment listed below reflects a typical software / hardware configuration in this cluster.

Hardware:

  • CPU: Xeon Gold 5120
  • GPU: 2080Ti or 1080Ti
  • Mem: > 64GB
  • Data is stored on SSDs.

Software:

  • System: Ubuntu 16.04.6 with Linux 4.4.0 kernel.
  • Python: 3.6 or 3.7 distributed by Anaconda.
  • CUDA: 10.0

Reproduction Guide

Code & Data Dependency

  • For dependent python packages, please refer to requirements.txt.
  • Other external code dependencies are provided as git submodules in the external/ folder.
    • Run git submodule init && git submodule update to fetch these dependencies.
  • Other data dependencies can be downloaded from Google Drive (a loading sketch follows this list):
    • RetinaFace-Resnet50-fixed.pth: Pretrained RetinaFace model.
    • ckpt_x.pth: Pretrained weight files for WS-DAN w/ Xception.
    • ckpt_e.pth: Pretrained weight files for WS-DAN w/ EfficientNet-b3.
    • xception-hg-2.pth: Pretrained Xception weight files.
  • External data used by the code that was not generated by us:
    • Pretrained RetinaFace model [1].
    • Pretrained EfficientNet on ImageNet [3].
    • Both are publicly accessible and were posted in the External Data Disclosure Thread by multiple other users, in accordance with the competition rules.
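The .pth files above are ordinary PyTorch checkpoints. As a minimal sketch, assuming each file is a plain state dict (the real files may nest the weights under extra keys), they can be inspected like this:

# Assumption: the checkpoint is a plain state dict; adjust if the
# actual file wraps its weights under keys such as "state_dict".
import torch

state = torch.load("ckpt_x.pth", map_location="cpu")
print(type(state))                  # typically dict / OrderedDict
print(list(state)[:5])              # peek at the first few parameter names
# model.load_state_dict(state)      # once the matching model is built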

Dataset Processing

The script make_dataset.py extracts aligned faces from a video file and saves them as images. It works like this:

$ mkdir /path/to/output_frames/
$ python make_dataset.py /path/to/video.mp4 /path/to/output_frames/

The script make_dataset.sh finds all mp4 files recursively in a directory and calls make_dataset.py on each one.
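For reference, here is a hypothetical Python equivalent of what make_dataset.sh does (the real script is shell and may differ in file ordering or parallelism):

# Hypothetical equivalent of make_dataset.sh: find every .mp4 under a
# source tree and run make_dataset.py on it, mirroring the directory
# layout in the output tree. Illustrative only; see the shell script.
import subprocess
import sys
from pathlib import Path

src, dst = Path(sys.argv[1]), Path(sys.argv[2])
for video in sorted(src.rglob("*.mp4")):
    out_dir = dst / video.relative_to(src).with_suffix("")
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(["python", "make_dataset.py", str(video), str(out_dir)], check=True)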

Assuming you have downloaded the DFDC dataset and extracted all zip files to videos/, run the following command to process the whole dataset and save face images to /mnt/ssd0/dfdc/ for training:

$ bash make_dataset.sh videos/ /mnt/ssd0/dfdc/

Training: WS-DAN

WS-DAN is the core part of our final solution. We trained two variants of WS-DAN: one with Xception and one with EfficientNet-b3 as the feature extractor.

Training configs for both variants are provided in the wsdan-conf/ folder. Check the save_dir, datapath and pretrained settings before training.

Note the pretrained setting (an illustrative config sketch follows this list):

  • For WS-DAN w/ Xception, we used our previously trained Xception model (see the next section) to initialize the feature extractor. Set this to the path of the Xception weight file.
  • For WS-DAN w/ EfficientNet-b3, we used ImageNet-pretrained weights downloaded automatically by the EfficientNet-PyTorch code, so any non-empty string works here.
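For illustration, a config for the Xception variant might look like the following sketch. Only save_dir, datapath and pretrained are named in this README; treat everything else as a placeholder and consult the shipped configs in wsdan-conf/ for the real set of options:

# Hypothetical sketch of wsdan-conf/xception.py; values are placeholders.
save_dir = "output/dfdc-wsdan-xception/"  # where checkpoints are written
datapath = "/mnt/ssd0/dfdc/"              # face images from make_dataset.sh
pretrained = "xception-hg-2.pth"          # path to trained Xception weights

# For the EfficientNet-b3 variant, any non-empty string works here,
# since EfficientNet-PyTorch downloads the ImageNet weights itself:
# pretrained = "imagenet"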

To train WS-DAN w/Xception:

$ python train-wsdan.py wsdan-conf/xception.py

To train WS-DAN w/EfficientNet:

$ python train-wsdan.py wsdan-conf/efb3.py 

Time estimation: We trained WS-DAN w/ Xception with 6 GPUs for almost a week (50 epochs).

Training: Xception

The Xception part is of less interest: we used a two-class Xception as the per-face classifier before we found the more powerful WS-DAN.
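To illustrate the idea, a two-class (real/fake) Xception classifier can be sketched with the timm library as follows. This is an assumption-laden illustration, not the repository's actual model code:

# Sketch: a binary real/fake Xception classifier built with timm.
# Illustrative only; the repo may construct its model differently.
import timm
import torch

model = timm.create_model("xception", pretrained=False, num_classes=2)
faces = torch.randn(4, 3, 299, 299)   # a batch of aligned face crops
logits = model(faces)                 # shape (4, 2): real / fake scores
print(logits.shape)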

Check xception-conf.py for various path settings. Then run:

$ python train-xception.py xception-conf.py

Time estimation: With 4 GPUs, 92% or more validation accuracy should be observed in around 12h. We typically trained for more than 1 day (20+ epochs).

Validation

best-submission.py is the code of our best submission on Kaggle, which scored 0.42842 (private) and 0.28680 (public). Please modify the paths accordingly for your own testing.
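The competition metric is binary cross-entropy (log loss), which is what the scores above report. A quick way to sanity-check a generated submission against known labels, assuming the Kaggle sample-submission column names (filename, label) and a hypothetical ground_truth.csv label file:

# Sketch: score a submission against ground truth with the competition
# metric (log loss). ground_truth.csv is a hypothetical label file.
import numpy as np
import pandas as pd

sub = pd.read_csv("submission.csv").merge(
    pd.read_csv("ground_truth.csv"), on="filename", suffixes=("_pred", "_true"))

eps = 1e-15
p = np.clip(sub["label_pred"].to_numpy(), eps, 1 - eps)
y = sub["label_true"].to_numpy()
print("log loss:", -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))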

Summary

The following is a summary of the commands used to train our models:

# Dataset Processing
$ ls dfdc_train_part_*.zip | xargs -i unzip {} -d videos/
$ bash make_dataset.sh videos/ /mnt/ssd0/dfdc/

# Train Xception
$ python train-xception.py xception-conf.py

# Train WS-DAN
$ mkdir -p output/dfdc-wsdan-{xception,efb3}/
$ python train-wsdan.py wsdan-conf/xception.py
$ python train-wsdan.py wsdan-conf/efb3.py

References

[1] RetinaFace implementation: biubug6/Pytorch_Retinaface.

[2] WS-DAN implementation: GuYuc/WS-DAN.PyTorch.

[3] EfficientNet implementation: lukemelas/EfficientNet-PyTorch.

[4] Face alignment code: deepinsight/insightface.