Deepfake Detection Challenge Solution: RobustForensics

Environment

All the dependencies are listed in requirements.txt, which is generated by pipreqs.

# please check requirements.txt
pip install -r requirements.txt

We use Slurm to manage computing resources, and all training scripts use srun to spawn multiple processes for synchronized SGD.
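
When srun spawns the training processes, each process can derive its rank and world size from standard Slurm environment variables. A minimal sketch of this (the repo's actual launch logic may differ):

import os

def slurm_rank_and_world_size():
    # srun exports SLURM_PROCID (the global rank of this process)
    # and SLURM_NTASKS (the total number of spawned processes).
    rank = int(os.environ['SLURM_PROCID'])
    world_size = int(os.environ['SLURM_NTASKS'])
    return rank, world_size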

Data Preprocessing

All training videos should be put in the folder DFDC-Kaggle. There should be 50 sub-folders named dfdc_train_part_0, ..., dfdc_train_part_49, each of which contains videos from one part of the DFDC dataset. We use the command

python extract_frames.py

to extract frames from all videos; the extracted frames will be stored in DFDC-Kaggle_image. Note that these frames, which have the same resolution as the original videos, take up a significant amount of disk space.
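
For reference, below is a minimal sketch of such frame extraction using OpenCV, assuming a fixed sampling stride (extract_frames.py may choose frames differently):

import os
import cv2

def extract_frames(video_path, out_dir, stride=10):
    # Save every `stride`-th frame at the original resolution.
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = 0
    saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % stride == 0:
            saved += 1
            cv2.imwrite(os.path.join(out_dir, 'frame%d.png' % saved), frame)
        idx += 1
    cap.release()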

After frame extraction, we use the open-source face detector RetinaFace to detect faces in all the frames. Note that this is the same detector as the one used in our inference pipeline (see the inference folder). The detection results should be saved in the folder DFDC-Kaggle_Retinaface. As an example, for the frame stored at DFDC-Kaggle_image/dfdc_train_part_0/aaqaifqrwn/frame1.png, we will generate a text file at DFDC-Kaggle_Retinaface/dfdc_train_part_0/aaqaifqrwn/frame1.txt, whose content would be the numbers below.

6
766 238 215 317 0.99811953 
1530 925 136 133 0.3763613 
1805 990 43 57 0.07622631 
1278 916 140 131 0.0581847 
1490 959 113 110 0.033537783 
1826 978 63 77 0.022241158 
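
Judging from this example, the first line gives the number of detections, and each subsequent line holds one box with its confidence score (assumed here to be x, y, width, height, score). A minimal parsing sketch:

def load_detections(txt_path, min_score=0.9):
    # First line: number of boxes; following lines: "x y w h score".
    # Low-confidence detections are filtered out.
    with open(txt_path) as f:
        lines = f.read().splitlines()
    num = int(lines[0])
    boxes = []
    for line in lines[1:1 + num]:
        x, y, w, h, score = map(float, line.split())
        if score >= min_score:
            boxes.append((x, y, w, h, score))
    return boxes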

With extracted frames and detected face boxes, we perform simple IoU-based tracking (see the sketch at the end of this section) and face size alignment to obtain the aligned faces used for training our models. The aligned faces will be put in a folder named DFDC-Kaggle_Alignedface. Specifically, we use the command below.

python save_aligned_faces.py

Feel free to use multi-processing techniques to speed up the preprocessing steps.
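
For reference, the IoU that links detections across consecutive frames can be computed as below (a minimal sketch assuming x, y, width, height boxes; the tracking details in save_aligned_faces.py may differ):

def iou(box_a, box_b):
    # Boxes are (x, y, w, h); IoU = intersection area / union area.
    ax, ay, aw, ah = box_a[:4]
    bx, by, bw, bh = box_b[:4]
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

Detections in adjacent frames whose IoU exceeds a threshold are then assigned to the same face track.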

Training

There are six pre-trained models for initialization, and they should be downloaded and put in the pretrain folder prior to training.

After the steps above, we use the command below to train all models sequentially. Note that <distributed file path> should be an absolute path to an empty location; it will store the shared files used for initializing process groups. The final parameter <task name> is optional and only specifies the job name in Slurm.

sh train.sh <slurm partition name> <distributed file path> <task name>

You may look into train.sh to see the specific configs used for training. The data lists used during training have been put in the folder DFDC-Kaggle_list. By default, image-based models require 8 GPUs, and video-based models require 16 GPUs. Our submitted solution used 7 image-based models and 4 video-based models.
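
For context, the shared files behind <distributed file path> correspond to PyTorch's file-based rendezvous. A minimal sketch of how each spawned process might join the process group (the actual training scripts may differ):

import torch.distributed as dist

def init_distributed(dist_file, rank, world_size):
    # File-based rendezvous: all processes point at the same shared,
    # initially non-existent file on a filesystem visible to every node.
    dist.init_process_group(
        backend='nccl',
        init_method='file://%s' % dist_file,
        rank=rank,
        world_size=world_size,
    )

After initialization, the model can be wrapped in torch.nn.parallel.DistributedDataParallel so that gradients are synchronized across GPUs at every step.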

Note that training may occasionally get stuck or stop unexpectedly due to random issues. In this case, you may use image_based/recover.sh or video_based/recover.sh to resume training.
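
Conceptually, recovery resumes from the latest saved checkpoint; a minimal sketch of such logic (hypothetical checkpoint layout, assuming PyTorch checkpoints with 'model', 'optimizer', and 'epoch' keys):

import glob
import torch

def resume_latest(model, optimizer, ckpt_dir):
    # Restore training state from the most recent checkpoint, if any.
    ckpts = sorted(glob.glob('%s/*.pth' % ckpt_dir))
    if not ckpts:
        return 0  # no checkpoint yet; start from scratch
    state = torch.load(ckpts[-1], map_location='cpu')
    model.load_state_dict(state['model'])
    optimizer.load_state_dict(state['optimizer'])
    return state['epoch'] + 1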

Inference

Please refer to the separate README.md in the inference folder.