The project consists of a video deepfake detector based on a hybrid EfficientNet CNN and Vision Transformer architecture. The model's inference results can be analyzed and explained by rendering a heatmap visualization, based on a relevancy map calculated from the attention layers of the Transformer, overlaid on the input face image.
In addition, the project enables re-training and testing the model's performance and explainability with new parameters.
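As a rough illustration of the visualization step (not the project's actual code), a relevancy map can be overlaid on a face crop as a heatmap roughly as follows; the function name, the BGR uint8 input, and the blending weight are assumptions:

import cv2
import numpy as np

def overlay_heatmap(face_bgr, relevancy, alpha=0.5):
    # Normalize the relevancy map to [0, 255] and resize it to the face-crop size
    rel = (relevancy - relevancy.min()) / (relevancy.max() - relevancy.min() + 1e-8)
    rel = cv2.resize((rel * 255).astype(np.uint8), (face_bgr.shape[1], face_bgr.shape[0]))
    # Map relevancy values to colors and blend with the original face image
    heatmap = cv2.applyColorMap(rel, cv2.COLORMAP_JET)
    return cv2.addWeighted(heatmap, alpha, face_bgr, 1 - alpha, 0)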
- Clone the repository and move into it:
git clone https://github.com/noame12/Explainable_Attention_Based_Deepfake_Detector.git
cd Explainable_Attention_Based_Deepfake_Detector
- Set up the Python environment using Conda:
conda env create --file environment.yml
conda activate explain_deepfakes
export PYTHONPATH=.
System requirements: To run the explainability process on more than 5 face images, a machine with a Tesla T4 (or stronger) GPU is required.
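A quick way to check that a suitable GPU is visible before starting (assuming the Conda environment provides PyTorch, which the detector builds on):

import torch
# Print whether CUDA is available, how many GPUs are visible, and the first device's name
print(torch.cuda.is_available(), torch.cuda.device_count())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))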
- Move to model explanation directory:
cd explain_model
- Create a directory for the input face images:
mkdir examples
- Download the face images from the samples drive into the newly created local 'examples' directory.
The samples drive contains 600 face images extracted from 600 test videos: 100 images for each of the five deepfake methods – Face2Face, FaceShifter, FaceSwap, NeuralTextures and Deepfakes – as well as 100 untouched real (aka Original) face images.
An exhaustive list of the face image files for running the explainability method is provided in the samples_list_All_efficientnetB0_checkpoint89_All_refac.csv file in the 'explain_model' directory. To run the process on a subset of the list, extract a customized list from the exhaustive one (an example sketch follows the note below).
!Note: Make sure to keep the same .csv file name, or update the name in explain_model.py (line 111) before running the module.
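For instance, a random subset can be extracted with pandas (an illustrative sketch; the subset size and the output file name samples_list_subset.csv are arbitrary – if you use a new name, update line 111 of explain_model.py accordingly):

import pandas as pd

# Load the exhaustive list and sample a smaller, random subset of face images
full_list = pd.read_csv('samples_list_All_efficientnetB0_checkpoint89_All_refac.csv')
subset = full_list.sample(n=20, random_state=0)
subset.to_csv('samples_list_subset.csv', index=False)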
- Run the explanation visualization process:
python explain_model.py
The output of the explanation process can be viewed in the 'explanation' directory (created automatically).
The results of the explainability process, run in advance on all examples, can be seen in the visualization results drive.
The test module enables testing the performance of the deepfake detector. The input data to the model is the test (or verification) dataset of face images extracted from the fake and real video sequences. The test process generates four outputs:
- Accuracy, AUC (Area Under Curve) and F1 scores of the classifier
- ROC diagram
- A .txt file with the classification results for each video sequence
- A .csv list of face image files – one sample per video.
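For reference, the reported scores correspond to standard classification metrics; a minimal, self-contained sketch with hypothetical per-video labels and fake probabilities (not the repository's code):

from sklearn.metrics import accuracy_score, roc_auc_score, f1_score, roc_curve
import matplotlib.pyplot as plt

# Hypothetical data: y_true holds ground-truth labels (0 = real, 1 = fake),
# y_prob holds the model's predicted probability that each video is fake
y_true = [0, 0, 1, 1]
y_prob = [0.2, 0.4, 0.7, 0.9]
y_pred = [int(p >= 0.5) for p in y_prob]

print('Accuracy:', accuracy_score(y_true, y_pred))
print('AUC:', roc_auc_score(y_true, y_prob))
print('F1:', f1_score(y_true, y_pred))

# Plot and save a ROC curve from the same predictions
fpr, tpr, _ = roc_curve(y_true, y_prob)
plt.plot(fpr, tpr)
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.savefig('roc_example.png')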
System requirements: To run the test process, a machine with two Tesla T4 (or stronger) GPUs is required.
- Download and extract the FaceForensics++ dataset.
The videos should be downloaded under the '/deep_fakes_explain/dataset' directory.
To perform deepfake detection, it is first necessary to identify and extract the faces from all the videos in the dataset.
- Detect the faces inside the videos:
cd preprocessing
python detect_faces.py --data_path /deep_fakes_explain/dataset --dataset FACEFORENSICS
!Note: The default dataset for the detect_faces.py module is DFDC, therefore it is important to specify the --dataset parameter as described above.
The detected face boxes (coordinates) will be saved inside the "/deep_fakes_explain/dataset/boxes" folder.
- Extract the detected faces to obtain the face images:
python extract_crops.py --data_path deep_fakes_explain/dataset --output_path deep_fakes_explain/dataset/training_set --dataset FACEFORENSICS
Repeat detection and extraction for all the different parts of your dataset. The --output_path parameter above is set to the training_set directory; repeat the process for the validation_set and test_set directories as well. The folder structure should look as follows:
Each (fake) method directory contains a directory for each video, and each video directory contains all the face extraction files for that video in .png format. A short sanity-check sketch follows the directory tree below.
- training_set
- Deepfakes
- video_name_0
0_0.png
1_0.png
...
N_0.png
...
- video_name_K
0_0.png
1_0.png
...
M_0.png
- Face2Face
- FaceShifter
- FaceSwap
- NeuralTextures
- Original
- validation_set
...
...
...
- test_set
...
...
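A short sanity-check sketch (assuming the layout above and the dataset path used in the extract_crops command) that counts videos and face crops per method:

import os

root = 'deep_fakes_explain/dataset/training_set'  # repeat for validation_set and test_set
for method in sorted(os.listdir(root)):
    method_dir = os.path.join(root, method)
    if not os.path.isdir(method_dir):
        continue
    # Count the video directories and the .png face crops they contain
    video_dirs = [d for d in os.listdir(method_dir) if os.path.isdir(os.path.join(method_dir, d))]
    n_faces = sum(len([f for f in os.listdir(os.path.join(method_dir, v)) if f.endswith('.png')])
                  for v in video_dirs)
    print(f'{method}: {len(video_dirs)} videos, {n_faces} face crops')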
- Move into the test module folder:
cd model_test_train
- Run the following command to evaluate the deepfake detector model, providing the pre-trained model path and the configuration file available in the config directory:
python test_model.py --model_path ../deep_fakes_explain/models/efficientnetB0_checkpoint89_All --config configs/explained_architecture.yaml
By default, the command will test on all datasets (--dataset All), but you can customize the following parameters:
- --dataset: Which dataset to use (Deepfakes|Face2Face|FaceShifter|FaceSwap|NeuralTextures|Original|All)
- --workers: Number of data loader workers (default: 16)
- --frames_per_video: Number of equidistant frames for each video (default: 20)
- --batch_size: Prediction Batch Size (default: 12)
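For example (illustrative values only), to evaluate just the Deepfakes subset with 30 frames per video:
python test_model.py --model_path ../deep_fakes_explain/models/efficientnetB0_checkpoint89_All --config configs/explained_architecture.yaml --dataset Deepfakes --frames_per_video 30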
The results of the test process are saved in the 'results/tests' directory.
The train module enables re-training the model with different parameters. Re-training may be desired to verify or test a hypothesis for improving model performance or explainability.
To evaluate a customized model trained from scratch with a different architecture, you need to edit the configs/explained_architecture.yaml file.
System requirements: A machine with two Tesla T4 (or stronger) GPUs, a CPU with 16 vCPUs, and 100 GB of RAM.
To train the model using my architecture configuration:
- Verify that you are in the 'model_test_train' directory
- Run the train module:
python train_model.py --config configs/explained_architecture.yaml
By default, the command will train on all method datasets (--dataset All), but you can customize the following parameters:
- --num_epochs: Number of training epochs (default: 100)
- --workers: Number of data loader workers (default: 16)
- --resume: Path to latest checkpoint (default: none)
- --dataset: Which dataset to use (Deepfakes|Face2Face|FaceShifter|FaceSwap|NeuralTextures|All) (default: All)
- --max_videos: Maximum number of videos to use for training (default: all)
- --patience: How many epochs to wait before stopping when the validation loss is not improving (default: 5)
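For example (illustrative values only), to train on the Face2Face subset for 50 epochs with a longer early-stopping patience:
python train_model.py --config configs/explained_architecture.yaml --dataset Face2Face --num_epochs 50 --patience 10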
- The Deepfake Detector implementation is based on the Hybrid EfficientNet Vision Transformer implementation.
- The explainability method is based on the Transformer MM Explainability implementation.