Implementation code for our paper. link to paper | link to MS Thesis | link to MS Thesis defense PPT file | link to CViT2
- PyTorch >= 1.4
- helpers_read_video_1.py
- helpers_face_extract_1.py
- blazeface.py
- blazeface.pth
- face_recognition
- facenet-pytorch
- dlib
extractfaces.py - Face extraction from video. The code works for the DFDC dataset; you can test it using the sample data provided.
cvit_deepfake_detection_ep_50.pth - Model weight for CViT.
cvit2_deepfake_detection_ep_50.pth - Model weight for CViT2.
Download the pretrained model weights from Hugging Face and save them in the weight folder.
wget https://huggingface.co/datasets/Deressa/cvit/resolve/main/cvit2_deepfake_detection_ep_50.pth
or
wget https://huggingface.co/datasets/Deressa/cvit/resolve/main/cvit_deepfake_detection_ep_50.pth
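If wget is not available, the same download can be sketched in Python. This is a minimal sketch, not part of the repository: the helper names `weight_url` and `download_weight` are hypothetical, and the target folder name `weight` is taken from the instruction above. It relies on Hugging Face serving raw files from `/resolve/` paths, whereas `/blob/` paths return the HTML file viewer.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Repository path taken from the wget commands in this README.
HF_REPO = "https://huggingface.co/datasets/Deressa/cvit"


def weight_url(filename: str) -> str:
    """Build a direct-download URL (/resolve/ serves the raw file)."""
    return f"{HF_REPO}/resolve/main/{filename}"


def download_weight(filename: str, folder: str = "weight") -> Path:
    """Download one checkpoint into `folder` unless it is already present.

    Hypothetical helper for illustration; it performs a network request.
    """
    target = Path(folder) / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        urlretrieve(weight_url(filename), str(target))
    return target
```

Calling `download_weight("cvit2_deepfake_detection_ep_50.pth")` would place the CViT2 checkpoint in the weight folder and return its path.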
python cvit_prediction.py --p <video path> --f <number_of_frames> --w <weights_path> --n <network_type> --fp16 <half_precision>
python cvit_prediction.py --p <video path> --f <number_of_frames> --d <dataset_type> --w <weights_path> --n <network_type> --fp16 <half_precision>
Example usage:
python cvit_prediction.py --p sample__prediction_data --f 15 --n cvit2 --fp16 y
Predict on the DFDC dataset:
python cvit_prediction.py --p dfdc_vidoes --d dfdc --f 15 --n cvit2 --fp16 y
Predicts whether a video is Deepfake or not.
Prediction value < 0.5 - REAL
Prediction value >= 0.5 - FAKE
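The decision rule can be sketched in a few lines of Python. This is an illustration only: it assumes the cut-off between REAL and FAKE is 0.5, and the `video_score` helper, which averages per-frame scores into one video-level value, is a hypothetical aggregation that may differ from what cvit_prediction.py actually does.

```python
from statistics import mean

# Assumed cut-off: scores below it are REAL, scores at or above it are FAKE.
FAKE_THRESHOLD = 0.5


def label_from_score(score: float) -> str:
    """Map a prediction value to the REAL/FAKE label."""
    return "REAL" if score < FAKE_THRESHOLD else "FAKE"


def video_score(frame_scores: list[float]) -> float:
    """Hypothetical aggregation: average the per-frame prediction values."""
    if not frame_scores:
        raise ValueError("at least one frame score is required")
    return mean(frame_scores)
```

For example, `label_from_score(video_score([0.2, 0.9, 0.9]))` labels a video whose frames mostly score high as FAKE.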
--p (str): Path to the video or image file for prediction.
Example: --p /path/to/video.mp4
--f (int): Number of frames to process for prediction.
Example: --f 30
--d (str): Dataset type. Options are dfdc, faceforensics, timit, or celeb.
Example: --d dfdc
--w (str): Path to the model weights for CViT or CViT2.
Example: --w cvit2_deepfake_detection_ep_50
--n (str): Network type. Options are cvit or cvit2.
Example: --n cvit
--fp16 (str): Enable half-precision support. Accepts a boolean value (true or false).
Example: --fp16 true
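To make the option list concrete, here is a hedged argparse sketch that accepts the same flags. It is not the parser used by cvit_prediction.py: the function names, the default for `--f` (15, borrowed from the example commands), the default network, and the set of accepted truthy/falsy strings for `--fp16` are all assumptions made for illustration.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Accept both styles seen in this README: 'y'/'n' and 'true'/'false'."""
    normalized = value.strip().lower()
    if normalized in {"y", "yes", "true", "1"}:
        return True
    if normalized in {"n", "no", "false", "0"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean value, got {value!r}")


def build_prediction_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented prediction options."""
    parser = argparse.ArgumentParser(description="Deepfake prediction options")
    parser.add_argument("--p", type=str, required=True,
                        help="path to the video or image file")
    parser.add_argument("--f", type=int, default=15,  # assumed default
                        help="number of frames to process")
    parser.add_argument("--d", type=str, default=None,
                        choices=["dfdc", "faceforensics", "timit", "celeb"],
                        help="dataset type")
    parser.add_argument("--w", type=str, default=None,
                        help="path to the model weights")
    parser.add_argument("--n", type=str, default="cvit2",  # assumed default
                        choices=["cvit", "cvit2"], help="network type")
    parser.add_argument("--fp16", type=str_to_bool, default=False,
                        help="enable half precision")
    return parser
```

Parsing `["--p", "sample__prediction_data", "--f", "15", "--n", "cvit2", "--fp16", "y"]` with this parser yields `fp16=True` and leaves `d` unset.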
To train the model on your own you can use the following parameters:
python train_cvit.py -e <epochs> -d <data_path> -b <batch_size> -l <learning_rate> -w <weight_decay> -t <test_option>
-e, --epoch (int): Number of training epochs, default=1.
-d, --dir (str): Path to the training data.
-b, --batch (int): Batch size, default=32.
-l, --rate (float): Learning rate, default=0.001.
-w, --wdecay (float): Weight decay, default=0.0000001.
-t, --test (str): Test on test set (e.g., y).
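The training options and their documented defaults can be summarized as a small parser. This is a sketch under assumptions rather than the code in train_cvit.py: the function name `build_training_parser` is hypothetical, and treating `--dir` as required and `--test` as an optional free-form string are guesses.

```python
import argparse


def build_training_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented training options."""
    parser = argparse.ArgumentParser(description="CViT training options")
    parser.add_argument("-e", "--epoch", type=int, default=1,
                        help="number of training epochs")
    parser.add_argument("-d", "--dir", type=str, required=True,  # assumed required
                        help="path to the training data")
    parser.add_argument("-b", "--batch", type=int, default=32,
                        help="batch size")
    parser.add_argument("-l", "--rate", type=float, default=0.001,
                        help="learning rate")
    parser.add_argument("-w", "--wdecay", type=float, default=0.0000001,
                        help="weight decay")
    parser.add_argument("-t", "--test", type=str, default=None,
                        help="evaluate on the test set (e.g., y)")
    return parser
```

With only `-d <data_path>` supplied, the remaining values fall back to the defaults listed above.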
Deressa Wodajo
Solomon Atnafu
Peter Lambert
Glenn Van Wallendael
Hannes Mareen
@misc{wodajo2021deepfake,
title={Deepfake Video Detection Using Convolutional Vision Transformer},
author={Deressa Wodajo and Solomon Atnafu},
year={2021},
eprint={2102.11126},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@inproceedings{wodajo2024deepfake,
title={Improved Deepfake Video Detection Using Convolutional Vision Transformer},
author={Deressa Wodajo and Peter Lambert and Glenn Van Wallendael and Solomon Atnafu and Hannes Mareen},
booktitle={Proceedings of the IEEE International Conference on Games, Entertainment \& Media (GEM)},
year={2024},
month={June},
address={Turin (Torino), Italy}
}