CViT

Deepfake Video Detection Using Convolutional Vision Transformer

Implementation code for our paper. link to paper | link to MS Thesis | link to MS Thesis defense PPT file | link to CViT2

Update, April 1, 2024

CViT2

Improved Deepfake Video Detection Using Convolutional Vision Transformer

Requirements:

PyTorch >= 1.4

Files and libraries used for face extraction:

helpers_read_video_1.py
helpers_face_extract_1.py
blazeface.py
blazeface.pth
face_recognition
facenet-pytorch
dlib
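Before running face extraction, it can help to confirm that the third-party packages above are importable. The snippet below is a minimal sketch, not part of the repository. The package-to-module mapping (for example, facenet-pytorch is imported as facenet_pytorch) is our assumption. It only checks importability, not the PyTorch >= 1.4 version requirement. The local helper files (helpers_read_video_1.py, blazeface.py, and so on) are not checked.

```python
import importlib.util

# Assumed import names for the packages listed above (pip name -> module name
# differs for facenet-pytorch). Keys are module names, values are pip names.
REQUIRED_MODULES = {
    "torch": "PyTorch",
    "face_recognition": "face_recognition",
    "facenet_pytorch": "facenet-pytorch",
    "dlib": "dlib",
}


def missing_modules(modules=REQUIRED_MODULES):
    """Return the package names whose top-level module cannot be found."""
    return [
        package
        for module, package in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All face-extraction dependencies are importable.")
```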

Preprocessing

extractfaces.py: Extracts faces from videos. The code works with the DFDC dataset. You can test it using the provided sample data.
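Face extraction (and the --f option of the prediction script) works on a fixed number of frames per video. The sketch below shows one common way to choose which frames to read: evenly spaced sampling that takes the centre of equal-length segments. The helper sample_frame_indices is hypothetical, and evenly spaced sampling is an assumption for illustration. extractfaces.py may select frames differently.

```python
def sample_frame_indices(total_frames, num_frames):
    """Pick `num_frames` evenly spaced frame indices from a video.

    Hypothetical helper: evenly spaced sampling is an assumption, not
    necessarily what extractfaces.py does.
    """
    if total_frames <= 0 or num_frames <= 0:
        return []
    if num_frames >= total_frames:
        # Fewer frames than requested: use every frame.
        return list(range(total_frames))
    step = total_frames / num_frames
    # Take the centre of each of `num_frames` equal-length segments;
    # the last index is total_frames - step / 2, so it never overflows.
    return [int(step * i + step / 2) for i in range(num_frames)]
```

For a 300-frame video with 15 frames requested, this sketch selects frames 10, 30, 50, ..., 290.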

Weights

cvit_deepfake_detection_ep_50.pth - Model weight for CViT.
cvit2_deepfake_detection_ep_50.pth - Model weight for CViT2.

Predict CViT

Download the pretrained weights from Hugging Face and save them in the weight folder.

CViT2 - trained on 5 datasets including DFDC

wget https://huggingface.co/datasets/Deressa/cvit/blob/main/cvit2_deepfake_detection_ep_50.pth

CViT - trained on DFDC

wget https://huggingface.co/datasets/Deressa/cvit/blob/main/cvit_deepfake_detection_ep_50.pth
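Hugging Face serves the web viewer page under /blob/ and the raw file under /resolve/. As a result, running wget on a /blob/ URL may save an HTML page instead of the .pth weights. The small helper below builds the direct-download form of a dataset-file URL. The function name hf_dataset_file_url is our own, not part of any library.

```python
HF_BASE = "https://huggingface.co"


def hf_dataset_file_url(repo_id, filename, revision="main"):
    """Build a direct-download URL for a file in a Hugging Face dataset repo.

    /resolve/ returns the raw file, whereas /blob/ returns the viewer page.
    """
    return f"{HF_BASE}/datasets/{repo_id}/resolve/{revision}/{filename}"


if __name__ == "__main__":
    for name in ("cvit2_deepfake_detection_ep_50.pth",
                 "cvit_deepfake_detection_ep_50.pth"):
        print(hf_dataset_file_url("Deressa/cvit", name))
```

Each printed URL can be passed to wget -P weight to save the file directly into the weight folder. Alternatively, hf_hub_download from the huggingface_hub package, called with repo_type="dataset", fetches the same files.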

python cvit_prediction.py --p <video path> --f <number_of_frames> --w <weights_path> --n <network_type> --fp16 <half_precision>

To predict on a specific deepfake dataset, also pass --d:

python cvit_prediction.py --p <video path> --f <number_of_frames> --d <dataset_type> --w <weights_path> --n <network_type> --fp16 <half_precision>

Example usage:

python cvit_prediction.py --p sample__prediction_data --f 15 --n cvit2 --fp16 y

To predict on DFDC:

python cvit_prediction.py --p dfdc_videos --d dfdc --f 15 --n cvit2 --fp16 y

Arguments

cvit_prediction.py predicts whether a video is a deepfake:
Prediction value < 0.5 - REAL
Prediction value >= 0.5 - FAKE
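The decision rule above can be written as a simple threshold. In the sketch below, label_from_score applies the documented 0.5 cut-off. label_video is a hypothetical extension: it averages per-frame scores before thresholding. Averaging is an assumption for illustration, and cvit_prediction.py may aggregate frame scores differently.

```python
from statistics import mean

THRESHOLD = 0.5  # documented cut-off: < 0.5 is REAL, >= 0.5 is FAKE


def label_from_score(score, threshold=THRESHOLD):
    """Map a prediction value in [0, 1] to the documented label."""
    return "FAKE" if score >= threshold else "REAL"


def label_video(frame_scores, threshold=THRESHOLD):
    """Average per-frame scores, then apply the threshold.

    Averaging over frames is an assumption for illustration; the actual
    aggregation in cvit_prediction.py may differ.
    """
    score = mean(frame_scores)
    return score, label_from_score(score, threshold)
```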

--p (str): Path to the video or image file for prediction.

Example: --p /path/to/video.mp4

--f (int): Number of frames to process for prediction.

Example: --f 30

--d (str): Dataset type. Options are dfdc, faceforensics, timit, or celeb.

Example: --d dfdc

--w (str): Path to the model weights for CViT or CViT2.

Example: --w cvit2_deepfake_detection_ep_50

--n (str): Network type. Options are cvit or cvit2.

Example: --n cvit

--fp16 (str): Enable half-precision (FP16) inference. Accepts a boolean-like value (e.g., true or false; the usage examples above pass y).

Example: --fp16 true
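The flags above map naturally onto Python's argparse. The sketch below is a hypothetical re-creation for illustration, not the actual cvit_prediction.py parser. The absence of defaults and the set of strings that count as "true" for --fp16 are our assumptions.

```python
import argparse

# Assumed spellings that switch on half precision (the docs use y and true).
TRUTHY = {"y", "yes", "true", "1"}


def build_predict_parser():
    """Hypothetical re-creation of the cvit_prediction.py flags listed above."""
    parser = argparse.ArgumentParser(description="Predict whether a video is a deepfake.")
    parser.add_argument("--p", type=str, help="path to a video/image file or folder")
    parser.add_argument("--f", type=int, help="number of frames to process")
    parser.add_argument("--d", type=str,
                        choices=["dfdc", "faceforensics", "timit", "celeb"],
                        help="dataset type")
    parser.add_argument("--w", type=str, help="path to the CViT or CViT2 weights")
    parser.add_argument("--n", type=str, choices=["cvit", "cvit2"], help="network type")
    parser.add_argument("--fp16", type=str, help="enable half precision (e.g. y or true)")
    return parser


def use_half_precision(value):
    """Interpret the --fp16 string as a boolean (assumed convention)."""
    return value is not None and value.strip().lower() in TRUTHY
```

Because argparse prefers an exact option match, --f and --fp16 do not clash in this sketch.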

Train CViT

To train the model yourself, run:

python train_cvit.py -e <epochs> -d <data_path> -b <batch_size> -l <learning_rate> -w <weight_decay> -t <test_option>

Options

-e, --epoch (int): Number of training epochs, default=1.

-d, --dir (str): Path to the training data.

-b, --batch (int): Batch size, default=32.

-l, --rate (float): Learning rate, default=0.001.

-w, --wdecay (float): Weight decay, default=0.0000001.

-t, --test (str): Whether to evaluate on the test set (e.g., y).
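For reference, the documented training options and defaults can be expressed with argparse as below. This is a hypothetical re-creation, not the actual train_cvit.py code, and leaving --dir and --test without defaults is our assumption. If the script does use argparse, a long option may be abbreviated to an unambiguous prefix (--d works for --dir). However, --l is not a prefix of --rate, so the single-dash short forms are the safer choice.

```python
import argparse


def build_train_parser():
    """Hypothetical re-creation of the train_cvit.py options documented above."""
    parser = argparse.ArgumentParser(description="Train CViT.")
    parser.add_argument("-e", "--epoch", type=int, default=1, help="number of training epochs")
    parser.add_argument("-d", "--dir", type=str, help="path to the training data")
    parser.add_argument("-b", "--batch", type=int, default=32, help="batch size")
    parser.add_argument("-l", "--rate", type=float, default=0.001, help="learning rate")
    parser.add_argument("-w", "--wdecay", type=float, default=1e-7, help="weight decay")
    parser.add_argument("-t", "--test", type=str, help="evaluate on the test set (e.g. y)")
    return parser
```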

Authors

Deressa Wodajo
Solomon Atnafu
Peter Lambert
Glenn Van Wallendael
Hannes Mareen

Bibtex

CViT

@misc{wodajo2021deepfake,
      title={Deepfake Video Detection Using Convolutional Vision Transformer}, 
      author={Deressa Wodajo and Solomon Atnafu},
      year={2021},
      eprint={2102.11126},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

CViT2

@inproceedings{wodajo2024deepfake,
    title={Improved Deepfake Video Detection Using Convolutional Vision Transformer},
    author={Deressa Wodajo and Peter Lambert and Glenn Van Wallendael and Solomon Atnafu and Hannes Mareen},
    booktitle={Proceedings of the IEEE International Conference on Games, Entertainment \& Media (GEM)},
    year={2024},
    month={June},
    address={Turin (Torino), Italy}
}
