Koushik Srivatsan,
Muzammal Naseer,
Karthik Nandakumar
MBZUAI, UAE.
- 28-09-2023: Code released.
- We show that direct finetuning of a multimodal pre-trained ViT (e.g., CLIP image encoder) achieves better FAS generalizability without any bells and whistles.
- We propose a new approach for robust cross-domain FAS by grounding the visual representation in natural language semantics. This is realized by aligning the image representation with an ensemble of text prompts (describing the class) during finetuning; see the first sketch below.
- We propose a multimodal contrastive learning strategy, which encourages the model to learn more generalized features that bridge the FAS domain gap even with limited training data. This strategy leverages view-based image self-supervision and view-based cross-modal image-text similarity as additional constraints during the learning process; see the second sketch below.
- Extensive experiments on three standard protocols demonstrate that our method significantly outperforms state-of-the-art methods, achieving better zero-shot transfer performance than five-shot transfer of "adaptive ViTs".
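The language-guided alignment can be illustrated with a minimal sketch. This is not the released implementation: it assumes the openai `clip` package, and the prompt wordings and the `real_vs_spoof_logits` helper are illustrative placeholders, not the prompt ensemble used in the paper.

```python
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/16", device=device)

# Illustrative prompt ensembles for the two FAS classes; the paper uses
# several natural-language descriptions per class.
real_prompts = [
    "This is an image of a real face",
    "This is a photo of a bonafide face",
]
spoof_prompts = [
    "This is an image of a spoof face",
    "This is a photo of a fake face",
]

def class_embedding(prompts):
    """Encode a prompt ensemble and average it into one text embedding."""
    tokens = clip.tokenize(prompts).to(device)
    with torch.no_grad():
        emb = model.encode_text(tokens)
    emb = emb / emb.norm(dim=-1, keepdim=True)
    emb = emb.mean(dim=0)
    return emb / emb.norm()

# One normalized text embedding per class: shape (2, d).
text_features = torch.stack(
    [class_embedding(real_prompts), class_embedding(spoof_prompts)]
)

def real_vs_spoof_logits(images):
    """images: a batch of `preprocess`-ed tensors, shape (B, 3, 224, 224)."""
    img = model.encode_image(images.to(device))
    img = img / img.norm(dim=-1, keepdim=True)
    # Cosine similarity with each class embedding acts as the class logit;
    # during finetuning these logits would be trained with cross-entropy.
    return 100.0 * img @ text_features.T
```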
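The view-based constraints can likewise be sketched. This is a hedged approximation rather than the paper's exact losses: MSE is used for both consistency terms, and `model` and `text_features` are assumed to come from the previous sketch, with `view1`/`view2` produced by any stochastic augmentation pipeline of your choice.

```python
import torch
import torch.nn.functional as F

def view_consistency_losses(model, text_features, view1, view2):
    """view1, view2: two random augmented views of the same image batch."""
    z1 = model.encode_image(view1)
    z2 = model.encode_image(view2)
    z1 = z1 / z1.norm(dim=-1, keepdim=True)
    z2 = z2 / z2.norm(dim=-1, keepdim=True)

    # View-based image self-supervision: embeddings of two views of the
    # same image should agree.
    loss_ssl = F.mse_loss(z1, z2)

    # View-based cross-modal consistency: the image-text similarity
    # vectors of the two views should also agree.
    sim1 = 100.0 * z1 @ text_features.T
    sim2 = 100.0 * z2 @ text_features.T
    loss_mcl = F.mse_loss(sim1, sim2)

    # Both terms are added to the supervised alignment loss during finetuning.
    return loss_ssl, loss_mcl
```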
- Get Code
```bash
git clone https://github.com/koushiksrivats/FLIP.git
```
- Build Environment
```bash
cd FLIP
conda env create -f environment.yml
conda activate fas
```
Please refer to datasets.md for acquiring and pre-processing the datasets.
Please refer to run.md for training and evaluating the models.
Please refer to model_zoo.md for the pre-trained models.
Cross-domain performance under Protocol 1
Cross-domain performance under Protocol 2
Cross-domain performance under Protocol 3
Attention Maps on the spoof samples in MCIO datasets: Attention highlights are on the spoof-specific clues such as paper texture (M), edges of the paper (C), and moiré patterns (I and O).
Attention Maps on the spoof samples in WCS datasets: Attention highlights are on the spoof-specific clues such as screen edges/screen reflection (W), wrinkles in printed cloth (C), and cut-out eyes/nose (S).
If you use this work in your research or applications, please cite it using the following BibTeX:
```bibtex
@InProceedings{Srivatsan_2023_ICCV,
    author    = {Srivatsan, Koushik and Naseer, Muzammal and Nandakumar, Karthik},
    title     = {FLIP: Cross-domain Face Anti-spoofing with Language Guidance},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {19685-19696}
}
```
Our code is built on top of the few_shot_fas repository. We thank the authors for releasing their code.