Watermarking as Defense against Model Extraction Attacks on Deep Learning Models

This repository contains black-box watermarking defense mechanisms against model extraction attacks on deep learning models.

It mainly contains implementations of these methods (one per folder; see the directory structure below):

  • DAWN: Dynamic Adversarial Watermarking of Neural Networks
  • Entangled Watermarks
  • Frontier Stitching and Extended Frontier Stitching

Detailed results are presented in the thesis document as well as in the meeting presentations in the extra folder.

Installation

  • Clone the repository.
  • Set up a virtual environment. This repo is mainly based on Python 3.8.10 (but it was also tested on Python 3.10.x and worked completely fine).
  • A requirements.txt file is provided to set up the common requirements used throughout this code base. (In case some library version is not supported, it is easy to swap in a version compatible with your Python.)
  • Install the requirements using pip install -r requirements.txt (the steps are summarized below).
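Put together, a typical setup looks like the following; `<repo-url>` and `<repo-folder>` are placeholders for this repository's clone URL and folder, and `.venv` is an arbitrary environment name:

```bash
git clone <repo-url>
cd <repo-folder>

# Python 3.8.10 is the main target; 3.10.x also works
python3 -m venv .venv
source .venv/bin/activate

pip install -r requirements.txt
```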

Directory structure

This section shows the directory structure; each of the folders corresponds to one of the above mentioned defense techniques. The extended frontier stitching code is also included in the frontier stitching folder.

Sub-READMEs are also available inside each of these folders.

  • dawndynamicadversarialwatermarkingofneuralnetworks/
    • The main code of this folder is in scripts and utils.
    • Make sure to run pip install -e . before running the scripts (the command is repeated after this list). Because utils and scripts are nested differently, the folder has to be installed as a package itself, which is also why it contains an __init__.py file.
    • This is implemented in torch 2.0.0+cu117.
  • entangled-watermark/
    • The main code of this folder is in the folder Tensorflowv2.
    • The original code of this paper is available on the authors' official GitHub repository, but unfortunately it was written in TensorFlow 1.3.x, which was not very compatible with the attacks we were performing.
    • Hence we had to port the code to TensorFlow 2.10.x. We also tried to convert this code to PyTorch (which can be referred to as future work).
  • frontier-stitching-and-extended-frontier-stitching/
    • The main code of this folder is in the folder code.
    • This is implemented in TensorFlow 2.10.x.
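For the dawndynamicadversarialwatermarkingofneuralnetworks/ folder specifically, the editable install mentioned in the list above is:

```bash
cd dawndynamicadversarialwatermarkingofneuralnetworks
pip install -e .
```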

Common Remark

  • Each of these subfolders contains configuration files, since the variables a user has to pass are set through a configuration file.
  • The configuration files may sit in different subfolders for each technique (this can be considered future work: to have the same directory structure for each of these defense techniques).
  • The details of the configuration files are already described in the respective ReadMe.md files.
  • To perform the attack we used the art-toolbox (Adversarial Robustness Toolbox); a minimal usage sketch follows this list.
  • In this work we focused on the KnockoffNets attack, which is one of the strongest attacks on deep learning models.
  • For the attack we assume the attacker has knowledge of the dataset the victim model was trained on. Reason: this is the strongest assumption for the defense; even if the attacker has full knowledge of the training data but not of the victim model, we can still claim ownership if the model is stolen.
  • Each of the above mentioned directories contains a file derived from real_model_stealing.py, which performs the attack on the watermarked victim model and, while the attack runs, verifies the watermark set.
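The sketch below shows roughly how such a KnockoffNets extraction looks with ART. The small CNN, the random query data, and all hyperparameter values are placeholders, not the settings used in this repository; see the real_model_stealing.py files for the actual attack and watermark verification code.

```python
import numpy as np
import torch.nn as nn
import torch.optim as optim
from art.attacks.extraction import KnockoffNets
from art.estimators.classification import PyTorchClassifier

def make_classifier() -> PyTorchClassifier:
    """Small CNN wrapped as an ART classifier (a stand-in for the real models)."""
    model = nn.Sequential(
        nn.Conv2d(1, 16, 3, padding=1),
        nn.ReLU(),
        nn.Flatten(),
        nn.Linear(16 * 28 * 28, 10),
    )
    return PyTorchClassifier(
        model=model,
        loss=nn.CrossEntropyLoss(),
        optimizer=optim.Adam(model.parameters(), lr=1e-3),
        input_shape=(1, 28, 28),
        nb_classes=10,
    )

victim = make_classifier()     # stands in for the watermarked victim model
surrogate = make_classifier()  # the model the attacker trains

# The attacker queries the victim with data it knows (see the assumption above);
# random noise here only keeps the sketch self-contained.
x_query = np.random.rand(256, 1, 28, 28).astype(np.float32)

attack = KnockoffNets(
    classifier=victim,
    batch_size_fit=64,
    batch_size_query=64,
    nb_epochs=5,
    nb_stolen=256,              # number of victim queries the attacker spends
    sampling_strategy="random",
)
stolen = attack.extract(x_query, thieved_classifier=surrogate)
```

In the repository, the watermark verification is then run on the extracted model while the attack proceeds.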

Demo

This is the demo for extended frontier stitching.

View Demo: Video

The app code is available in app.py, in case you want to understand how it works in the backend.

To run the app locally, run `streamlit run app.py` from that folder.
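For orientation, a Streamlit entry point of this kind looks roughly like the sketch below; the widgets, strings, and flow are hypothetical and not the actual contents of app.py:

```python
import streamlit as st

st.title("Extended Frontier Stitching - Watermark Demo")  # hypothetical title

# Hypothetical input widget; the real app.py may expose different controls.
uploaded = st.file_uploader("Upload an image to query the model")
if uploaded is not None:
    st.image(uploaded, caption="Query input")
    # In the real app, the watermarked model would be queried here and the
    # prediction / watermark verification result reported back to the user.
    st.write("Model output would be shown here.")
```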

Contact

In case of any issues with this repository, please write to: garg.ridhima72@gmail.com