Awesome-GenAI-Watermarking

A curated list of watermarking schemes for generative AI models

This repo collects papers on watermarking methods for generative AI models. Watermarking embeds an imperceptible but recoverable signal (the payload) into a digital asset (the cover). With generative models, some approaches train the model itself so that every output carries the watermark, in a way that should be hard to disable. We refer to this as "Fingerprint Rooting" or just "Rooting".
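
To make the payload/cover terminology concrete, here is a minimal least-significant-bit (LSB) embedding sketch: the payload is hidden in the cover image and recovered bit-exactly, while the cover changes imperceptibly. This is a toy baseline for illustration only (and trivially removable), not one of the schemes collected below.

```python
import numpy as np

def embed(cover: np.ndarray, payload_bits: np.ndarray) -> np.ndarray:
    """Hide payload bits in the least significant bits of the first pixels."""
    stego = cover.copy().ravel()
    n = len(payload_bits)
    stego[:n] = (stego[:n] & 0xFE) | payload_bits  # overwrite the LSBs
    return stego.reshape(cover.shape)

def extract(stego: np.ndarray, n_bits: int) -> np.ndarray:
    """Recover the payload by reading the LSBs back."""
    return stego.ravel()[:n_bits] & 1

cover = np.random.randint(0, 256, (64, 64), dtype=np.uint8)  # stand-in image
payload = np.random.randint(0, 2, size=32, dtype=np.uint8)   # 32-bit message
stego = embed(cover, payload)
assert (extract(stego, 32) == payload).all()                          # recoverable
assert int(np.abs(stego.astype(int) - cover.astype(int)).max()) <= 1  # imperceptible
```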

1. Introduction


1.1 Watermarking Goals

  • Deep fake detection (Is a digital asset AI-generated?)
  • Deep fake attribution (By which user of a model API was it generated?)
  • Enhanced Model Fingerprinting (By which model was it generated?)
  • IP protection
    • Protect valuable models
    • Protect valuable training data (e.g. style)
  • Tamper Localization (Where has an asset been doctored?)
    • see "EditGuard: Versatile Image Watermarking for Tamper Localization and Copyright Protection"

1.2 Differences Between Watermarking Schemes

1.3 What Information is Transported by the Watermark?

  • Whether the asset is AI-generated (yes/no)
  • Identity of the watermarking party
  • Identifier of the asset in a provenance database (can replace perceptual hashing, mentioned in "RoSteALS: Robust Steganography using Autoencoder Latent Space"); see the payload sketch after this list
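
In the multi-bit case, the payload is typically a short, structured bit string. Here is a sketch of a hypothetical 128-bit payload layout carrying a user and asset identifier plus a checksum; the field sizes and the CRC are illustrative assumptions, not taken from any paper listed here.

```python
import struct
import zlib

def make_payload(user_id: int, asset_id: int) -> bytes:
    """Pack a user and asset identifier, then append a CRC32 checksum
    so a decoder can tell a genuine payload from random bits."""
    body = struct.pack(">IQ", user_id, asset_id)       # 4 + 8 bytes
    return body + struct.pack(">I", zlib.crc32(body))  # + 4-byte checksum

def parse_payload(raw: bytes):
    body, crc = raw[:-4], struct.unpack(">I", raw[-4:])[0]
    if zlib.crc32(body) != crc:
        return None  # extraction failed / asset not watermarked
    return struct.unpack(">IQ", body)

payload = make_payload(user_id=42, asset_id=2**40)  # 16 bytes = 128 bits
assert parse_payload(payload) == (42, 2**40)
```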

1.4 Attacks on Watermarking

  • Watermark removal
    • Removing the watermark from a given digital asset (see the regeneration sketch after this list)
    • Attacker goals
      • Obtain an asset in which the watermark can no longer be detected
    • Robustness property
      • Removing the watermark should decrease the asset quality. This negates the usefulness of the asset for malicious goals
  • Watermark forgery (referred to as "spoofing" in "Robustness of AI-Image Detectors: Fundamental Limits and Practical Attacks")
  • Model purification
    • A watermarked model that should only produce watermarked output, even when distributed to untrusted parties (e.g. Stable Signature), is "purified" in a way that removes the watermarks from its output.
    • Attacker goals
      • Obtain a model which does not produce watermarked content
    • Robustness property
      • Removing the watermark functionality of the model should decrease the output quality. This negates the usefulness of the model for malicious goals
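
As an illustration of removal, here is a sketch of the "regeneration"/diffusion-purification attack discussed in entries below such as "Invisible Image Watermarks Are Provably Removable Using Generative AI": slightly re-noise the watermarked image and let a diffusion model re-synthesize it, hoping the watermark does not survive. This assumes the Hugging Face diffusers img2img pipeline; the checkpoint name and strength value are illustrative choices, not taken from the papers.

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

# Illustrative checkpoint; any latent diffusion img2img model would do.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

watermarked = Image.open("watermarked.png").convert("RGB").resize((512, 512))
purified = pipe(
    prompt="",           # no guidance needed, we only want reconstruction
    image=watermarked,
    strength=0.2,        # small noise level: quality kept, weak marks erased
    guidance_scale=1.0,
).images[0]
purified.save("purified.png")
```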

Threat models

  • Whitebox
    • Attacker has full access to a generative AI model
  • ... TODO

Difference between Watermarking and Cryptography

| Paper | Proceedings / Journal Venue | Year / Last Updated | Code | Alternative PDF Source | Notes |
| --- | --- | --- | --- | --- | --- |
| Watermarking is not Cryptography | IWDW | 2006 | - | Author webpage | TODO |

2. Image Domain


2.1 Papers on Watermarking Diffusion Models (outputs) (Image)

| Paper | Proceedings / Journal Venue | Year / Last Updated | Code | Alternative PDF Source | Notes |
| --- | --- | --- | --- | --- | --- |
| Artificial Fingerprinting for Generative Models: Rooting Deepfake Attribution in Training Data | ICCV | 2021 | - | Arxiv | Rooting GAN models by embedding a watermark into the training data to exploit transferability |
| PTW: Pivotal Tuning Watermarking for Pre-Trained Image Generators | USENIX | 2023 | Github | Arxiv | Focus on GANs, but latent diffusion models should work too |
| The Stable Signature: Rooting Watermarks in Latent Diffusion Models | ICCV | 2023 | Github | Arxiv | Meta/FAIR author. Finetunes a model in accordance with an encoder/decoder so that a secret message is revealed in its output<br>- Robust to watermark removal and model purification (quality deterioration)<br>- Static watermarking |
| Flexible and Secure Watermarking for Latent Diffusion Model | ACM MM | 2023 | - | - | References Stable Signature and improves on it by allowing different messages to be embedded without finetuning |
| A Training-Free Plug-and-Play Watermark Framework for Stable Diffusion | - | 2024 | - | Arxiv | TODO |
| WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models | NeurIPS Workshop on Diffusion Models | 2023 | - | Arxiv | TODO |
| RoSteALS: Robust Steganography using Autoencoder Latent Space | CVPR Workshops (CVPRW) | 2023 | Github | Arxiv | Post-hoc watermarking |
| DiffusionShield: A Watermark for Copyright Protection against Generative Diffusion Models | NeurIPS Workshop on Diffusion Models | 2023 | - | Arxiv | Not about rooting<br>- Data poisoning: protected images reproduce the watermark if used as training data for a diffusion model |
| A Recipe for Watermarking Diffusion Models | - | 2023 | Github | Arxiv | Framework for 1. small unconditional/class-conditional DMs via training from scratch on watermarked data and 2. text-to-image DMs via finetuning a backdoor trigger/output<br>- Lots of references on watermarking discriminative models<br>- Static watermarking |
| Intellectual Property Protection of Diffusion Models via the Watermark Diffusion Process | - | 2023 | - | Arxiv | Threat model: checking ownership of a model by having access to the model<br>- Hard to read<br>- Explains the difference between static and dynamic watermarking with many references |
| Securing Deep Generative Models with Universal Adversarial Signature | - | 2023 | Github | Arxiv | 1. Find an optimal signature for each image individually. 2. Finetune a GenAI model on these images |
| Watermarking Diffusion Model | - | 2023 | - | Arxiv | Finetuning a backdoor trigger/output<br>- Static watermarking<br>- CISPA authors |
| Catch You Everything Everywhere: Guarding Textual Inversion via Concept Watermarking | - | 2023 | - | Arxiv | Guards concepts obtained through textual inversion ("An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion") from abuse by allowing concepts to be identified in generated images<br>- Very interesting references on company and government stances on watermarking |
| Generative Watermarking Against Unauthorized Subject-Driven Image Synthesis | - | 2023 | - | Arxiv | Differs from Glaze in that style synthesis from protected source images is not prevented, but made recognizable via watermarks<br>- CISPA authors |
| Towards the Vulnerability of Watermarking Artificial Intelligence Generated Content | - | 2024 | - | OpenReview | Watermark removal and forgery in one method, using a GAN<br>- References two types of watermarking: 1. learning/finetuning the model to produce watermarked output and 2. post-hoc watermarking after the fact (static vs. dynamic, see "Intellectual Property Protection of Diffusion Models via the Watermark Diffusion Process") |
| Robustness of AI-Image Detectors: Fundamental Limits and Practical Attacks | ICLR | 2024 | Github | Arxiv | Shows that low-budget watermarking methods are beaten by diffusion purification and proposes an attack that can remove even high-budget watermarks via model substitution |
| A Transfer Attack to Image Watermarks | - | 2024 | - | Arxiv | Watermark removal via a "no-box" attack on detectors (no access to a detector API; instead trains a classifier to distinguish watermarked from vanilla images) |
| EditGuard: Versatile Image Watermarking for Tamper Localization and Copyright Protection | CVPR | 2024 | Github | Arxiv | Post-hoc watermarking with tamper localization |
| Latent Watermark: Inject and Detect Watermarks in Latent Diffusion Space | - | 2024 | - | Arxiv | Discusses 3 categories of watermarks with references: before, during, and after generation |
| Stable Messenger: Steganography for Message-Concealed Image Generation | - | 2023 | - | Arxiv | Post-hoc watermarking<br>- Classified as embedding during generation by "Latent Watermark: Inject and Detect Watermarks in Latent Diffusion Space", but I think it is actually post-hoc |

2.2 Watermarks to Guide Other Objectives

| Paper | Proceedings / Journal Venue | Year / Last Updated | Code | Alternative PDF Source | Notes |
| --- | --- | --- | --- | --- | --- |
| StegaStamp: Invisible Hyperlinks in Physical Photographs | CVPR | 2020 | Github | Arxiv | Watermark in physical images that can be captured from a video stream<br>- "Towards the Vulnerability of Watermarking Artificial Intelligence Generated Content" speculates that DeepMind SynthID works similarly to this |
| ChartStamp: Robust Chart Embedding for Real-World Applications | ACM MM | 2022 | Github | - | Like StegaStamp, but introduces less clutter in flat image regions |
| Unadversarial Examples: Designing Objects for Robust Vision | NeurIPS | 2021 | Github | Arxiv | Perturbations that make detection easier |

2.3 Misc Papers (to be categorized...)

| Paper | Proceedings / Journal Venue | Year / Last Updated | Code | Alternative PDF Source | Notes |
| --- | --- | --- | --- | --- | --- |
| ProMark: Proactive Diffusion Watermarking for Causal Attribution | CVPR | 2024 | - | Arxiv | TODO |
| Watermarking Images in Self-Supervised Latent Spaces | ICASSP | 2022 | Github | Arxiv | TODO |
| Generative Autoencoders as Watermark Attackers: Analyses of Vulnerabilities and Threats | ICML Workshop DeployableGenerativeAI | 2023 | - | - | Attack on pixel watermarks using LDM autoencoders |
| Invisible Image Watermarks Are Provably Removable Using Generative AI | - | 2023 | Github | Arxiv | Not about rooting a model, but about removing watermarks with diffusion purification<br>- Evaluates Stable Signature and Tree-Ring watermarks; Tree-Ring is robust against their attack<br>- Earlier version of "Generative Autoencoders as Watermark Attackers" |
| WaterDiff: Perceptual Image Watermarks Via Diffusion Model | IVMSP-P2 Workshop at ICASSP | 2024 | - | - | TODO |
| Squint Hard Enough: Attacking Perceptual Hashing with Adversarial Machine Learning | USENIX | 2022 | - | - | Attacks on perceptual hashes |
| Evading Watermark based Detection of AI-Generated Content | CCS | 2023 | Github | Arxiv | Evaluates the robustness of image watermarks + adversarial samples for evasion |
| Diffusion Models for Adversarial Purification | ICML | 2022 | Github | Arxiv | Defense against adversarial perturbations, including imperceptible watermarks in images |
| Flow-Based Robust Watermarking with Invertible Noise Layer for Black-Box Distortions | AAAI | 2023 | Github | - | Like HiDDeN, a neural watermark encoder/extractor |
| HiDDeN: Hiding Data With Deep Networks | ECCV | 2018 | Github | Arxiv | Main tool used in Stable Signature (see the sketch after this table)<br>- Contains a differentiable approximation of JPEG compression<br>- Dynamic watermarking |
| Glaze: Protecting artists from style mimicry by text-to-image models | USENIX | 2023 | Github | Arxiv | Not about rooting, but about denying style stealing |
| DUAW: Data-free Universal Adversarial Watermark against Stable Diffusion Customization | - | 2023 | - | Arxiv | Seems similar to Glaze at first glance; the authors may have been unlucky to have done parallel work |
| Responsible Disclosure of Generative Models Using Scalable Fingerprinting | ICLR | 2022 | Github | Arxiv | Rooting GAN models. Seems to have introduced the idea of scalably producing many models fast with a large message space (TODO: check this later), similar to how Stable Signature later did it for Stable Diffusion |
| On Attribution of Deepfakes | - | 2020 | - | Arxiv | Shows that an image can be crafted to look as if it was generated by a targeted model; also proposes a framework for achieving deniability in such cases |
| Towards Blind Watermarking: Combining Invertible and Non-invertible Mechanisms | ACM MM | 2022 | Github | Arxiv | Not about rooting a model, but about attacking post-hoc watermarking of images<br>- Lots of references on invertible NNs |
| DocDiff: Document Enhancement via Residual Diffusion Models | ACM MM | 2023 | Github | Arxiv | Not about rooting a model, but about post-hoc watermarking of images<br>- Includes classic watermark removal |
| Warfare: Breaking the Watermark Protection of AI-Generated Content | - | 2023 | Did not look for it yet | Arxiv | Not about rooting a model, but about attacking post-hoc watermarking<br>- Includes 1. watermark removal and 2. forging |
| Leveraging Optimization for Adaptive Attacks on Image Watermarks | ICML (Poster) | 2024 | Did not look for it yet | Arxiv | Not about rooting a model, but about attacking post-hoc watermarking |
| A Somewhat Robust Image Watermark against Diffusion-based Editing Models | - | 2023 | Did not look for it yet | Arxiv | Not about rooting a model, but about post-hoc watermarking of images<br>- Takes watermarks literally and injects hidden images |
| Hey That's Mine: Imperceptible Watermarks are Preserved in Diffusion Generated Outputs | - | 2023 | - | Arxiv | Not about rooting a model. Shows that watermarks in training data are recognizable in outputs and allow for intellectual property claims |
| Benchmarking the Robustness of Image Watermarks | - | 2024 | Github | Arxiv | A benchmark/framework for testing watermarks against attacks |
| Free Fine-tuning: A Plug-and-Play Watermarking Scheme for Deep Neural Networks | ACM MM | 2023 | Did not look for it yet | Arxiv | Not about generative models, but about discriminative models |
| Adversarial Attack for Robust Watermark Protection Against Inpainting-based and Blind Watermark Removers | ACM MM | 2023 | Did not look for it yet | - | Post-hoc watermark with enhanced robustness against inpainting |
| A Novel Deep Video Watermarking Framework with Enhanced Robustness to H.264/AVC Compression | ACM MM | 2023 | Github | - | Post-hoc watermark for videos |
| Practical Deep Dispersed Watermarking with Synchronization and Fusion | ACM MM | 2023 | Did not look for it yet | Arxiv | Post-hoc watermark for images with enhanced robustness to transformations |
| Generalizable Synthetic Image Detection via Language-guided Contrastive Learning | - | 2023 | Github | Arxiv | Not about rooting, but about GenAI image detection |
| Enhancing the Robustness of Deep Learning Based Fingerprinting to Improve Deepfake Attribution | ACM MM-Asia | 2022 | - | - | Not about rooting, but about transformation-robustness strategies for watermarks |
| You are caught stealing my winning lottery ticket! Making a lottery ticket claim its ownership | NeurIPS | 2021 | Github | Arxiv | Watermarks the sparsity mask of winning lottery tickets |
| Self-Consuming Generative Models Go MAD | ICLR (Poster) | 2024 | - | Arxiv | Contains a reason why GenAI detection is important: removing generated content from training sets |
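
Several entries above (HiDDeN, the flow-based encoder, and the encoder/decoder behind Stable Signature) share the same encoder/noise-layer/decoder training pattern. Below is a heavily simplified PyTorch sketch of that pattern; the layer sizes, loss weights, and the noise layer are placeholder assumptions, not any paper's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

MSG_BITS = 32

class Encoder(nn.Module):
    """Takes an image and a message, returns a watermarked image (residual)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + MSG_BITS, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1),
        )
    def forward(self, img, msg):
        # Replicate the message across the spatial dimensions, as in HiDDeN.
        m = msg[:, :, None, None].expand(-1, -1, img.shape[2], img.shape[3])
        return img + 0.1 * self.net(torch.cat([img, m], dim=1))

class Decoder(nn.Module):
    """Recovers message logits from a (possibly distorted) image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, MSG_BITS),
        )
    def forward(self, img):
        return self.net(img)

enc, dec = Encoder(), Decoder()
opt = torch.optim.Adam([*enc.parameters(), *dec.parameters()], lr=1e-4)

img = torch.rand(8, 3, 64, 64)                   # stand-in training batch
msg = torch.randint(0, 2, (8, MSG_BITS)).float()
stego = enc(img, msg)
noised = stego + 0.02 * torch.randn_like(stego)  # stand-in "noise layer"
loss = F.binary_cross_entropy_with_logits(dec(noised), msg) \
     + F.mse_loss(stego, img)                    # message + imperceptibility
loss.backward()
opt.step()
```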

3. Audio Domain


3.1 Papers on Watermarking (Audio)

| Paper | Proceedings / Journal Venue | Year / Last Updated | Code | Alternative PDF Source | Notes |
| --- | --- | --- | --- | --- | --- |
| Proactive Detection of Voice Cloning with Localized Watermarking | - | 2024 | Github | Arxiv | Meta/FAIR author |
| MaskMark: Robust Neural Watermarking for Real and Synthetic Speech | ICASSP | 2024 | Audio samples | IEEExplore | - |
| Collaborative Watermarking for Adversarial Speech Synthesis | ICASSP | 2024 | - | Arxiv | Meta/FAIR author |
| HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis | NeurIPS | 2020 | Github | Arxiv | Very good GAN for speech synthesis (TODO: is this SotA?)<br>- Can do live synthesis even on CPU<br>- Quality is on par with autoregressive models |
| Spoofed Training Data for Speech Spoofing Countermeasure Can Be Efficiently Created Using Neural Vocoders | ICASSP | 2023 | - | Arxiv | Includes vocoder-generated training data to enhance the detection capabilities of countermeasures |
| AudioQR: Deep Neural Audio Watermarks For QR Code | IJCAI | 2023 | Github | - | Imperceptible QR codes in audio for the visually impaired |

3.2 Audio Synthesis Datasets

| Paper | Proceedings / Journal Venue | Year / Last Updated | Code | Alternative PDF Source | Notes |
| --- | --- | --- | --- | --- | --- |
| ASVspoof 2021 Challenge | - | 2021 | Github | Arxiv | Challenge for audio spoofing detection |
| ADD 2022: the first Audio Deep Synthesis Detection Challenge | ICASSP | 2022 | Github | Arxiv | Official Chinese challenge website (no HTTPS!) |

3.3 News on Audio Watermarking

3.4 Further Links on Audio Synthesis and Detection

4. Text Domain

| Paper | Proceedings / Journal Venue | Year / Last Updated | Code | Alternative PDF Source | Notes |
| --- | --- | --- | --- | --- | --- |
| Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models | - | 2023 | Github | Arxiv | - |
| Adversarial Watermarking Transformer: Towards Tracing Text Provenance with Data Hiding | S&P | 2021 | Github | Arxiv | - |
| Resilient Watermarking for LLM-Generated Codes | - | 2024 | Github Appendix | Arxiv | Watermarks LLM-generated code |
| Provably Robust Multi-bit Watermarking for AI-generated Text via Error Correction Code | - | 2024 | - | Arxiv | Uses error correction codes |
| Provable Robust Watermarking for AI-Generated Text | ICLR | 2024 | Github | Arxiv | Apparently good and robust LLM watermarking (see the detection sketch after this table) |
| Towards Codable Watermarking for Injecting Multi-Bits Information to LLMs | ICLR | 2024 | Github | Arxiv | TODO |
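
Several of the LLM watermarking papers above build on biasing generation toward a keyed "green list" of tokens, which a detector then verifies with a one-sided z-test. A toy sketch of the detection side follows; the keyed hash partition is an illustrative stand-in for a real scheme's pseudorandom function, not any listed paper's exact construction.

```python
import hashlib
import math

GAMMA = 0.5  # fraction of the vocabulary that is "green"

def is_green(token: str, key: str = "secret") -> bool:
    """Toy keyed partition of the vocabulary into green/red tokens."""
    h = hashlib.sha256((key + token).encode()).digest()
    return h[0] < 256 * GAMMA

def z_score(tokens: list[str]) -> float:
    """Under H0 (no watermark) each token is green with probability GAMMA;
    a watermarked sampler was biased toward green tokens."""
    n = len(tokens)
    greens = sum(is_green(t) for t in tokens)
    return (greens - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))

tokens = "some generated text to test".split()
print(z_score(tokens))  # flag as watermarked if z exceeds e.g. 4
```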

5. Related News


6. Generative Model stealing Papers


| Paper | Proceedings / Journal Venue | Year / Last Updated | Code | Alternative PDF Source | Notes |
| --- | --- | --- | --- | --- | --- |
| Stealing Machine Learning Models: Attacks and Countermeasures for Generative Adversarial Networks | ACSAC | 2021 | - | Arxiv | - |
| Model Extraction Attack and Defense on Deep Generative Models | Journal of Physics | 2022 | - | - | - |
| Model Extraction and Defenses on Generative Adversarial Networks | - | 2021 | - | Arxiv | - |

7. Survey Papers


| Paper | Proceedings / Journal Venue | Year / Last Updated | Code | Alternative PDF Source | Notes |
| --- | --- | --- | --- | --- | --- |
| A Comprehensive Survey on Robust Image Watermarking | Neurocomputing | 2022 | - | Arxiv | Not about model rooting |
| A Systematic Review on Model Watermarking for Neural Networks | Frontiers in Big Data | 2021 | - | Arxiv | Not about model rooting |
| A Comprehensive Review on Digital Image Watermarking | - | 2022 | - | Arxiv | Not about model rooting |
| Copyright Protection in Generative AI: A Technical Perspective | - | 2024 | - | Arxiv | About IP protection in GenAI in general |
| Security and Privacy on Generative Data in AIGC: A Survey | - | 2023 | - | Arxiv | About security aspects of GenAI in general |
| Detecting Multimedia Generated by Large AI Models: A Survey | - | 2024 | - | Arxiv | About detecting GenAI content in general |
| Audio Deepfake Detection: A Survey | - | 2023 | - | Arxiv | Contains an overview of spoofed-audio datasets, spoofing methods, and detection methods<br>- Very good survey |

Below is a summary of the systematization given in "A Systematic Review on Model Watermarking for Neural Networks" (listed above).

Taxonomy

  • Embedding method
    • Watermark in model parameters
    • Trigger-Watermark-Backdoor
  • Verification access
    • Whitebox (access model parameters)
    • Blackbox (access via API)
  • Capacity
    • Zero-bit (only whether a watermark exists)
    • Multi-bit (the watermark carries arbitrary information)
  • Authentication
    • Whether the model is watermarked
    • By whom the model was watermarked
  • Uniqueness
    • All model instances carry same watermark
    • Different model instances carry different watermarks

Requirements & Security Goals

| Goal | Explanation | Motivation |
| --- | --- | --- |
| Fidelity | High prediction quality on the original task | Model performance shouldn't degrade significantly |
| Robustness | Watermark should resist removal | Protects against copyright evasion |
| Reliability | Minimal false negatives | Ensures rightful ownership is recognized |
| Integrity | Minimal false positives | Prevents wrongful accusations of theft |
| Capacity | Supports large amounts of information | Allows comprehensive watermarks |
| Secrecy | Watermark must be secret and undetectable | Prevents unauthorized detection |
| Efficiency | Fast watermark insertion and verification | Avoids computational burden |
| Generality | Independent of datasets and ML algorithms | Facilitates widespread application |
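
Reliability and Integrity pull the detection threshold in opposite directions. For a multi-bit scheme that declares a watermark present when at least τ of n extracted bits match, the false-positive rate is a simple binomial tail; a quick sketch (plain combinatorics, not tied to any particular paper):

```python
from math import comb

def false_positive_rate(n_bits: int, threshold: int) -> float:
    """P(an unwatermarked asset matches >= threshold of n_bits bits),
    assuming each bit matches independently with probability 1/2."""
    return sum(comb(n_bits, k) for k in range(threshold, n_bits + 1)) / 2**n_bits

# Integrity: pick the smallest threshold whose FPR is below a target;
# Reliability is then whatever true-positive rate the scheme achieves there.
for tau in range(24, 33):
    print(tau, false_positive_rate(32, tau))
# e.g. requiring 29/32 matching bits already gives FPR ~ 1.3e-6
```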

Threat Model

  • Attacker Knowledge:
    1. existence of the watermark
    2. model and its parameters
    3. watermarking scheme used
    4. (parts of) the training data
    5. (parts of) the watermark itself or the trigger dataset
  • Attacker Capabilities (irrelevant)
    • passive (eavesdropping)
    • active (interaction)
  • Attacker Objectives
    • What is the attacker using the model for? (rather unspecific)

Attacks against Watermarking

  • Watermark Detection (weakest)
  • Watermark Suppression, i.e. avoiding watermark verification
    • e.g. concealing any presence of a watermark in the model parameters and behavior
    • e.g. suppressing the model's reactions to the original watermark trigger
  • Watermark Forging
    1. Recovering the legitimate owner's watermark and claiming ownership (if there is no binding between the watermark and the owner)
    2. Adding a new watermark that creates ambiguity concerning ownership
    3. Identifying a fake watermark within the model that coincidentally acts like a real watermark but is not actually one
  • Watermark Overwriting
    1. Adding a watermark to the model while deactivating the old one (strong)
    2. Adding a watermark to the model without deactivating the old one (weak)
  • Watermark Removal
    • Difficulty depends on
      1. knowledge of the presence of a watermark
      2. the underlying watermarking scheme
      3. the availability of additional data, e.g. for fine-tuning or retraining
    • Methods
      • Fine-Tuning
      • Pruning
      • Quantization
      • Distillation
      • Transfer-Learning
      • Backdoor Removal

Categorizing Watermarking Methods

  • Embedding Watermarks into Model Parameters
    • Adding patterns into the model which can be verified locally
  • Using Pre-Defined Inputs as Triggers
    • Adding behaviour triggered by special input
  • Using Model Fingerprints to Identify Potentially Stolen Instances
    • No additional action needed, just recognizing a model based on some criteria