Awesome-GenAI-Watermarking

A curated list of watermarking schemes for generative AI models

This repo collects papers on watermarking methods for generative AI models. Watermarking embeds an imperceptible but recoverable signal (the payload) into a digital asset (the cover). With generative models, some approaches train the model itself so that every output carries the watermark, in a way that should be hard to disable. We refer to this as "Fingerprint Rooting" or just "Rooting".
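
To make the payload/cover terminology concrete, here is a minimal least-significant-bit (LSB) embedding sketch: the payload is hidden in the cover image and recovered bit-exactly, while the cover changes imperceptibly. This is a toy baseline for illustration only (and trivially removable), not one of the schemes collected below.

```python
import numpy as np

def embed(cover: np.ndarray, payload_bits: np.ndarray) -> np.ndarray:
    """Hide payload bits in the least significant bits of the first pixels."""
    stego = cover.copy().ravel()
    n = len(payload_bits)
    stego[:n] = (stego[:n] & 0xFE) | payload_bits  # overwrite the LSBs
    return stego.reshape(cover.shape)

def extract(stego: np.ndarray, n_bits: int) -> np.ndarray:
    """Recover the payload by reading the LSBs back."""
    return stego.ravel()[:n_bits] & 1

cover = np.random.randint(0, 256, (64, 64), dtype=np.uint8)  # stand-in image
payload = np.random.randint(0, 2, size=32, dtype=np.uint8)   # 32-bit message
stego = embed(cover, payload)
assert (extract(stego, 32) == payload).all()                          # recoverable
assert int(np.abs(stego.astype(int) - cover.astype(int)).max()) <= 1  # imperceptible
```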

1. Introduction


1.1 Watermarking Goals

  • Deep fake detection (Is a digital asset AI-generated?)
  • Deep fake attribution (By which user of a model API was it generated?)
  • Enhanced Model Fingerprinting (By which model was it generated?)
  • IP protection
    • Protect valuable models
    • Protect valuable training data (e.g. style)
  • Tamper Localization (Where has an asset been doctored?)
    • see "EditGuard: Versatile Image Watermarking for Tamper Localization and Copyright Protection"

1.2 Differences Between Watermarking Schemes

1.3 What Information is Transported by the Watermark?

  • Whether the asset is AI-generated (yes/no)
  • Identity of the watermarking party
  • Identifier of the asset in a provenance database (can replace perceptual hashing, mentioned in "RoSteALS: Robust Steganography using Autoencoder Latent Space"); see the payload sketch after this list
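
In the multi-bit case, the payload is typically a short, structured bit string. Here is a sketch of a hypothetical 128-bit payload layout carrying a user and asset identifier plus a checksum; the field sizes and the CRC are illustrative assumptions, not taken from any paper listed here.

```python
import struct
import zlib

def make_payload(user_id: int, asset_id: int) -> bytes:
    """Pack a user and asset identifier, then append a CRC32 checksum
    so a decoder can tell a genuine payload from random bits."""
    body = struct.pack(">IQ", user_id, asset_id)       # 4 + 8 bytes
    return body + struct.pack(">I", zlib.crc32(body))  # + 4-byte checksum

def parse_payload(raw: bytes):
    body, crc = raw[:-4], struct.unpack(">I", raw[-4:])[0]
    if zlib.crc32(body) != crc:
        return None  # extraction failed / asset not watermarked
    return struct.unpack(">IQ", body)

payload = make_payload(user_id=42, asset_id=2**40)  # 16 bytes = 128 bits
assert parse_payload(payload) == (42, 2**40)
```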

1.4 Attacks on Watermarking

  • Watermark removal
    • Removing the watermark from a given digital asset (see the regeneration sketch after this list)
    • Attacker goals
      • Obtain an asset in which the watermark can no longer be detected
    • Robustness property
      • Removing the watermark should decrease the asset quality. This negates the usefulness of the asset for malicious goals
  • Watermark forgery (referred to as "spoofing" in "Robustness of AI-Image Detectors: Fundamental Limits and Practical Attacks")
  • Model purification
    • A watermarked model that should only produce watermarked output, even when distributed to untrusted parties (e.g. Stable Signature), is "purified" in a way that removes the watermarks from its output.
    • Attacker goals
      • Obtain a model which does not produce watermarked content
    • Robustness property
      • Removing the watermark functionality of the model should decrease the output quality. This negates the usefulness of the model for malicious goals
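
As an illustration of removal, here is a sketch of the "regeneration"/diffusion-purification attack discussed in entries below such as "Invisible Image Watermarks Are Provably Removable Using Generative AI": slightly re-noise the watermarked image and let a diffusion model re-synthesize it, hoping the watermark does not survive. This assumes the Hugging Face diffusers img2img pipeline; the checkpoint name and strength value are illustrative choices, not taken from the papers.

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

# Illustrative checkpoint; any latent diffusion img2img model would do.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

watermarked = Image.open("watermarked.png").convert("RGB").resize((512, 512))
purified = pipe(
    prompt="",           # no guidance needed, we only want reconstruction
    image=watermarked,
    strength=0.2,        # small noise level: quality kept, weak marks erased
    guidance_scale=1.0,
).images[0]
purified.save("purified.png")
```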

Threat models

  • Whitebox
    • Attacker has full access to a generative AI model
  • ... TODO

Difference between Watermarking and Cryptography

| Paper | Proceedings / Journal Venue | Year / Last Updated | Code | Alternative PDF Source | Notes |
| --- | --- | --- | --- | --- | --- |
| Watermarking is not Cryptography | IWDW | 2006 | - | Author webpage | TODO |

2. Image Domain


2.1 Papers on Watermarking Diffusion Models (outputs) (Image)

| Paper | Proceedings / Journal Venue | Year / Last Updated | Code | Alternative PDF Source | Notes |
| --- | --- | --- | --- | --- | --- |
| Artificial Fingerprinting for Generative Models: Rooting Deepfake Attribution in Training Data | ICCV | 2021 | - | Arxiv | Rooting GAN models by embedding a watermark into the training data to exploit transferability |
| PTW: Pivotal Tuning Watermarking for Pre-Trained Image Generators | USENIX | 2023 | Github | Arxiv | Focus on GANs, but latent diffusion models should work too |
| The Stable Signature: Rooting Watermarks in Latent Diffusion Models | ICCV | 2023 | Github | Arxiv | Meta/FAIR author. Finetunes a model in accordance with an encoder/decoder so that a secret message is revealed in its output<br>- Robust to watermark removal and model purification (quality deterioration)<br>- Static watermarking |
| Flexible and Secure Watermarking for Latent Diffusion Model | ACM MM | 2023 | - | - | References Stable Signature and improves on it by allowing different messages to be embedded without finetuning |
| A Training-Free Plug-and-Play Watermark Framework for Stable Diffusion | - | 2024 | - | Arxiv | TODO |
| WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models | NeurIPS Workshop on Diffusion Models | 2023 | - | Arxiv | TODO |
| RoSteALS: Robust Steganography using Autoencoder Latent Space | CVPR Workshops (CVPRW) | 2023 | Github | Arxiv | Post-hoc watermarking |
| DiffusionShield: A Watermark for Copyright Protection against Generative Diffusion Models | NeurIPS Workshop on Diffusion Models | 2023 | - | Arxiv | Not about rooting<br>- Data poisoning: protected images reproduce the watermark if used as training data for a diffusion model |
| A Recipe for Watermarking Diffusion Models | - | 2023 | Github | Arxiv | Framework for 1. small unconditional/class-conditional DMs via training from scratch on watermarked data and 2. text-to-image DMs via finetuning a backdoor trigger/output<br>- Lots of references on watermarking discriminative models<br>- Static watermarking |
| Intellectual Property Protection of Diffusion Models via the Watermark Diffusion Process | - | 2023 | - | Arxiv | Threat model: checking ownership of a model by having access to the model<br>- Hard to read<br>- Explains the difference between static and dynamic watermarking with many references |
| Securing Deep Generative Models with Universal Adversarial Signature | - | 2023 | Github | Arxiv | 1. Find an optimal signature for each image individually. 2. Finetune a GenAI model on these images |
| Watermarking Diffusion Model | - | 2023 | - | Arxiv | Finetuning a backdoor trigger/output<br>- Static watermarking<br>- CISPA authors |
| Catch You Everything Everywhere: Guarding Textual Inversion via Concept Watermarking | - | 2023 | - | Arxiv | Guards concepts obtained through textual inversion ("An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion") from abuse by allowing concepts to be identified in generated images<br>- Very interesting references on company and government stances on watermarking |
| Generative Watermarking Against Unauthorized Subject-Driven Image Synthesis | - | 2023 | - | Arxiv | Differs from Glaze in that style synthesis from protected source images is not prevented, but made recognizable via watermarks<br>- CISPA authors |
| Towards the Vulnerability of Watermarking Artificial Intelligence Generated Content | - | 2024 | - | OpenReview | Watermark removal and forgery in one method, using a GAN<br>- References two types of watermarking: 1. learning/finetuning the model to produce watermarked output and 2. post-hoc watermarking after the fact (static vs. dynamic, see "Intellectual Property Protection of Diffusion Models via the Watermark Diffusion Process") |
| Robustness of AI-Image Detectors: Fundamental Limits and Practical Attacks | ICLR | 2024 | Github | Arxiv | Shows that low-budget watermarking methods are beaten by diffusion purification and proposes an attack that can remove even high-budget watermarks via model substitution |
| A Transfer Attack to Image Watermarks | - | 2024 | - | Arxiv | Watermark removal via a "no-box" attack on detectors (no access to a detector API; instead trains a classifier to distinguish watermarked from vanilla images) |
| EditGuard: Versatile Image Watermarking for Tamper Localization and Copyright Protection | CVPR | 2024 | Github | Arxiv | Post-hoc watermarking with tamper localization |
| Latent Watermark: Inject and Detect Watermarks in Latent Diffusion Space | - | 2024 | - | Arxiv | Discusses 3 categories of watermarks with references: before, during, and after generation |
| Stable Messenger: Steganography for Message-Concealed Image Generation | - | 2023 | - | Arxiv | Post-hoc watermarking<br>- Classified as embedding during generation by "Latent Watermark: Inject and Detect Watermarks in Latent Diffusion Space", but I think it is actually post-hoc |

2.2 Watermarks to Guide Other Objectives

| Paper | Proceedings / Journal Venue | Year / Last Updated | Code | Alternative PDF Source | Notes |
| --- | --- | --- | --- | --- | --- |
| StegaStamp: Invisible Hyperlinks in Physical Photographs | CVPR | 2020 | Github | Arxiv | Watermark in physical images that can be captured from a video stream<br>- "Towards the Vulnerability of Watermarking Artificial Intelligence Generated Content" speculates that DeepMind SynthID works similarly to this |
| ChartStamp: Robust Chart Embedding for Real-World Applications | ACM MM | 2022 | Github | - | Like StegaStamp, but introduces less clutter in flat image regions |
| Unadversarial Examples: Designing Objects for Robust Vision | NeurIPS | 2021 | Github | Arxiv | Perturbations that make detection easier |

2.3 Misc Papers (to be categorized...)

| Paper | Proceedings / Journal Venue | Year / Last Updated | Code | Alternative PDF Source | Notes |
| --- | --- | --- | --- | --- | --- |
| ProMark: Proactive Diffusion Watermarking for Causal Attribution | CVPR | 2024 | - | Arxiv | TODO |
| Watermarking Images in Self-Supervised Latent Spaces | ICASSP | 2022 | Github | Arxiv | TODO |
| Generative Autoencoders as Watermark Attackers: Analyses of Vulnerabilities and Threats | ICML Workshop DeployableGenerativeAI | 2023 | - | - | Attack on pixel watermarks using LDM autoencoders |
| Invisible Image Watermarks Are Provably Removable Using Generative AI | - | 2023 | Github | Arxiv | Not about rooting a model, but about removing watermarks with diffusion purification<br>- Evaluates Stable Signature and Tree-Ring watermarks; Tree-Ring is robust against their attack<br>- Earlier version of "Generative Autoencoders as Watermark Attackers" |
| WaterDiff: Perceptual Image Watermarks Via Diffusion Model | IVMSP-P2 Workshop at ICASSP | 2024 | - | - | TODO |
| Squint Hard Enough: Attacking Perceptual Hashing with Adversarial Machine Learning | USENIX | 2022 | - | - | Attacks on perceptual hashes |
| Evading Watermark based Detection of AI-Generated Content | CCS | 2023 | Github | Arxiv | Evaluates the robustness of image watermarks + adversarial samples for evasion |
| Diffusion Models for Adversarial Purification | ICML | 2022 | Github | Arxiv | Defense against adversarial perturbations, including imperceptible watermarks in images |
| Flow-Based Robust Watermarking with Invertible Noise Layer for Black-Box Distortions | AAAI | 2023 | Github | - | Like HiDDeN, a neural watermark encoder/extractor |
| HiDDeN: Hiding Data With Deep Networks | ECCV | 2018 | Github | Arxiv | Main tool used in Stable Signature (see the sketch after this table)<br>- Contains a differentiable approximation of JPEG compression<br>- Dynamic watermarking |
| Glaze: Protecting artists from style mimicry by text-to-image models | USENIX | 2023 | Github | Arxiv | Not about rooting, but about denying style stealing |
| DUAW: Data-free Universal Adversarial Watermark against Stable Diffusion Customization | - | 2023 | - | Arxiv | Seems similar to Glaze at first glance; the authors may have been unlucky to have done parallel work |
| Responsible Disclosure of Generative Models Using Scalable Fingerprinting | ICLR | 2022 | Github | Arxiv | Rooting GAN models. Seems to have introduced the idea of scalably producing many models fast with a large message space (TODO: check this later), similar to how Stable Signature later did it for Stable Diffusion |
| On Attribution of Deepfakes | - | 2020 | - | Arxiv | Shows that an image can be crafted to look as if it was generated by a targeted model; also proposes a framework for achieving deniability in such cases |
| Towards Blind Watermarking: Combining Invertible and Non-invertible Mechanisms | ACM MM | 2022 | Github | Arxiv | Not about rooting a model, but about attacking post-hoc watermarking of images<br>- Lots of references on invertible NNs |
| DocDiff: Document Enhancement via Residual Diffusion Models | ACM MM | 2023 | Github | Arxiv | Not about rooting a model, but about post-hoc watermarking of images<br>- Includes classic watermark removal |
| Warfare: Breaking the Watermark Protection of AI-Generated Content | - | 2023 | Did not look for it yet | Arxiv | Not about rooting a model, but about attacking post-hoc watermarking<br>- Includes 1. watermark removal and 2. forging |
| Leveraging Optimization for Adaptive Attacks on Image Watermarks | ICML (Poster) | 2024 | Did not look for it yet | Arxiv | Not about rooting a model, but about attacking post-hoc watermarking |
| A Somewhat Robust Image Watermark against Diffusion-based Editing Models | - | 2023 | Did not look for it yet | Arxiv | Not about rooting a model, but about post-hoc watermarking of images<br>- Takes watermarks literally and injects hidden images |
| Hey That's Mine: Imperceptible Watermarks are Preserved in Diffusion Generated Outputs | - | 2023 | - | Arxiv | Not about rooting a model. Shows that watermarks in training data are recognizable in outputs and allow for intellectual property claims |
| Benchmarking the Robustness of Image Watermarks | - | 2024 | Github | Arxiv | A benchmark/framework for testing watermarks against attacks |
| Free Fine-tuning: A Plug-and-Play Watermarking Scheme for Deep Neural Networks | ACM MM | 2023 | Did not look for it yet | Arxiv | Not about generative models, but about discriminative models |
| Adversarial Attack for Robust Watermark Protection Against Inpainting-based and Blind Watermark Removers | ACM MM | 2023 | Did not look for it yet | - | Post-hoc watermark with enhanced robustness against inpainting |
| A Novel Deep Video Watermarking Framework with Enhanced Robustness to H.264/AVC Compression | ACM MM | 2023 | Github | - | Post-hoc watermark for videos |
| Practical Deep Dispersed Watermarking with Synchronization and Fusion | ACM MM | 2023 | Did not look for it yet | Arxiv | Post-hoc watermark for images with enhanced robustness to transformations |
| Generalizable Synthetic Image Detection via Language-guided Contrastive Learning | - | 2023 | Github | Arxiv | Not about rooting, but about GenAI image detection |
| Enhancing the Robustness of Deep Learning Based Fingerprinting to Improve Deepfake Attribution | ACM MM-Asia | 2022 | - | - | Not about rooting, but about transformation-robustness strategies for watermarks |
| You are caught stealing my winning lottery ticket! Making a lottery ticket claim its ownership | NeurIPS | 2021 | Github | Arxiv | Watermarks the sparsity mask of winning lottery tickets |
| Self-Consuming Generative Models Go MAD | ICLR (Poster) | 2024 | - | Arxiv | Contains a reason why GenAI detection is important: removing generated content from training sets |
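
Several entries above (HiDDeN, the flow-based encoder, and the encoder/decoder behind Stable Signature) share the same encoder/noise-layer/decoder training pattern. Below is a heavily simplified PyTorch sketch of that pattern; the layer sizes, loss weights, and the noise layer are placeholder assumptions, not any paper's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

MSG_BITS = 32

class Encoder(nn.Module):
    """Takes an image and a message, returns a watermarked image (residual)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + MSG_BITS, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1),
        )
    def forward(self, img, msg):
        # Replicate the message across the spatial dimensions, as in HiDDeN.
        m = msg[:, :, None, None].expand(-1, -1, img.shape[2], img.shape[3])
        return img + 0.1 * self.net(torch.cat([img, m], dim=1))

class Decoder(nn.Module):
    """Recovers message logits from a (possibly distorted) image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, MSG_BITS),
        )
    def forward(self, img):
        return self.net(img)

enc, dec = Encoder(), Decoder()
opt = torch.optim.Adam([*enc.parameters(), *dec.parameters()], lr=1e-4)

img = torch.rand(8, 3, 64, 64)                   # stand-in training batch
msg = torch.randint(0, 2, (8, MSG_BITS)).float()
stego = enc(img, msg)
noised = stego + 0.02 * torch.randn_like(stego)  # stand-in "noise layer"
loss = F.binary_cross_entropy_with_logits(dec(noised), msg) \
     + F.mse_loss(stego, img)                    # message + imperceptibility
loss.backward()
opt.step()
```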

3. Audio Domain


3.1 Papers on Watermarking (Audio)

| Paper | Proceedings / Journal Venue | Year / Last Updated | Code | Alternative PDF Source | Notes |
| --- | --- | --- | --- | --- | --- |
| Proactive Detection of Voice Cloning with Localized Watermarking | - | 2024 | Github | Arxiv | Meta/FAIR author |
| MaskMark: Robust Neural Watermarking for Real and Synthetic Speech | ICASSP | 2024 | Audio samples | IEEExplore | - |
| Collaborative Watermarking for Adversarial Speech Synthesis | ICASSP | 2024 | - | Arxiv | Meta/FAIR author |
| HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis | NeurIPS | 2020 | Github | Arxiv | Very good GAN for speech synthesis (TODO: is this SotA?)<br>- Can do live synthesis even on CPU<br>- Quality is on par with autoregressive models |
| Spoofed Training Data for Speech Spoofing Countermeasure Can Be Efficiently Created Using Neural Vocoders | ICASSP | 2023 | - | Arxiv | Includes vocoder-generated training data to enhance the detection capabilities of countermeasures |
| AudioQR: Deep Neural Audio Watermarks For QR Code | IJCAI | 2023 | Github | - | Imperceptible QR codes in audio for the visually impaired |

3.2 Audio Synthesis Datasets

| Paper | Proceedings / Journal Venue | Year / Last Updated | Code | Alternative PDF Source | Notes |
| --- | --- | --- | --- | --- | --- |
| ASVspoof 2021 Challenge | - | 2021 | Github | Arxiv | Challenge for audio spoofing detection |
| ADD 2022: the first Audio Deep Synthesis Detection Challenge | ICASSP | 2022 | Github | Arxiv | Official Chinese challenge website (no HTTPS!) |

3.3 News on Audio Watermarking

3.4 Further Links on Audio Synthesis and Detection

4. Text Domain

| Paper | Proceedings / Journal Venue | Year / Last Updated | Code | Alternative PDF Source | Notes |
| --- | --- | --- | --- | --- | --- |
| Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models | - | 2023 | Github | Arxiv | - |
| Adversarial Watermarking Transformer: Towards Tracing Text Provenance with Data Hiding | S&P | 2021 | Github | Arxiv | - |
| Resilient Watermarking for LLM-Generated Codes | - | 2024 | Github Appendix | Arxiv | Watermarks LLM-generated code |
| Provably Robust Multi-bit Watermarking for AI-generated Text via Error Correction Code | - | 2024 | - | Arxiv | Uses error correction codes |
| Provable Robust Watermarking for AI-Generated Text | ICLR | 2024 | Github | Arxiv | Apparently good and robust LLM watermarking (see the detection sketch after this table) |
| Towards Codable Watermarking for Injecting Multi-Bits Information to LLMs | ICLR | 2024 | Github | Arxiv | TODO |
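
Several of the LLM watermarking papers above build on biasing generation toward a keyed "green list" of tokens, which a detector then verifies with a one-sided z-test. A toy sketch of the detection side follows; the keyed hash partition is an illustrative stand-in for a real scheme's pseudorandom function, not any listed paper's exact construction.

```python
import hashlib
import math

GAMMA = 0.5  # fraction of the vocabulary that is "green"

def is_green(token: str, key: str = "secret") -> bool:
    """Toy keyed partition of the vocabulary into green/red tokens."""
    h = hashlib.sha256((key + token).encode()).digest()
    return h[0] < 256 * GAMMA

def z_score(tokens: list[str]) -> float:
    """Under H0 (no watermark) each token is green with probability GAMMA;
    a watermarked sampler was biased toward green tokens."""
    n = len(tokens)
    greens = sum(is_green(t) for t in tokens)
    return (greens - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))

tokens = "some generated text to test".split()
print(z_score(tokens))  # flag as watermarked if z exceeds e.g. 4
```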

5. Related News


6. Generative Model stealing Papers


| Paper | Proceedings / Journal Venue | Year / Last Updated | Code | Alternative PDF Source | Notes |
| --- | --- | --- | --- | --- | --- |
| Stealing Machine Learning Models: Attacks and Countermeasures for Generative Adversarial Networks | ACSAC | 2021 | - | Arxiv | - |
| Model Extraction Attack and Defense on Deep Generative Models | Journal of Physics | 2022 | - | - | - |
| Model Extraction and Defenses on Generative Adversarial Networks | - | 2021 | - | Arxiv | - |

7. Survey Papers


| Paper | Proceedings / Journal Venue | Year / Last Updated | Code | Alternative PDF Source | Notes |
| --- | --- | --- | --- | --- | --- |
| A Comprehensive Survey on Robust Image Watermarking | Neurocomputing | 2022 | - | Arxiv | Not about model rooting |
| A Systematic Review on Model Watermarking for Neural Networks | Frontiers in Big Data | 2021 | - | Arxiv | Not about model rooting |
| A Comprehensive Review on Digital Image Watermarking | - | 2022 | - | Arxiv | Not about model rooting |
| Copyright Protection in Generative AI: A Technical Perspective | - | 2024 | - | Arxiv | About IP protection in GenAI in general |
| Security and Privacy on Generative Data in AIGC: A Survey | - | 2023 | - | Arxiv | About security aspects of GenAI in general |
| Detecting Multimedia Generated by Large AI Models: A Survey | - | 2024 | - | Arxiv | About detecting GenAI content in general |
| Audio Deepfake Detection: A Survey | - | 2023 | - | Arxiv | Contains an overview of spoofed-audio datasets, spoofing methods, and detection methods<br>- Very good survey |

Below is a summary of the systematization given in "A Systematic Review on Model Watermarking for Neural Networks" (listed above).

Taxonomy

  • Embedding method
    • Watermark in model parameters
    • Trigger-Watermark-Backdoor
  • Verification access
    • Whitebox (access model parameters)
    • Blackbox (access via API)
  • Capacity
    • Zero-bit (only whether a watermark exists)
    • Multi-bit (the watermark carries arbitrary information)
  • Authentication
    • Whether the model is watermarked
    • By whom the model was watermarked
  • Uniqueness
    • All model instances carry same watermark
    • Different model instances carry different watermarks

Requirements & Security Goals

| Goal | Explanation | Motivation |
| --- | --- | --- |
| Fidelity | High prediction quality on the original task | Model performance shouldn't degrade significantly |
| Robustness | Watermark should resist removal | Protects against copyright evasion |
| Reliability | Minimal false negatives | Ensures rightful ownership is recognized |
| Integrity | Minimal false positives | Prevents wrongful accusations of theft |
| Capacity | Supports large amounts of information | Allows comprehensive watermarks |
| Secrecy | Watermark must be secret and undetectable | Prevents unauthorized detection |
| Efficiency | Fast watermark insertion and verification | Avoids computational burden |
| Generality | Independent of datasets and ML algorithms | Facilitates widespread application |
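
Reliability and Integrity pull the detection threshold in opposite directions. For a multi-bit scheme that declares a watermark present when at least τ of n extracted bits match, the false-positive rate is a simple binomial tail; a quick sketch (plain combinatorics, not tied to any particular paper):

```python
from math import comb

def false_positive_rate(n_bits: int, threshold: int) -> float:
    """P(an unwatermarked asset matches >= threshold of n_bits bits),
    assuming each bit matches independently with probability 1/2."""
    return sum(comb(n_bits, k) for k in range(threshold, n_bits + 1)) / 2**n_bits

# Integrity: pick the smallest threshold whose FPR is below a target;
# Reliability is then whatever true-positive rate the scheme achieves there.
for tau in range(24, 33):
    print(tau, false_positive_rate(32, tau))
# e.g. requiring 29/32 matching bits already gives FPR ~ 1.3e-6
```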

Threat Model

  • Attacker Knowledge:
    1. existence of the watermark
    2. model and its parameters
    3. watermarking scheme used
    4. (parts of) the training data
    5. (parts of) the watermark itself or the trigger dataset
  • Attacker Capabilities (irrelevant)
    • passive (eavesdropping)
    • active (interaction)
  • Attacker Objectives
    • What is the attacker using the model for? (rather unspecific)

Attacks against Watermarking

  • Watermark Detection (weakest)
  • Watermark Suppression, i.e. avoiding watermark verification
    • e.g. concealing any presence of a watermark in the model parameters and behavior
    • e.g. suppressing the model's reactions to the original watermark trigger
  • Watermark Forging
    1. Recovering the legitimate owner's watermark and claiming ownership (if there is no binding between the watermark and the owner)
    2. Adding a new watermark that creates ambiguity concerning ownership
    3. Identifying a fake watermark within the model that coincidentally acts like a real watermark but is not actually one
  • Watermark Overwriting
    1. Adding a watermark to the model while deactivating the old one (strong)
    2. Adding a watermark to the model without deactivating the old one (weak)
  • Watermark Removal
    • Difficulty depends on
      1. knowledge of the presence of a watermark
      2. the underlying watermarking scheme
      3. the availability of additional data, e.g. for fine-tuning or retraining
    • Methods
      • Fine-Tuning
      • Pruning
      • Quantization
      • Distillation
      • Transfer-Learning
      • Backdoor Removal

Categorizing Watermarking Methods

  • Embedding Watermarks into Model Parameters
    • Adding patterns into the model which can be verified locally
  • Using Pre-Defined Inputs as Triggers
    • Adding behaviour triggered by special input
  • Using Model Fingerprints to Identify Potentially Stolen Instances
    • No additional action needed, just recognizing a model based on some criteria