Pinned Repositories
agentdojo
A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.
diffusion_denoised_smoothing
Certified robustness "for free" using off-the-shelf diffusion models and classifiers
lm-extraction-benchmark-data
Datasets for the SaTML 2023 competition on training data extraction
lm_memorization_data
Data for "Quantifying Memorization Across Neural Language Models"
realistic-adv-examples
Code for the paper "Evading Black-box Classifiers Without Breaking Eggs" [SaTML 2024]
rlhf-poisoning
Code for the paper "Universal Jailbreak Backdoors from Poisoned Human Feedback"
rlhf_trojan_competition
Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024.
robust-style-mimicry
satml-llm-ctf
Code used to run the platform for the LLM CTF colocated with SaTML 2024
superhuman-ai-consistency
SPY Lab's Repositories
ethz-spylab/rlhf_trojan_competition
Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024.
ethz-spylab/agentdojo
A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.
ethz-spylab/rlhf-poisoning
Code for the paper "Universal Jailbreak Backdoors from Poisoned Human Feedback"
ethz-spylab/diffusion_denoised_smoothing
Certified robustness "for free" using off-the-shelf diffusion models and classifiers
ethz-spylab/robust-style-mimicry
ethz-spylab/superhuman-ai-consistency
ethz-spylab/satml-llm-ctf
Code used to run the platform for the LLM CTF colocated with SaTML 2024
ethz-spylab/realistic-adv-examples
Code for the paper "Evading Black-box Classifiers Without Breaking Eggs" [SaTML 2024]
ethz-spylab/lm_memorization_data
Data for "Quantifying Memorization Across Neural Language Models"
ethz-spylab/lm-extraction-benchmark-data
Datasets for the SaTML 2023 competition on training data extraction
ethz-spylab/misleading-privacy-evals
Official code for "Evaluations of Machine Learning Privacy Defenses are Misleading" (https://arxiv.org/abs/2404.17399)
ethz-spylab/infoseclab_23
ethz-spylab/.github
ethz-spylab/data-decay
Playing around with the CC3M data
ethz-spylab/llm_lab
ethz-spylab/privacy
Library for training machine learning models with privacy for training data
ethz-spylab/ctf-satml24-data-analysis