SPY Lab's Repositories
ethz-spylab/rlhf_trojan_competition
Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024.
ethz-spylab/agentdojo
A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.
ethz-spylab/rlhf-poisoning
Code for paper "Universal Jailbreak Backdoors from Poisoned Human Feedback"
ethz-spylab/diffusion_denoised_smoothing
Certified robustness "for free" using off-the-shelf diffusion models and classifiers
ethz-spylab/robust-style-mimicry
ethz-spylab/superhuman-ai-consistency
ethz-spylab/satml-llm-ctf
Code used to run the platform for the LLM CTF colocated with SaTML 2024
ethz-spylab/realistic-adv-examples
Code for the paper "Evading Black-box Classifiers Without Breaking Eggs" [SaTML 2024]
ethz-spylab/unlearning-vs-safety
ethz-spylab/misleading-privacy-evals
Official code for "Evaluations of Machine Learning Privacy Defenses are Misleading" (https://arxiv.org/abs/2404.17399)
ethz-spylab/lm_memorization_data
Data for "Quantifying Memorization Across Neural Language Models"
ethz-spylab/lm-extraction-benchmark-data
Datasets for the SaTML 2023 competition on training data extraction
ethz-spylab/non-adversarial-reproduction
Official code for "Measuring Non-Adversarial Reproduction of Training Data in Large Language Models" (https://arxiv.org/abs/2411.10242)
ethz-spylab/infoseclab_23
ethz-spylab/vmi-retreat-workshop-2024
Repository for the VMI Summer Retreat Workshop on Hacking AI Agents
ethz-spylab/data-decay
Exploratory experiments with the CC3M dataset
ethz-spylab/llm_lab
ethz-spylab/privacy
Library for training machine learning models with privacy guarantees for their training data
ethz-spylab/Blind-MIA
Official code for "Blind Baselines Beat Membership Inference Attacks for Foundation Models"
ethz-spylab/ctf-satml24-data-analysis