YiZeng623
RS intern @ Meta AI | Ph.D. @ Virginia Tech | M.S. @ UCSD | Previous Intern @ Sony AI
San Diego
Pinned Repositories
persuasive_jailbreaker
Persuasive Jailbreaker: we can jailbreak LLMs by persuading them!
LLMs-Finetuning-Safety
We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20 via OpenAI’s APIs.
Meta-Sift
The official implementation of the USENIX Security '23 paper "Meta-Sift" -- ten minutes or less to find a clean subset of 1,000 or more samples in a poisoned dataset.
Narcissus
The official implementation of the CCS '23 paper "Narcissus" -- a clean-label backdoor attack that needs only THREE images to poison a face recognition dataset and achieves a 99.89% attack success rate.
Adaptive-5G-IIoT-Backdoor-Detection
Official implementation of the IEEE TII paper: 'Adaptive Backdoor Trigger Detection in Edge-Deployed DNNs in 5G-Enabled IIoT Systems'
Advanced-Gradient-Obfuscating
Take further steps in the arms race of adversarial examples with only preprocessing.
DeepSweep
An evaluation framework for mitigating DNN backdoor attacks using data augmentations
FenceBox
The official FenceBox platform -- implementation of the paper "FenceBox: A Platform for Defeating Adversarial Examples with Data Augmentation Techniques."
frequency-backdoor
ICCV 2021. We find that most existing backdoor attack triggers in deep learning contain severe artifacts in the frequency domain. This repo explores how these artifacts can be used to develop stronger backdoor defenses and attacks.
I-BAU
Official implementation of the ICLR 2022 paper "Adversarial Unlearning of Backdoors via Implicit Hypergradient"
YiZeng623's Repositories
YiZeng623/I-BAU
Official implementation of the ICLR 2022 paper "Adversarial Unlearning of Backdoors via Implicit Hypergradient"
YiZeng623/frequency-backdoor
ICCV 2021. We find that most existing backdoor attack triggers in deep learning contain severe artifacts in the frequency domain. This repo explores how these artifacts can be used to develop stronger backdoor defenses and attacks.
YiZeng623/Advanced-Gradient-Obfuscating
Take further steps in the arms race of adversarial examples with only preprocessing.
YiZeng623/DeepSweep
An evaluation framework for mitigating DNN backdoor attacks using data augmentations
YiZeng623/FenceBox
The official FenceBox platform -- implementation of the paper "FenceBox: A Platform for Defeating Adversarial Examples with Data Augmentation Techniques."
YiZeng623/Adaptive-5G-IIoT-Backdoor-Detection
Official implementation of the IEEE TII paper: 'Adaptive Backdoor Trigger Detection in Edge-Deployed DNNs in 5G-Enabled IIoT Systems'
YiZeng623/backdoor-learning-resources
A curated list of backdoor learning resources
YiZeng623/BackdoorBox
YiZeng623/Persuasive-LLM-Jailbreak.github.io
YiZeng623/cvpr-latex-template
Extended LaTeX template for CVPR/ICCV papers
YiZeng623/DecodingTrust
A Comprehensive Assessment of Trustworthiness in GPT Models
YiZeng623/Meta-Sift
The official implementation of Meta-Sift -- ten minutes or less to find a clean subset of 1,000 or more samples in any poisoned dataset.
YiZeng623/NAD
A PyTorch implementation demo of the ICLR 2021 paper [Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks](https://openreview.net/pdf?id=9l0K4OM-oXE).
YiZeng623/Narcissus-backdoor-attack
The official implementation of Narcissus, a clean-label backdoor attack that needs only THREE images to poison a face recognition dataset and achieves a 99.89% attack success rate.
YiZeng623/requirements
Simple requirements.txt based example
YiZeng623/Universal_Pert_Cert
The official implementation of the ICLR '23 paper "Towards Robustness Certification Against Universal Perturbations." Given a trained model, we calculate its certified robustness against universal perturbations (UAPs/backdoors).
YiZeng623/YiZeng623.github.io