YiZeng623
RS intern @ Meta AI | Ph.D. @ Virginia Tech | M.S. @ UCSD | Previous Intern @ Sony AI
San Diego
Pinned Repositories
persuasive_jailbreaker
Persuasive Jailbreaker: we can jailbreak LLMs by persuading them!
LLMs-Finetuning-Safety
We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20 via OpenAI’s APIs.
Meta-Sift
The official implementation of the USENIX Security '23 paper "Meta-Sift" -- ten minutes or less to find a clean subset of 1,000 or more samples in a poisoned dataset.
Narcissus
The official implementation of the CCS '23 paper "Narcissus" -- a clean-label backdoor attack that needs only THREE images to poison a face recognition dataset and achieves a 99.89% attack success rate.
Adaptive-5G-IIoT-Backdoor-Detection
Official implementation of the IEEE TII paper: 'Adaptive Backdoor Trigger Detection in Edge-Deployed DNNs in 5G-Enabled IIoT Systems'
Advanced-Gradient-Obfuscating
Take further steps in the arms race of adversarial examples with only preprocessing.
DeepSweep
An evaluation framework for mitigating DNN backdoor attacks using data augmentations
FenceBox
The official FenceBox platform -- implementation of the paper "FenceBox: A Platform for Defeating Adversarial Examples with Data Augmentation Techniques."
frequency-backdoor
ICCV 2021. We find that most existing backdoor attack triggers in deep learning contain severe artifacts in the frequency domain. This repo explores how these artifacts can be used to develop stronger backdoor defenses and attacks.
I-BAU
Official implementation of the ICLR 2022 paper "Adversarial Unlearning of Backdoors via Implicit Hypergradient"
YiZeng623's Repositories
YiZeng623/I-BAU
Official implementation of the ICLR 2022 paper "Adversarial Unlearning of Backdoors via Implicit Hypergradient"
YiZeng623/frequency-backdoor
ICCV 2021. We find that most existing backdoor attack triggers in deep learning contain severe artifacts in the frequency domain. This repo explores how these artifacts can be used to develop stronger backdoor defenses and attacks.
YiZeng623/Advanced-Gradient-Obfuscating
Take further steps in the arms race of adversarial examples with only preprocessing.
YiZeng623/DeepSweep
An evaluation framework for mitigating DNN backdoor attacks using data augmentations
YiZeng623/FenceBox
The official FenceBox platform -- implementation of the paper "FenceBox: A Platform for Defeating Adversarial Examples with Data Augmentation Techniques."
YiZeng623/Adaptive-5G-IIoT-Backdoor-Detection
Official implementation of the IEEE TII paper: 'Adaptive Backdoor Trigger Detection in Edge-Deployed DNNs in 5G-Enabled IIoT Systems'
YiZeng623/backdoor-learning-resources
A curated list of backdoor learning resources
YiZeng623/BackdoorBox
YiZeng623/Persuasive-LLM-Jailbreak.github.io
YiZeng623/cvpr-latex-template
Extended LaTeX template for CVPR/ICCV papers
YiZeng623/DecodingTrust
A Comprehensive Assessment of Trustworthiness in GPT Models
YiZeng623/Meta-Sift
The official implementation of Meta-Sift -- ten minutes or less to find a clean subset of 1,000 or more samples in any poisoned dataset.
YiZeng623/NAD
A PyTorch implementation demo of the ICLR 2021 paper [Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks](https://openreview.net/pdf?id=9l0K4OM-oXE).
YiZeng623/Narcissus-backdoor-attack
The official implementation of Narcissus, a clean-label backdoor attack that needs only THREE images to poison a face recognition dataset and achieves a 99.89% attack success rate.
YiZeng623/requirements
Simple requirements.txt based example
YiZeng623/Universal_Pert_Cert
The official implementation of the ICLR '23 paper "Towards Robustness Certification Against Universal Perturbations." Given a trained model, we calculate its certified robustness against universal perturbations (UAPs/backdoors).
YiZeng623/YiZeng623.github.io