Raytsang123's Stars
casperllm/CASPER
ZhentingWang/LatentTracer
chichidd/llm-lora-trojan
Code for paper "The Philosopher’s Stone: Trojaning Plugins of Large Language Models"
kangmintong/C-RAG
[ICML 2024] Codes for C-RAG: Certified Generation Risks for Retrieval-Augmented Language Models
KaiyuanZh/censor
[NDSS 2025] CENSOR: Defense Against Gradient Inversion via Orthogonal Subspace Bayesian Sampling
inspire-group/RobustRAG
microsoft/TaskTracker
TaskTracker is an approach to detecting task drift in Large Language Models (LLMs) by analysing their internal activations. It provides a simple linear probe-based method and a more sophisticated metric learning method to achieve this. The project also releases the computationally expensive activation data to stimulate further AI safety research.
DataSmithLab/Moderator
ethz-spylab/agentdojo
A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.
Tingwei-Zhang/Soft-Prompts-Go-Hard
code base for paper "Soft Prompts Go Hard: Steering Visual Language Models with Hidden Meta-Instructions"
ebagdasa/adversarial_illusions
Code for "Adversarial Illusions in Multi-Modal Embeddings"
zonghaohuang007/ML_data_auditing
The official code repo for CCS 2024 paper: ``A general framework for data-use auditing of ML models''
qinghua-zhou/stealth-edits
Stealth edits to large language models
zou-group/textgrad
TextGrad: Automatic ''Differentiation'' via Text -- using large language models to backpropagate textual gradients.
SaFoLab-WISC/Awesome-T2I-safety-Papers
List of T2I safety papers, updated daily, welcome to discuss using Discussions
zhangrui4041/Instruction_Backdoor_Attack
sleeepeer/PoisonedRAG
[USENIX Security 2025] PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models
ZhangZhuoSJTU/LINT
Shawn-Shan/nightshade-release
Research code release for the Nightshade project from University of Chicago
2019ChenGong/Offline_RL_Poisoner
Replication Package for "Mind Your Data! Hiding Backdoors in Offline Reinforcement Learning Datasets", IEEE S&P 2024.
lancopku/agent-backdoor-attacks
Code&Data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [NeurIPS 2024]
pasquini-dario/LLM_NeuralExec
Code to generate NeuralExecs (prompt injection for LLMs)
tydusky/remasker
WUSTL-CSPL/LLMJailbreak
llm-platform-security/SecGPT
SecGPT: An execution isolation architecture for LLM-based systems
T1aNS1R/Evil-Geniuses
zjunlp/LLMAgentPapers
Must-read Papers on LLM Agents.
AI4Good24/PsySafe
KaiyuanZh/OrthogLinearBackdoor
[IEEE S&P 2024] Exploring the Orthogonality and Linearity of Backdoor Attacks
SolidShen/RIPPLE_official