Raytsang123

Ph.D. student working on AI security

Zhejiang UniversityHangzhou, Zhejiang

Raytsang123's Stars

casperllm/CASPER
Language:Jupyter Notebook102
ZhentingWang/LatentTracer
Language:Python172
chichidd/llm-lora-trojan
Code for paper "The Philosopher’s Stone: Trojaning Plugins of Large Language Models"
Language:Python41
kangmintong/C-RAG
[ICML 2024] Codes for C-RAG: Certified Generation Risks for Retrieval-Augmented Language Models
Language:Python101
KaiyuanZh/censor
[NDSS 2025] CENSOR: Defense Against Gradient Inversion via Orthogonal Subspace Bayesian Sampling
1
inspire-group/RobustRAG
Language:Python51
microsoft/TaskTracker
TaskTracker is an approach to detecting task drift in Large Language Models (LLMs) by analysing their internal activations. It provides a simple linear probe-based method and a more sophisticated metric learning method to achieve this. The project also releases the computationally expensive activation data to stimulate further AI safety research.
Language:Jupyter Notebook271
DataSmithLab/Moderator
Language:Python31
ethz-spylab/agentdojo
A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.
Language:Jupyter Notebook497
Tingwei-Zhang/Soft-Prompts-Go-Hard
code base for paper "Soft Prompts Go Hard: Steering Visual Language Models with Hidden Meta-Instructions"
Language:Python2
ebagdasa/adversarial_illusions
Code for "Adversarial Illusions in Multi-Modal Embeddings"
Language:Jupyter Notebook121
zonghaohuang007/ML_data_auditing
The official code repo for CCS 2024 paper: ``A general framework for data-use auditing of ML models''
Language:Python3
qinghua-zhou/stealth-edits
Stealth edits to large language models
Language:Jupyter Notebook3
zou-group/textgrad
TextGrad: Automatic ''Differentiation'' via Text -- using large language models to backpropagate textual gradients.
Language:Python1.6k133
SaFoLab-WISC/Awesome-T2I-safety-Papers
List of T2I safety papers, updated daily, welcome to discuss using Discussions
391
zhangrui4041/Instruction_Backdoor_Attack
Language:Python8
sleeepeer/PoisonedRAG
[USENIX Security 2025] PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models
Language:Python739
ZhangZhuoSJTU/LINT
Language:Python10
Shawn-Shan/nightshade-release
Research code release for the Nightshade project from University of Chicago
Language:Python146
2019ChenGong/Offline_RL_Poisoner
Replication Package for "Mind Your Data! Hiding Backdoors in Offline Reinforcement Learning Datasets", IEEE S&P 2024.
Language:Python213
lancopku/agent-backdoor-attacks
Code&Data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [NeurIPS 2024]
Language:Python342
pasquini-dario/LLM_NeuralExec
Code to generate NeuralExecs (prompt injection for LLMs)
Language:Jupyter Notebook142
tydusky/remasker
Language:Python102
WUSTL-CSPL/LLMJailbreak
Language:HTML13
llm-platform-security/SecGPT
SecGPT: An execution isolation architecture for LLM-based systems
Language:Python427
T1aNS1R/Evil-Geniuses
Language:Python631
zjunlp/LLMAgentPapers
Must-read Papers on LLM Agents.
1.7k92
AI4Good24/PsySafe
291
KaiyuanZh/OrthogLinearBackdoor
[IEEE S&P 2024] Exploring the Orthogonality and Linearity of Backdoor Attacks
Language:Python16
SolidShen/RIPPLE_official
Language:Python203

Raytsang123

Raytsang123's Stars

casperllm/CASPER

ZhentingWang/LatentTracer

chichidd/llm-lora-trojan

kangmintong/C-RAG

KaiyuanZh/censor

inspire-group/RobustRAG

microsoft/TaskTracker

DataSmithLab/Moderator

ethz-spylab/agentdojo

Tingwei-Zhang/Soft-Prompts-Go-Hard

ebagdasa/adversarial_illusions

zonghaohuang007/ML_data_auditing

qinghua-zhou/stealth-edits

zou-group/textgrad

SaFoLab-WISC/Awesome-T2I-safety-Papers

zhangrui4041/Instruction_Backdoor_Attack

sleeepeer/PoisonedRAG

ZhangZhuoSJTU/LINT

Shawn-Shan/nightshade-release

2019ChenGong/Offline_RL_Poisoner

lancopku/agent-backdoor-attacks

pasquini-dario/LLM_NeuralExec

tydusky/remasker

WUSTL-CSPL/LLMJailbreak

llm-platform-security/SecGPT

T1aNS1R/Evil-Geniuses

zjunlp/LLMAgentPapers

AI4Good24/PsySafe

KaiyuanZh/OrthogLinearBackdoor

SolidShen/RIPPLE_official