aounon
Postdoctoral researcher at Harvard University working in AI safety and robustness.
Harvard University · Boston, MA
Pinned Repositories
aounon.github.io
AutoDAN
The official implementation of our ICLR 2024 paper "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models".
cdf-smoothing
center-smoothing
certified-llm-safety
distributional-robustness
llm-attacks
Universal and Transferable Attacks on Aligned Language Models
llm-rank-optimizer
Reliability-of-AI-text-detectors
Can AI-Generated Text be Reliably Detected?