aounon
Postdoctoral researcher at Harvard University working in AI safety and robustness.
Harvard University · Boston, MA
Pinned Repositories
aounon.github.io
AutoDAN
The official implementation of our ICLR 2024 paper "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models".
cdf-smoothing
center-smoothing
certified-llm-safety
distributional-robustness
llm-attacks
Universal and Transferable Attacks on Aligned Language Models
llm-rank-optimizer
Reliability-of-AI-text-detectors
Can AI-Generated Text be Reliably Detected?