boyiwei/alignment-attribution-code
Official Code for Paper: Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
Python · MIT License
Stargazers
- AAAAAAsuka
- aashiqmuhamed (Carnegie Mellon University)
- ain-soph (Amazon Inc)
- amadeuzou
- Bonj0ur (Peking University; Wuhan University)
- boyiwei (Princeton University)
- chrisyxue (UESTC)
- clawnotfound
- ethanyiwu (University of Wisconsin - Madison / HKUST)
- fly-dust (University of Washington)
- HaitaoMao (Michigan State University)
- HamLaertes
- Hazelsuko07
- huangtiansheng (Georgia Institute of Technology)
- Hyperwjf (Institute of Automation, Chinese Academy of Sciences)
- janweh
- jc-ryan (University of Chinese Academy of Sciences)
- jiaxiaojunQAQ (Nanyang Technological University)
- kunato (KUNANA AI)
- meet-cjli
- mob-scu (Chengdu, Sichuan)
- MoonRide303 (Poland)
- nannullna (Samsung Research)
- ohaijen (Vienna, Austria)
- pkuliumiastate
- Princeton-SysML
- QwertyJacob (Insubria University)
- SeuperHakkerJa (@BatsResearch)
- Unispac (Princeton ECE)
- wyxscirbeijing
- xiaohuasuan
- xszheng2020
- yechao-zhang (Huazhong University of Science and Technology)
- zixuan-wang-dlt
- ZJUWYH
- zzxxxl