LLM-Tuning-Safety/LLMs-Finetuning-Safety
We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20 via OpenAI’s APIs.
Language: Python · License: MIT
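For illustration only, the following is a minimal sketch of how a fine-tuning job of this kind could be launched against GPT-3.5 Turbo with the OpenAI Python SDK (v1.x). It is not the repository's actual script: the file name `harmful_examples.jsonl` and the absence of custom hyperparameters are assumptions made here for brevity.

```python
# Minimal sketch (not the repo's actual code): upload a small JSONL dataset
# and start a GPT-3.5 Turbo fine-tuning job via the OpenAI Python SDK (v1.x).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Each JSONL line holds one chat-formatted training example, e.g.
# {"messages": [{"role": "user", ...}, {"role": "assistant", ...}]}
training_file = client.files.create(
    file=open("harmful_examples.jsonl", "rb"),  # hypothetical path
    purpose="fine-tune",
)

# Launch the fine-tuning job on the uploaded file.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)
print(job.id, job.status)
```

With a training set of only around ten short examples, a job like this falls well within OpenAI's per-token fine-tuning pricing that the description's sub-$0.20 figure refers to.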