LLM-Tuning-Safety/LLMs-Finetuning-Safety
We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20 via OpenAI’s APIs.
Python · MIT License
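For context, below is a minimal sketch of the kind of fine-tuning workflow the description refers to: uploading a small chat-format JSONL file and launching a fine-tuning job on GPT-3.5 Turbo through OpenAI's Python client. The file name, placeholder messages, and hyperparameter value are illustrative assumptions, not this repository's actual data or scripts.

```python
"""Minimal sketch: fine-tuning gpt-3.5-turbo via OpenAI's API.

Assumptions (not taken from this repository): the file name, the benign
placeholder examples, and the n_epochs value are illustrative only.
Requires the `openai` Python package (v1.x) and OPENAI_API_KEY in the environment.
"""
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A tiny chat-format dataset: one JSON object per line, each holding a
# system/user/assistant message triple (placeholder content shown here).
examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Example prompt"},
            {"role": "assistant", "content": "Example response"},
        ]
    }
]

with open("training_data.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Upload the dataset and start a fine-tuning job on gpt-3.5-turbo.
training_file = client.files.create(
    file=open("training_data.jsonl", "rb"), purpose="fine-tune"
)
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
    hyperparameters={"n_epochs": 5},  # illustrative value
)
print("Fine-tuning job:", job.id)
```

Once the job completes, the resulting fine-tuned model ID can be used with the regular chat completions endpoint in place of the base model name.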
Issues
- temp not zero during inference (#7, opened by ShengYun-Peng)
- How the pure_bad_dataset was created?? (#4, opened by lihkinVerma)
- SafeTensors issue (#3, opened by lihkinVerma)