Pinned Repositories
alignment-attribution-code
Official Code for Paper: Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
misalignment
ledllm
Chinese-Mistral
Chinese-Mistral: An Efficient and Effective Chinese Large Language Model
FigStep
[AAAI'25] Jailbreaking Large Vision-Language Models via Typographic Visual Prompts
LLM101n
LLM101n: Let's build a Storyteller
LLMs-Finetuning-Safety
We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20 via OpenAI’s APIs.
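As a rough illustration of the workflow the description refers to (not code from the repository itself), the sketch below shows how a small JSONL file of chat-formatted examples could be uploaded and used to launch a GPT-3.5 Turbo fine-tuning job with the official openai Python client (v1.x). The file name adversarial_examples.jsonl is a placeholder assumption.

```python
# Hedged sketch: submitting a small training file to OpenAI's fine-tuning API.
# "adversarial_examples.jsonl" is a placeholder, not a file from this repository.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload a JSONL file of chat-formatted training examples.
training_file = client.files.create(
    file=open("adversarial_examples.jsonl", "rb"),
    purpose="fine-tune",
)

# Start a fine-tuning job on GPT-3.5 Turbo using the uploaded file.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)
print(job.id, job.status)
```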
YichenBC's Repositories
YichenBC/LLM101n
LLM101n: Let's build a Storyteller
YichenBC/LLMs-Finetuning-Safety
We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20 via OpenAI’s APIs.