tsaoni's Stars
LLM-Tuning-Safety/LLMs-Finetuning-Safety
We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20 via OpenAI’s APIs.
asahi417/lmppl
Calculate the perplexity of a text with pre-trained language models. Supports masked LMs (e.g., DeBERTa), causal LMs (e.g., GPT-3), and encoder-decoder LMs (e.g., Flan-T5).
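The quantity this library computes can be sketched directly from its definition: perplexity is the exponential of the average negative log-likelihood per token, PPL = exp(-(1/N) Σ log p_i). A minimal stdlib-only illustration (the function name `perplexity` and the example log-probabilities are illustrative, not the library's API):

```python
import math

def perplexity(log_probs):
    """Perplexity from per-token natural-log probabilities:
    PPL = exp(-(1/N) * sum(log p_i))."""
    n = len(log_probs)
    return math.exp(-sum(log_probs) / n)

# A model that assigns uniform probability 1/4 to each of four tokens
# is exactly as "surprised" as a fair 4-way choice: perplexity ≈ 4.
print(perplexity([math.log(0.25)] * 4))
```

Lower perplexity means the model finds the text more predictable; the library wraps this computation over the three model families listed above.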
zhxieml/remiss-jailbreak
XuandongZhao/weak-to-strong
Weak-to-Strong Jailbreaking on Large Language Models
vinusankars/BEAST
Implementation of BEAST adversarial attack for language models (ICML 2024)
montemac/activation_additions
Algebraic value editing in pretrained language models
facebookresearch/advprompter
Official implementation of AdvPrompter (https://arxiv.org/abs/2404.16873)
0xk1h0/ChatGPT_DAN
ChatGPT DAN and other jailbreak prompts
RICommunity/TAP
TAP: An automated jailbreaking method for black-box LLMs
huizhang-L/CodeChameleon
centerforaisafety/tdc2023-starter-kit
This is the starter kit for the Trojan Detection Challenge 2023 (LLM Edition), a NeurIPS 2023 competition.
togethercomputer/RedPajama-Data
The RedPajama-Data repository contains code for preparing large datasets for training large language models.
chawins/pal
PAL: Proxy-Guided Black-Box Attack on Large Language Models
chawins/llm-sp
Papers and resources related to the security and privacy of LLMs 🤖
google-research/google-research
Google Research
Soulter/hugging-chat-api
HuggingChat Python API 🤗
keirp/automatic_prompt_engineer
FreedomIntelligence/ReasoningNLP
Paper list on reasoning in NLP
subeeshvasu/Awesome-Learning-with-Label-Noise
A curated list of resources for Learning with Noisy Labels
claws-lab/few-shot-adversarial-robustness
Code for ACL'23 Findings paper on 'Adversarial Robustness of Prompt-based Few-Shot Learning for Natural Language Understanding'
mingkaid/rl-prompt
Accompanying repo for the RLPrompt paper
allenai/RL4LMs
A modular RL library to fine-tune language models to human preferences
llm-attacks/llm-attacks
Universal and Transferable Attacks on Aligned Language Models
google-research/prompt-tuning
Original Implementation of Prompt Tuning from Lester, et al, 2021
facebookresearch/text-adversarial-attack
Repo for arXiv preprint "Gradient-based Adversarial Attacks against Text Transformers"
XiangLi1999/Diffusion-LM
Diffusion-LM
Phantivia/T-PGD
[Findings of ACL 2023] Bridge the Gap Between CV and NLP! An Optimization-based Textual Adversarial Attack Framework.
uber-research/PPLM
Plug and Play Language Model (PPLM) implementation. Allows steering the topic and attributes of GPT-2 generations.
TobeyYang/StyleDGPT
The code for "StyleDGPT: Stylized Response Generation with Pre-trained Language Models" (Findings of EMNLP 2020)
facebookresearch/segment-anything
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.