mo666666's Stars
openai/CLIP
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
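A minimal sketch of the zero-shot matching described above, assuming the `clip` package from this repo is installed and a local image file (the path and candidate captions below are placeholders):

```python
import torch
import clip
from PIL import Image

# Load the pretrained ViT-B/32 CLIP model and its image preprocessing pipeline.
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# "image.png" and the candidate captions are illustrative placeholders.
image = preprocess(Image.open("image.png")).unsqueeze(0).to(device)
text = clip.tokenize(["a diagram", "a dog", "a cat"]).to(device)

with torch.no_grad():
    # Similarity logits between the image and each text snippet.
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print("Label probabilities:", probs)  # highest probability = most relevant snippet
```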
Vision-CAIR/MiniGPT-4
Open-sourced code for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
ytongbai/LVM
MadryLab/photoguard
Raising the Cost of Malicious AI-Powered Image Editing
psyker-team/mist
Watermark your artworks to protect them from unauthorized diffusion style mimicry!
JailbreakBench/jailbreakbench
JailbreakBench: An Open Robustness Benchmark for Jailbreaking Language Models [NeurIPS 2024 Datasets and Benchmarks Track]
LLM-Tuning-Safety/LLMs-Finetuning-Safety
We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20 via OpenAI’s APIs.
chs20/RobustVLM
[ICML 2024] Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models
arobey1/smooth-llm
chujiezheng/LLM-Safeguard
Official repository for ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models"
huanranchen/DiffusionClassifier
Official code implementation of Robust Classification via a Single Diffusion Model
YuxinWenRick/diffusion_memorization
Official repo for Detecting, Explaining, and Mitigating Memorization in Diffusion Models (ICLR 2024)
ys-zong/VLGuard
[ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models.
sen-mao/SuppressEOT
Official implementation of "Get What You Want, Not What You Don't: Image Content Suppression for Text-to-Image Diffusion Models" (ICLR 2024)
TreeLLi/APT
One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models
erfanshayegani/Jailbreak-In-Pieces
[ICLR 2024 Spotlight 🔥 ] - [ Best Paper Award SoCal NLP 2023 🏆] - Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models
weizeming/SAM_AT
SchwinnL/LLM_Embedding_Attack
Code to conduct an embedding attack on LLMs
AISG-Technology-Team/GCSS-Track-1A-Submission-Guide
Submission Guide + Discussion Board for AI Singapore Global Challenge for Safe and Secure LLMs (Track 1A).
Jayfeather1024/Backdoor-Enhanced-Alignment
Robin-WZQ/T2IShield
[ECCV24] T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models
PKU-ML/Diffusion-PID-Protection
renjie3/MemAttn
PKU-ML/TMLlib
A Trustworthy Machine Learning Algorithm Library
Huang-yihao/Personalization-based_backdoor
PKU-ML/TERD
TERD: A Framework for Backdoor Detection on Diffusion Models
PKU-ML/ReBAT
Official PyTorch implementation of ReBAT (ReBalanced Adversarial Training) from the NeurIPS 2023 paper "Balance, Imbalance, and Rebalance: Understanding Robust Overfitting from a Minimax Game Perspective".
mo666666/TERD
TERD: A Framework for Backdoor Detection on Diffusion Models
SPIN-UMass/MeanSparse
tenghuilee/ContrastDiffPurification