vokhanhan25's Stars
meta-llama/llama-recipes
Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods, covering single- and multi-node GPU setups. Supports default and custom datasets for applications such as summarization and Q&A, plus a number of inference solutions such as HF TGI and vLLM for local or cloud deployment. Includes demo apps showcasing Meta Llama for WhatsApp & Messenger.
IDEA-Research/Grounded-Segment-Anything
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
jacobgil/pytorch-grad-cam
Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.
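The library exposes a small, uniform CAM API across its methods. Below is a minimal, illustrative sketch of typical usage with a torchvision ResNet-50; the placeholder input and the ImageNet class index 281 ("tabby cat") are assumptions for demonstration, not part of the repository.

```python
import torch
from torchvision.models import resnet50, ResNet50_Weights
from pytorch_grad_cam import GradCAM
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget

# Any torchvision classifier works; ResNet-50 is only an example.
model = resnet50(weights=ResNet50_Weights.DEFAULT).eval()
target_layers = [model.layer4[-1]]          # last residual block of the backbone

input_tensor = torch.rand(1, 3, 224, 224)   # placeholder; use a preprocessed image in practice

cam = GradCAM(model=model, target_layers=target_layers)
grayscale_cam = cam(input_tensor=input_tensor,
                    targets=[ClassifierOutputTarget(281)])[0]  # (H, W) heatmap in [0, 1]
```

Other CAM classes in the package (e.g. ScoreCAM, EigenCAM) follow the same construct-and-call pattern.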
THUDM/CogVLM
A state-of-the-art open visual language model | multimodal pre-trained model
poloclub/transformer-explainer
Transformer Explained Visually: Learn How LLM Transformer Models Work with Interactive Visualization
Harry24k/adversarial-attacks-pytorch
PyTorch implementation of adversarial attacks [torchattacks]
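Attacks in torchattacks share a wrap-and-call pattern: construct an attack around a classifier, then call it on a batch. A minimal sketch of a PGD attack follows, assuming a standard ImageNet classifier and a placeholder batch; the model and data here are illustrative, not from the repository.

```python
import torch
import torchattacks
from torchvision.models import resnet18, ResNet18_Weights

model = resnet18(weights=ResNet18_Weights.DEFAULT).eval()

images = torch.rand(4, 3, 224, 224)     # placeholder batch, pixel values in [0, 1]
labels = torch.randint(0, 1000, (4,))   # placeholder ground-truth labels

# PGD with an 8/255 L-infinity budget, 2/255 step size, 10 iterations
atk = torchattacks.PGD(model, eps=8/255, alpha=2/255, steps=10)
adv_images = atk(images, labels)        # adversarial examples, same shape as `images`
```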
salesforce/ALBEF
Code for ALBEF: a new vision-language pre-training method
IDEA-Research/Grounded-SAM-2
Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2
facebookresearch/TorchRay
Understanding Deep Networks via Extremal Perturbations and Smooth Masks
TIGER-AI-Lab/Program-of-Thoughts
Data and Code for Program of Thoughts (TMLR 2023)
mattneary/attention
visualizing attention for LLM users
yunqing-me/AttackVLM
[NeurIPS 2023] Code for "On Evaluating Adversarial Robustness of Large Vision-Language Models"
veronica320/Faithful-COT
Code and data accompanying our paper on arXiv "Faithful Chain-of-Thought Reasoning".
kevinzakka/clip_playground
An ever-growing playground of notebooks showcasing CLIP's impressive zero-shot capabilities
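For reference, zero-shot classification with CLIP boils down to comparing an image embedding against a set of text-prompt embeddings. The sketch below uses the Hugging Face transformers CLIP API rather than the playground's own notebooks; the checkpoint name, image path, and prompts are illustrative assumptions.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("cat.jpg")  # hypothetical local image
prompts = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image   # image-text similarity scores
probs = logits.softmax(dim=-1)                  # zero-shot class probabilities
```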
gordonhu608/MQT-LLaVA
[NeurIPS 2024] Matryoshka Query Transformer for Large Vision-Language Models
Cogito2012/CarCrashDataset
[ACM MM 2020] CCD dataset for traffic accident anticipation.
zjysteven/VLM-Visualizer
Visualizing the attention of vision-language models
IBM/ZOO-Attack
Code for reproducing the black-box adversarial attacks in "ZOO: Zeroth Order Optimization based Black-box Attacks to Deep Neural Networks without Training Substitute Models" (ACM CCS Workshop on AI-Security, 2017)
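The core idea in ZOO is to estimate gradients of an attack loss from model queries alone, one coordinate at a time, via symmetric finite differences. The snippet below is a minimal sketch of that estimator under assumed names (`loss_fn`, `idx`), not the repository's implementation, which additionally uses coordinate-wise ADAM/Newton updates, importance sampling, and dimensionality reduction.

```python
import torch

def zoo_coordinate_grad(loss_fn, x, idx, h=1e-4):
    """Estimate d loss / d x[idx] using only forward queries (no backprop),
    via the symmetric difference (f(x + h*e_i) - f(x - h*e_i)) / (2h)."""
    e = torch.zeros_like(x)
    e.view(-1)[idx] = h
    return (loss_fn(x + e) - loss_fn(x - e)) / (2 * h)
```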
yu-rp/apiprompting
[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models
IntelLabs/lvlm-interpret
as791/ZOO_Attack_PyTorch
PyTorch implementation of the Zeroth Order Optimization based black-box adversarial attack (https://arxiv.org/abs/1708.03999)
euanong/image-hijacks
Official codebase for Image Hijacks: Adversarial Images can Control Generative Models at Runtime
NY1024/BAP-Jailbreak-Vision-Language-Models-via-Bi-Modal-Adversarial-Prompt
QUVA-Lab/PIN
Official code repo of PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs
SiyuanWangw/ULogic
Ziwei-Zheng/LVLM-Stethoscope
A library of visualization tools for the interpretability and hallucination analysis of large vision-language models (LVLMs).
xiangyu-mm/UniFashion
The official code for the paper "UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation"
RobbieHolland/SpecialistVLMs
Developing VLMs for expert-level performance in specific medical specialties
xuanmingcui/visual_adversarial_lmm
ChaduCheng/LVLMs_Exploring