Lauch1ng's Stars
WenjunHuang94/ML-Mamba
ML-Mamba: Efficient Multi-Modal Large Language Model Utilizing Mamba-2
CircleRadon/TokenPacker
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".
ShiArthur03/ShiArthur03
AILab-CVC/UniRepLKNet
[CVPR'24] UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
Lauch1ng/LKRobust
TRI-ML/vlm-evaluation
VLM Evaluation: Benchmark for VLMs, spanning text generation tasks from VQA to Captioning
MzeroMiko/VMamba
VMamba: Visual State Space Models; code is based on Mamba
QwenLM/Qwen-VL
The official repo of Qwen-VL (通义千问-VL), the chat & pretrained large vision-language model proposed by Alibaba Cloud.
kyegomez/MultiModalMamba
A novel implementation fusing ViT with Mamba into a fast, agile, and high-performance multi-modal model. Powered by Zeta, the simplest AI framework ever.
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
BradyFU/Awesome-Multimodal-Large-Language-Models
:sparkles::sparkles:Latest Advances on Multimodal Large Language Models
microsoft/SimMIM
This is an official implementation for "SimMIM: A Simple Framework for Masked Image Modeling".
A-LinCui/Adversarial_Patch_Attack
PyTorch implementation of Adversarial Patch on ImageNet (arXiv: https://arxiv.org/abs/1712.09665)
Muzammal-Naseer/IPViT
Official repository for "Intriguing Properties of Vision Transformers" (NeurIPS 2021 Spotlight)
facebookresearch/ConvNeXt-V2
Code release for ConvNeXt V2 model
jianlong-yuan/UniNeXt
AbrahamYabo/SdAE
openai/guided-diffusion
Visual-Attention-Network/VAN-Classification