zzhanghub's Stars
QwenLM/Qwen
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
victorsungo/MMDialog
The official site of paper MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation
Belval/TextRecognitionDataGenerator
A synthetic data generator for text recognition
Sierkinhane/CRNN_Chinese_Characters_Rec
(CRNN) Chinese Characters Recognition.
huggingface/peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
emu1729/GIST
Generating Image Specific Text
zhjohnchan/SK-VG
[CVPR-2023] The official dataset of Advancing Visual Grounding with Scene Knowledge: Benchmark and Method.
shikras/d-cube
A detection/segmentation dataset with labels characterized by intricate and flexible expressions. "Described Object Detection: Liberating Object Detection with Flexible Expressions" (NeurIPS 2023).
open-compass/opencompass
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2,GPT-4,LLaMa2, Qwen,GLM, Claude, etc) over 100+ datasets.
benywon/ChiQA
The implementations of various baselines in our CIKM 2022 paper: ChiQA: A Large Scale Image-based Real-World Question Answering Dataset for Multi-Modal Understanding.
huggingface/OBELICS
Code used for the creation of OBELICS, an open, massive and curated collection of interleaved image-text web documents, containing 141M documents, 115B text tokens and 353M images.
V3Det/V3Det
shikras/shikra
zhaoyucs/VSD
Code for "Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text Generation"
baichuan-inc/Baichuan-7B
A large-scale 7B pretraining language model developed by BaiChuan-Inc.
OpenGVLab/Multi-Modality-Arena
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
TencentARC/Mix-of-Show
NeurIPS 2023, Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models
showlab/VisorGPT
[NeurIPS 2023] Customize spatial layouts for conditional image synthesis models, e.g., ControlNet, using GPT
EmbodiedGPT/EmbodiedGPT_Pytorch
InternLM/InternLM-techreport
yukezhu/visual7w-toolkit
Toolkit for Visual7W visual question answering dataset
BradyFU/Awesome-Multimodal-Large-Language-Models
:sparkles::sparkles:Latest Advances on Multimodal Large Language Models
Hxyou/IdealGPT
Official Code of IdealGPT
RUCAIBox/POPE
The official GitHub page for ''Evaluating Object Hallucination in Large Vision-Language Models''
yuhangzang/ContextDET
Contextual Object Detection with Multimodal Large Language Models
YifanXu74/MQ-Det
Official PyTorch implementation of "Multi-modal Queried Object Detection in the Wild" (accepted by NeurIPS 2023)
Jingkang50/OpenPSG
Benchmarking Panoptic Scene Graph Generation (PSG), ECCV'22
TencentARC/GVT
Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".
luogen1996/LaVIN
[NeurIPS 2023] Official implementations of "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models"
yxuansu/PandaGPT
[TLLM'23] PandaGPT: One Model To Instruction-Follow Them All