zzhanghub's Stars

QwenLM/Qwen
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
Language: Python, 15.3k stars, 1.2k forks
victorsungo/MMDialog
The official site of paper MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation
Language:Python1927
Belval/TextRecognitionDataGenerator
A synthetic data generator for text recognition
Language: Python, 3.4k stars, 990 forks
Sierkinhane/CRNN_Chinese_Characters_Rec
(CRNN) Chinese Characters Recognition.
Language: Python, 1.8k stars, 541 forks
huggingface/peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
Language: Python, 16.9k stars, 1.7k forks
emu1729/GIST
Generating Image Specific Text
Language:Python261
zhjohnchan/SK-VG
[CVPR-2023] The official dataset of Advancing Visual Grounding with Scene Knowledge: Benchmark and Method.
291
shikras/d-cube
A detection/segmentation dataset with labels characterized by intricate and flexible expressions. "Described Object Detection: Liberating Object Detection with Flexible Expressions" (NeurIPS 2023).
Language:Python1117
open-compass/opencompass
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
Language: Python, 4.5k stars, 474 forks
benywon/ChiQA
The implementations of various baselines in our CIKM 2022 paper: ChiQA: A Large Scale Image-based Real-World Question Answering Dataset for Multi-Modal Understanding.
Language:Python311
huggingface/OBELICS
Code used for the creation of OBELICS, an open, massive and curated collection of interleaved image-text web documents, containing 141M documents, 115B text tokens and 353M images.
Language:Python1939
V3Det/V3Det
Language:Python1032
shikras/shikra
Language:Python75545
zhaoyucs/VSD
Code for "Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text Generation"
Language:Python263
baichuan-inc/Baichuan-7B
A large-scale 7B pretraining language model developed by BaiChuan-Inc.
Language: Python, 5.7k stars, 508 forks
OpenGVLab/Multi-Modality-Arena
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
Language:Python48136
TencentARC/Mix-of-Show
NeurIPS 2023, Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models
Language:Python41020
showlab/VisorGPT
[NeurIPS 2023] Customize spatial layouts for conditional image synthesis models, e.g., ControlNet, using GPT
Language:Python1343
EmbodiedGPT/EmbodiedGPT_Pytorch
Language:Python34735
InternLM/InternLM-techreport
90025
yukezhu/visual7w-toolkit
Toolkit for Visual7W visual question answering dataset
Language:Python7518
BradyFU/Awesome-Multimodal-Large-Language-Models
✨✨ Latest Advances on Multimodal Large Language Models
13.5k stars, 856 forks
Hxyou/IdealGPT
Official Code of IdealGPT
Language:Python328
RUCAIBox/POPE
The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models"
Language:Python1926
yuhangzang/ContextDET
Contextual Object Detection with Multimodal Large Language Models
Language:Python2115
YifanXu74/MQ-Det
Official PyTorch implementation of "Multi-modal Queried Object Detection in the Wild" (accepted by NeurIPS 2023)
Language:Python27813
Jingkang50/OpenPSG
Benchmarking Panoptic Scene Graph Generation (PSG), ECCV'22
Language:Python43269
TencentARC/GVT
Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".
Language:Python56
luogen1996/LaVIN
[NeurIPS 2023] Official implementations of "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models"
Language:Python51338
yxuansu/PandaGPT
[TLLM'23] PandaGPT: One Model To Instruction-Follow Them All
Language:Python77659