sunxm2357

sunxm2357's Stars

BradyFU/Awesome-Multimodal-Large-Language-Models
:sparkles::sparkles: Latest Advances on Multimodal Large Language Models
13.1k 256 127835
OpenBMB/MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
Language: Python 12.9k 107 603901
SimplifyJobs/New-Grad-Positions
A collection of full time roles in SWE, Quant, and PM for new grads.
11.8k 1.4k 3031.1k
LargeWorldModel/LWM
Large World Model -- Modeling Text and Video with Millions Context
Language: Python 7.2k 66 71556
sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
Language: Python 6.5k 59 732578
EvolvingLMMs-Lab/lmms-eval
Accelerating the development of large multimodal models (LMMs) with one-click evaluation module - lmms-eval.
Language: Python 2.2k 3 227168
deepseek-ai/DeepSeek-VL
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Language: Python 2.1k 20 48201
open-compass/VLMEvalKit
Open-source evaluation toolkit of large vision-language models (LVLMs), support 160+ VLMs, 50+ benchmarks
Language: Python 1.5k 11 243209
Computer-Vision-in-the-Wild/CVinW_Readings
A collection of papers on the topic of "Computer Vision in the Wild (CVinW)"
1.2k 38 658
yaodongC/awesome-instruction-dataset
A collection of open-source datasets to train instruction-following LLMs (ChatGPT, LLaMA, Alpaca)
1.1k 16 859
google-research-datasets/wit
WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.
1k 38 741
LAION-AI/CLIP_benchmark
CLIP-like model evaluation
Language: Jupyter Notebook 631 12 6580
allenai/unified-io-2
Language: Python 580 16 1627
vacancy/SceneGraphParser
A python toolkit for parsing captions (in natural language) into scene graphs (as symbolic representations).
Language: Python 558 7 1956
Brave-peng/books
A collection of assorted leisure-reading books (EPUB versions; can be opened and read directly on an iPad)
522 21 1288
ashafaei/pdf2pptx
Convert your (Beamer) PDF slides to (Powerpoint) PPTX
Language: Shell 377 15 1088
AILab-CVC/SEED-Bench
(CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions.
Language: Python 322 4 2913
prismformore/Multi-Task-Transformer
Code for the ICLR 2023 paper "TaskPrompter: Spatial-Channel Multi-Task Prompting for Dense Scene Understanding" and the ECCV 2022 paper "Inverted Pyramid Multi-task Transformer for Dense Scene Understanding"
Language: Python 304 7 3223
PengtaoJiang/Awesome-Weakly-Supervised-Semantic-Segmentation-Papers
Recent weakly supervised semantic segmentation papers
281 11 423
SALT-NLP/LLaVAR
Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"
Language: Python 259 5 2112
RLHF-V/RLAIF-V
RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness
Language: Python 254 6 3010
vis-nlp/Chart-to-text
Language: OpenEdge ABL 106 2 1332
Mrhuangyi/Ebooks-Shared
:book: Ebook share
Language: C++ 51 2 019
yuecao0119/MMInstruct
The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity". The MMInstruct dataset includes 973K instructions from 24 domains and four instruction types.
Language: Python 36 5 22
opendatalab/image-downloader
Language: Python 27 4 23
ROCm/xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.
Language: Python 22 5 96
sirluk/llm_finetuning
Language: Jupyter Notebook 12 2 04
sunxm2357/DIME-FM
Implementation of "DIME-FM: DIstilling Multimodal and Efficient Foundation Models"
Language: Python 12 1 40
Zhongping-Zhang/MGT_Localization
Implementation for Machine-Generated Text Localization (ACL 2024 Findings)
Language: Python 10 3 00
piotr-teterwak/open_clip
An open source implementation of CLIP.
Language: Jupyter Notebook 1 0 01