sunxm2357's Stars
BradyFU/Awesome-Multimodal-Large-Language-Models
:sparkles::sparkles: Latest Advances on Multimodal Large Language Models
OpenBMB/MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
SimplifyJobs/New-Grad-Positions
A collection of full time roles in SWE, Quant, and PM for new grads.
LargeWorldModel/LWM
Large World Model -- Modeling Text and Video with Millions Context
sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
EvolvingLMMs-Lab/lmms-eval
Accelerating the development of large multimodal models (LMMs) with one-click evaluation module - lmms-eval.
deepseek-ai/DeepSeek-VL
DeepSeek-VL: Towards Real-World Vision-Language Understanding
open-compass/VLMEvalKit
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks
Computer-Vision-in-the-Wild/CVinW_Readings
A collection of papers on the topic of "Computer Vision in the Wild (CVinW)"
yaodongC/awesome-instruction-dataset
A collection of open-source datasets to train instruction-following LLMs (ChatGPT, LLaMA, Alpaca)
google-research-datasets/wit
WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.
LAION-AI/CLIP_benchmark
CLIP-like model evaluation
allenai/unified-io-2
vacancy/SceneGraphParser
A python toolkit for parsing captions (in natural language) into scene graphs (as symbolic representations).
Brave-peng/books
A collection of assorted leisure-reading books (EPUB versions that can be opened and read directly on an iPad)
ashafaei/pdf2pptx
Convert your (Beamer) PDF slides to (Powerpoint) PPTX
AILab-CVC/SEED-Bench
(CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions.
prismformore/Multi-Task-Transformer
Code for the ICLR 2023 paper "TaskPrompter: Spatial-Channel Multi-Task Prompting for Dense Scene Understanding" and the ECCV 2022 paper "Inverted Pyramid Multi-task Transformer for Dense Scene Understanding"
PengtaoJiang/Awesome-Weakly-Supervised-Semantic-Segmentation-Papers
Recent weakly supervised semantic segmentation papers
SALT-NLP/LLaVAR
Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"
RLHF-V/RLAIF-V
RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness
vis-nlp/Chart-to-text
Mrhuangyi/Ebooks-Shared
:book: Ebook share
yuecao0119/MMInstruct
The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity". The MMInstruct dataset includes 973K instructions from 24 domains and four instruction types.
opendatalab/image-downloader
ROCm/xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.
sirluk/llm_finetuning
sunxm2357/DIME-FM
Implementation of "DIME-FM: DIstilling Multimodal and Efficient Foundation Models"
Zhongping-Zhang/MGT_Localization
Implementation for Machine-Generated Text Localization (ACL 2024 Findings)
piotr-teterwak/open_clip
An open source implementation of CLIP.