Pinned Repositories
LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
MetaTransformer
Meta-Transformer for Unified Multimodal Learning
ChatBridge
ChatBridge, an approach to learning a unified multimodal model to interpret, correlate, and reason about various modalities without relying on all combinations of paired data.
DARNet
Multi-Agents
student_mis
testGit
Trying out git
LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
MMSA-FET
A tool for extracting multimodal features from videos.