visual-language-learning
There are 14 repositories under the visual-language-learning topic.
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA), built toward GPT-4V-level capabilities and beyond.
NExT-GPT/NExT-GPT
Code and models for ICML 2024 paper, NExT-GPT: Any-to-Any Multimodal Large Language Model
EvolvingLMMs-Lab/Otter
🦦 Otter, a multi-modal model based on OpenFlamingo (an open-source version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning abilities.
InternLM/InternLM-XComposer
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
xiaoachen98/Open-LLaVA-NeXT
An open-source implementation for training LLaVA-NeXT.
RLHF-V/RLHF-V
[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
mlpc-ucsd/BLIVA
(AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions
thomas-yanxin/KarmaVLM
🧘🏻♂️ KarmaVLM (相生): A family of efficient and powerful visual language models.
AdrianBZG/llama-multimodal-vqa
Multimodal Instruction Tuning for Llama 3
xinyanghuang7/Basic-Visual-Language-Model
Build a simple, basic multimodal large model from scratch. 🤖
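As a rough illustration of the "vision encoder + projector + LLM" pattern that such from-scratch multimodal models typically follow (this is a generic sketch, not code from the xinyanghuang7/Basic-Visual-Language-Model repository; all module names and sizes are made-up placeholders):

```python
# Generic sketch: project visual features into the LLM embedding space,
# prepend them to the text tokens, and decode. Placeholder modules stand in
# for a real pretrained vision encoder and LLM.
import torch
import torch.nn as nn

class TinyVisionLanguageModel(nn.Module):
    def __init__(self, vision_dim=768, llm_dim=2048, vocab_size=32000):
        super().__init__()
        self.vision_encoder = nn.Linear(vision_dim, vision_dim)   # placeholder encoder
        self.projector = nn.Linear(vision_dim, llm_dim)           # maps image features into LLM space
        self.token_embedding = nn.Embedding(vocab_size, llm_dim)  # placeholder LLM embeddings
        self.llm = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=llm_dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        self.lm_head = nn.Linear(llm_dim, vocab_size)

    def forward(self, image_features, input_ids):
        # image_features: (batch, num_patches, vision_dim); input_ids: (batch, seq_len)
        visual = self.projector(self.vision_encoder(image_features))
        text = self.token_embedding(input_ids)
        hidden = self.llm(torch.cat([visual, text], dim=1))
        return self.lm_head(hidden)

model = TinyVisionLanguageModel()
logits = model(torch.randn(1, 16, 768), torch.randint(0, 32000, (1, 8)))
print(logits.shape)  # (1, 24, 32000)
```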
Skyline-9/Shotluck-Holmes
[ACM MMGR '24] 🔍 Shotluck Holmes: A family of small-scale large language-vision models (LLVMs) for shot-level video understanding
ashleykleynhans/llava-docker
Docker image for LLaVA: Large Language and Vision Assistant
MuhammadAliS/CLIP
PyTorch implementation of OpenAI's CLIP model for image classification, visual search, and visual question answering (VQA).
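For context on the image-classification use case mentioned above, here is a minimal sketch of CLIP-style zero-shot classification using the Hugging Face transformers API (not code from the MuhammadAliS/CLIP repository; the checkpoint name, image path, and candidate labels are illustrative assumptions):

```python
# Minimal sketch of CLIP zero-shot image classification with transformers.
# Checkpoint, image path, and labels below are assumptions for illustration.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")  # hypothetical input image
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds the image-to-text similarity scores.
probs = outputs.logits_per_image.softmax(dim=-1)
for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```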
ecoxial2007/EffVideoQA
Efficient Video Question Answering