

License: Apache-2.0

Open LLM for Multimodal

For open LLMs that are not multimodal, see https://github.com/eugeneyan/open-llms

Open Model

| Model | Company/Org | Release Date | GitHub/Hugging Face | Paper/Blog | Function | Modalities | License |
|---|---|---|---|---|---|---|---|
| ImageBind | FAIR, Meta AI | 2023.05 | facebookresearch/ImageBind | ImageBind: One Embedding Space To Bind Them All; ImageBind: Holistic AI learning across six modalities | cross-modal retrieval, composing modalities with arithmetic, cross-modal detection and generation | image/video, text, audio, depth, IMU, thermal images | CC BY-NC-SA 4.0 |
| BLIP-2 | Salesforce | 2023.01 | blip2; hf/blip-2 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | image-to-text, feature extraction, image-text matching | image, text | MIT |
| MiniGPT-4 | King Abdullah University of Science and Technology | 2023.05 | Vision-CAIR/MiniGPT-4; hf/Vision-CAIR/MiniGPT-4 | MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models | writing stories and poems inspired by given images, solving problems shown in images, teaching users how to cook from food photos, etc. | image, text | BSD 3-Clause |
| LLaVA | University of Wisconsin-Madison, Microsoft Research, Columbia University | 2023.05 | haotian-liu/LLaVA; LLaVA-13b-delta-v0 | Visual Instruction Tuning | general-purpose visual and language understanding | image, text | Apache-2.0 |
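ImageBind's listed functions, cross-modal retrieval and composing modalities with arithmetic, both rely on every modality being mapped into one shared embedding space. The sketch below illustrates that idea with cosine-similarity ranking and with summing L2-normalized embeddings. It is a toy under stated assumptions: the 4-dimensional vectors and file names are made up, and `retrieve` and `compose` are hypothetical helpers, not part of the ImageBind API, whose real embeddings come from its own encoders.

```python
import math

# Hypothetical toy embeddings standing in for a joint embedding model's output.
# The values are invented for illustration; they are NOT real ImageBind vectors.
IMAGE_EMBEDDINGS = {
    "beach.jpg": [0.9, 0.1, 0.0, 0.2],
    "forest.jpg": [0.1, 0.9, 0.1, 0.0],
    "city_night.jpg": [0.0, 0.2, 0.9, 0.3],
    "park_in_city.jpg": [0.0, 0.6, 0.6, 0.2],
}
AUDIO_EMBEDDINGS = {
    "waves.wav": [0.8, 0.0, 0.1, 0.3],
    "birdsong.wav": [0.2, 0.8, 0.0, 0.1],
}


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def retrieve(query, gallery, k=1):
    """Return the names of the k gallery items most similar to the query."""
    ranked = sorted(gallery.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


def compose(*embeddings):
    """Embedding arithmetic: sum unit-normalized embeddings, then renormalize."""
    total = [0.0] * len(embeddings[0])
    for emb in embeddings:
        for i, x in enumerate(normalize(emb)):
            total[i] += x
    return normalize(total)


if __name__ == "__main__":
    # Audio -> image retrieval in the shared space.
    print(retrieve(AUDIO_EMBEDDINGS["waves.wav"], IMAGE_EMBEDDINGS))  # ['beach.jpg']
    # Image + audio composed into one query.
    query = compose(IMAGE_EMBEDDINGS["city_night.jpg"], AUDIO_EMBEDDINGS["birdsong.wav"])
    print(retrieve(query, IMAGE_EMBEDDINGS))  # ['park_in_city.jpg']
```

Normalizing each embedding before summing keeps one modality from dominating the composed query simply because its vector has a larger norm.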

Open Data

| Name | Release Date | Function | Paper/Blog | Dataset | Samples | License |
|---|---|---|---|---|---|---|
| LLaVA-Instruct-150K | 2023.04 | Instruction fine-tuning (IFT) | Visual Instruction Tuning | liuhaotian/LLaVA-Instruct-150K | 150K | CC BY-SA 4.0 |
| LAION-400M | 2021.08 | Pre-training | LAION-400-MILLION OPEN DATASET | laion/laion400m | 400M | CC BY-SA 4.0 |
| CC3M | 2018 | Pre-training | google-research-datasets/conceptual-captions | Google's Conceptual Captions | 3M | Free |
| CC12M | 2021 | Pre-training | google-research-datasets/conceptual-12m | cc12m | 12M | Free |
| SBU | 2011 | Pre-training | Im2Text: Describing Images Using 1 Million Captioned Photographs | sbu_captions | 1M | Unknown |