Non-Multimodal LLM: https://github.com/eugeneyan/open-llms
Language Model | Company/Org | Release Date | GitHub/Hugging Face | Paper/Blog | Function | Modalities | License |
---|---|---|---|---|---|---|---|
ImageBind | FAIR, Meta AI | 2023.05 | facebookresearch/ImageBind | ImageBind: One Embedding Space To Bind Them All; ImageBind: Holistic AI learning across six modalities | cross-modal retrieval, composing modalities with arithmetic, cross-modal detection and generation | image/video, text, audio, depth, IMU, and thermal images | CC BY-NC-SA 4.0 |
BLIP-2 | Salesforce | 2023.01 | blip2, hf/blip-2 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | image-to-text, feature extraction, image-text matching | image, text | MIT |
MiniGPT-4 | King Abdullah University of Science and Technology | 2023.05 | Vision-CAIR/MiniGPT-4, hf/Vision-CAIR/MiniGPT-4 | MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models | writing stories and poems inspired by given images, providing solutions to problems shown in images, teaching users how to cook based on food photos, etc. | image, text | BSD 3-Clause License |
LLaVA | University of Wisconsin-Madison, Microsoft Research, Columbia University | 2023.05 | haotian-liu/LLaVA, LLaVA-13b-delta-v0 | Visual Instruction Tuning | general-purpose visual and language understanding | image, text | Apache-2.0 |

Name | Release Date | Function | Paper/Blog | Dataset | Samples | License |
---|---|---|---|---|---|---|
LLaVA-Instruct-150K | 2023.04 | IFT | Visual Instruction Tuning | liuhaotian/LLaVA-Instruct-150K | 150K | CC BY-SA 4.0 |
LAION-400M | 2021.08 | PreTrain | LAION-400-MILLION OPEN DATASET | laion/laion400m | 400M | CC BY-SA 4.0 |
CC3M | 2021 | PreTrain | google-research-datasets/conceptual-captions | Google's Conceptual Captions | 3M | Free |
CC12M | 2021 | PreTrain | google-research-datasets/conceptual-12m | cc12m | 12M | Free |
SBU | 2011 | PreTrain | Im2Text: Describing Images Using 1 Million Captioned Photographs | sbu_captions | 1M | unknown |