multimodal-llm

There are 12 repositories under the multimodal-llm topic.

  • eric-ai-lab/MiniGPT-5

    Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens"

Language: Python
  • alipay/Ant-Multi-Modal-Framework

    Research Code for Multimodal-Cognition Team in Ant Group

Language: Python
  • Zhoues/MineDreamer

This repo is the official implementation of "MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control"

Language: Python
  • UCSC-VLAA/vllm-safety-benchmark

    [ECCV 2024] Official PyTorch Implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs"

Language: Python
  • shanface33/GPT4MF_UB

Official repository of the paper "Can ChatGPT Detect DeepFakes? A Study of Using Multimodal Large Language Models for Media Forensics"

  • HenryPengZou/ImplicitAVE

    [ACL 2024 Findings] Dataset and Code of "ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction"

Language: Jupyter Notebook
  • zhudotexe/kani-vision

    Kani extension for supporting vision-language models (VLMs). Comes with model-agnostic support for GPT-Vision and LLaVA.

Language: Python
  • iamaziz/chat_with_images

    Streamlit app to chat with images using Multi-modal LLMs.

Language: Python
  • autodistill/autodistill-llava

    LLaVA base model for use with Autodistill.

Language: Python
  • aastroza/cachai

    The future of AI is speaking Chilean, cachai?

Language: Jupyter Notebook
  • abdur75648/MedicalGPT

    Medical Report Generation And VQA (Adapting XrayGPT to Any Modality)

Language: Python
  • ChocoWu/SeTok-web

    This is the project webpage for 'SeTok'.

Language: CSS