multimodal-llm

There are 12 repositories under the multimodal-llm topic.

  • eric-ai-lab/MiniGPT-5

    Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens"

Language: Python
  • alipay/Ant-Multi-Modal-Framework

    Research Code for Multimodal-Cognition Team in Ant Group

Language: Python
  • Zhoues/MineDreamer

This repo is the official implementation of "MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control"

Language: Python
  • UCSC-VLAA/vllm-safety-benchmark

    [ECCV 2024] Official PyTorch Implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs"

Language: Python
  • shanface33/GPT4MF_UB

Official repository of the paper "Can ChatGPT Detect DeepFakes? A Study of Using Multimodal Large Language Models for Media Forensics"

  • HenryPengZou/ImplicitAVE

    [ACL 2024 Findings] Dataset and Code of "ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction"

Language: Jupyter Notebook
  • zhudotexe/kani-vision

    Kani extension for supporting vision-language models (VLMs). Comes with model-agnostic support for GPT-Vision and LLaVA.

Language: Python
  • iamaziz/chat_with_images

    Streamlit app to chat with images using Multi-modal LLMs.

Language: Python
  • autodistill/autodistill-llava

    LLaVA base model for use with Autodistill.

Language: Python
  • aastroza/cachai

    The future of AI is speaking Chilean, cachai?

Language: Jupyter Notebook
  • abdur75648/MedicalGPT

    Medical Report Generation And VQA (Adapting XrayGPT to Any Modality)

Language: Python
  • ChocoWu/SeTok-web

    This is the project webpage for 'SeTok'.

Language: CSS