vision-language-learning

There are 10 repositories under the vision-language-learning topic.

  • AIDC-AI/Ovis

    A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.

    Language: Python
  • shikiw/OPERA

    [CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

    Language: Python
  • RLHF-V/RLAIF-V

    RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness

    Language: Python
  • YunzeMan/Situation3D

    [CVPR 2024] Situational Awareness Matters in 3D Vision Language Reasoning

    Language: Python
  • LooperXX/ManagerTower

    Code for ACL 2023 Oral Paper: ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning

    Language: Python
  • SHTUPLUS/GITM-MR

    The official implementation for the ICCV 2023 paper "Grounded Image Text Matching with Mismatched Relation Reasoning".

    Language: Python
  • yubin1219/CrossVLT

    Cross-aware Early Fusion with Stage-divided Vision and Language Transformer Encoders for Referring Image Segmentation (Published in IEEE TMM 2023)

    Language: Python
  • lyuchenyang/Dialogue-to-Video-Retrieval

    Code for ECIR 2023 paper "Dialogue-to-Video Retrieval"

    Language: Python
  • abhinav-neil/socratic-models

    Socratic models for multimodal reasoning & image captioning

    Language: Jupyter Notebook
  • Ravi-Teja-konda/TunedLlavaDelights

    Explore the rich flavors of Indian desserts with TunedLlavaDelights. Utilizing LLaVA fine-tuning, our project unveils detailed nutritional profiles, taste notes, and optimal consumption times for beloved sweets. Dive into a fusion of AI innovation and culinary tradition.

    Language: Python