visual-language-models
There are 16 repositories under the visual-language-models topic.
THUDM/CogVLM
A state-of-the-art open visual language model | Multimodal pre-trained model
camel-ai/crab
CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents. https://crab.camel-ai.org/
bilel-bj/ROSGPT_Vision
Commanding robots using only language model prompts
hk-zh/language-conditioned-robot-manipulation-models
https://arxiv.org/abs/2312.10807
AlignGPT-VL/AlignGPT
Official repo for "AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability"
tianyu-z/VCR
Official Repo for the paper: VCR: Visual Caption Restoration. Check arxiv.org/pdf/2406.06462 for details.
jaisidhsingh/CoN-CLIP
Implementation of the paper "Learn 'No' to Say 'Yes' Better".
Sid2697/HOI-Ref
Code implementation for paper titled "HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision"
xinyanghuang7/Basic-Visual-Language-Model
Build a simple, basic multimodal large model from scratch 🤖
amathislab/wildclip
Scene and animal attribute retrieval from camera trap data with domain-adapted vision-language models
sduzpf/UAP_VLP
Universal Adversarial Perturbations for Vision-Language Pre-trained Models
csebuetnlp/IllusionVQA
This repository contains the data and code of the paper titled "IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models"
declare-lab/Sealing
[NAACL 2024] Official implementation of the paper "Self-Adaptive Sampling for Efficient Video Question Answering on Image-Text Models"
GraphPKU/CoI
Chain of Images for Intuitively Reasoning
CristianoPatricio/concept-based-interpretability-VLM
Code for the paper "Towards Concept-based Interpretability of Skin Lesion Diagnosis using Vision-Language Models", ISBI 2024 (Oral).
laclouis5/uform-coreml-converters
CLI for converting UForm models to CoreML.