alice-cool's Stars
facebookresearch/segment-anything
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
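For quick reference, a minimal sketch of SAM's point-prompted inference API (the checkpoint filename, image path, and click coordinates below are placeholder assumptions; the repo's notebooks show the canonical usage):

```python
# Minimal sketch: segment an object from a single foreground click,
# assuming the ViT-H checkpoint has been downloaded from the repo's model zoo.
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# SAM expects an HxWx3 uint8 RGB array.
image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# One foreground point prompt (x, y); label 1 marks foreground.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    multimask_output=True,  # return 3 candidate masks with predicted IoU scores
)
```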
jianzongwu/Awesome-Open-Vocabulary
(TPAMI 2024) A Survey on Open Vocabulary Learning
liliu-avril/Awesome-Segment-Anything
This repository is for the first comprehensive survey on Meta AI's Segment Anything Model (SAM).
TencentARC/SEED-Story
SEED-Story: Multimodal Long Story Generation with Large Language Model
AIDC-AI/Ovis
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
sstary/SSRS
zhengli97/Awesome-Prompt-Adapter-Learning-for-VLMs
A curated list of awesome prompt/adapter learning methods for vision-language models like CLIP.
imagegridworth/IG-VLM
zhiweihu1103/AgriMa
Houji (后稷): the first open-source Chinese agricultural large language model
Hzzone/PseCo
(CVPR 2024) Point, Segment and Count: A Generalized Framework for Object Counting
Junjue-Wang/EarthVQA
[AAAI 2024] EarthVQA: Towards Queryable Earth via Relational Reasoning-Based Remote Sensing Visual Question Answering
WHB139426/Grounded-Video-LLM
Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models
amazon-science/QA-ViT
BioMedIA-MBZUAI/MedPromptX
StephenApX/UCD-SCM
[IGARSS 2024] Segment Change Model (SCM) for Unsupervised Change Detection in VHR Remote Sensing Images: A Case Study of Buildings
jinlHe/PeFoMed
Code for the paper: PeFoM-Med: Parameter Efficient Fine-tuning on Multi-modal Large Language Models for Medical Visual Question Answering
JiajiaLi04/Agriculture-Foundation-Models
Foundation models & LLMs
wchh-2000/SAMPolyBuild
Adapting the Segment Anything Model for Polygonal Building Extraction
rabiulcste/vqazero
Visual question answering prompting recipes for large vision-language models
yzygit1230/SCD-SAM
Lans1ng/PointSAM
[TGRS2025] Code for "PointSAM: Pointly-Supervised Segment Anything Model for Remote Sensing Images"
matthewdm0816/BridgeQA
[AAAI 24] Official Codebase for BridgeQA: Bridging the Gap between 2D and 3D Visual Question Answering: A Fusion Approach for 3D VQA
GaryGuTC/LaPA_model
[CVPRW 2024] LaPA: Latent Prompt Assist Model for Medical Visual Question Answering
codezakh/SelTDA
[CVPR 23] Q: How to Specialize Large Vision-Language Models to Data-Scarce VQA Tasks? A: Self-Train on Unlabeled Images!
StriveZs/ALPS
ALPS: An Auto-Labeling and Pre-training Scheme for Remote Sensing Segmentation With Segment Anything Model
bowen-upenn/Multi-Agent-VQA
[CVPR 2024 CVinW] Multi-Agent VQA: Exploring Multi-Agent Foundation Models on Zero-Shot Visual Question Answering
ControlNet/HYDRA
[ECCV] HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning
Lackel/DKA
[arXiv 2024] Knowledge Acquisition Disentanglement for Knowledge-based Visual Question Answering with Large Language Models
thomaswei-cn/MC-CoT
MC-CoT implementation code
WHB139426/QA-Prompts
Q&A Prompts: Discovering Rich Visual Clues through Mining Question-Answer Prompts for VQA requiring Diverse World Knowledge [ECCV'24]