Pinned Repositories
awesome-RSVLM
Collection of Remote Sensing Vision-Language Models
GroundVLP
GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection (AAAI 2024)
OmAgent
Build multimodal language agents for fast prototyping and production
OmDet
Real-time and accurate open-vocabulary end-to-end object detection
OmModel
A collection of strong multimodal models for building AGI agents
OVDEval
A Comprehensive Evaluation Benchmark for Open-Vocabulary Detection (AAAI 2024)
RS5M
RS5M: a large-scale vision-language dataset for remote sensing [TGRS]
VL-CheckList
Evaluating Vision & Language Pretraining Models with Objects, Attributes and Relations. [EMNLP 2022]
VLM-R1
Solve Visual Understanding with Reinforced VLMs
ZoomEye
[EMNLP-2025 Oral] ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration
Om AI Lab's Repositories
om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
om-ai-lab/OmAgent
Build multimodal language agents for fast prototyping and production
om-ai-lab/OmDet
Real-time and accurate open-vocabulary end-to-end object detection
om-ai-lab/RS5M
RS5M: a large-scale vision-language dataset for remote sensing [TGRS]
om-ai-lab/awesome-RSVLM
Collection of Remote Sensing Vision-Language Models
om-ai-lab/VL-CheckList
Evaluating Vision & Language Pretraining Models with Objects, Attributes and Relations. [EMNLP 2022]
om-ai-lab/GroundVLP
GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection (AAAI 2024)
om-ai-lab/OVDEval
A Comprehensive Evaluation Benchmark for Open-Vocabulary Detection (AAAI 2024)
om-ai-lab/ZoomEye
[EMNLP-2025 Oral] ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration
om-ai-lab/OmModel
A collection of strong multimodal models for building AGI agents
om-ai-lab/open-agent-leaderboard
Reproducible Language Agent Research
om-ai-lab/ImageRAG
Enhancing Ultrahigh Resolution Remote Sensing Imagery Analysis With ImageRAG [GRSM]
om-ai-lab/OmChat
A suite of powerful and efficient multimodal language models
om-ai-lab/OmAgentDocs
om-ai-lab/habitat-lab
A modular high-level library to train embodied AI agents across a variety of tasks, environments, and simulators.
om-ai-lab/VLM-R1.github.io
Blog Site for VLM-R1
om-ai-lab/bottom-up-attention.pytorch
A PyTorch reimplementation of bottom-up-attention models
om-ai-lab/om-ai-lab.github.io
Official website for the org