visual-grounding
There are 46 repositories under the visual-grounding topic.
TheShadow29/awesome-grounding
awesome grounding: A curated list of research papers in visual grounding
rhett-chen/Robotic-grasping-papers
A paper list on robotic grasping and related work
daveredrum/ScanRefer
[ECCV 2020] ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language
Charles-Xie/awesome-described-object-detection
A curated list of papers and resources related to Described Object Detection, Open-Vocabulary/Open-World Object Detection and Referring Expression Comprehension. Updated frequently; pull requests are welcome.
antoyang/TubeDETR
[CVPR 2022 Oral] TubeDETR: Spatio-Temporal Video Grounding with Transformers
LeapLabTHU/Pseudo-Q
[CVPR 2022] Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding
seanzhuh/SeqTR
SeqTR: A Simple yet Universal Network for Visual Grounding
yanmin-wu/EDA
[CVPR 2023] EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding
ChenyunWu/PhraseCutDataset
Dataset API for "PhraseCut: Language-based Image Segmentation in the Wild"
jianghaojun/Awesome-3D-Vision-and-Language
A collection of 3D vision and language (e.g., 3D Visual Grounding, 3D Question Answering and 3D Dense Caption) papers and datasets.
yangli18/VLTVG
Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning, CVPR 2022
JerryX1110/awesome-rvos
Referring Video Object Segmentation / Multi-Object Tracking Repo
3dlg-hcvc/M3DRef-CLIP
[ICCV 2023] Multi3DRefer: Grounding Text Description to Multiple 3D Objects
TheShadow29/vognet-pytorch
[CVPR20] Video Object Grounding using Semantic Roles in Language Description (https://arxiv.org/abs/2003.10606)
doc-doc/vRGV
Visual Relation Grounding in Videos (ECCV'20, Spotlight)
zlccccc/3DVL_Codebase
[CVPR2022 Oral] 3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds
zjukg/DUET
[AAAI 2023] DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning
MultimodalGeo/GeoText-1652
An official repo for ECCV 2024 Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching
svip-lab/LBYLNet
[CVPR2021] Look before you leap: learning landmark features for one-stage visual grounding.
chihyaoma/cyclical-visual-captioning
PyTorch code for: Learning to Generate Grounded Visual Captions without Localization Supervision
CurryYuan/ZSVG3D
[CVPR 2024] Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding
daveredrum/D3Net
[ECCV2022] D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding
zlccccc/3DVG-Transformer
[ICCV2021] 3DVG-Transformer: Relation Modeling for Visual Grounding on Point Clouds
uvavision/SelfEQ
[CVPR 2024] Code for "Improved Visual Grounding through Self-Consistent Explanations".
CurryYuan/PhraseRefer
Toward Explainable and Fine-Grained 3D Grounding through Referring Textual Phrases
xuyang-liu16/VGDiffZero
[ICASSP 2024] VGDiffZero: Text-to-image Diffusion Models Can Be Zero-shot Visual Grounders
marialymperaiou/knowledge-enhanced-multimodal-learning
A list of research papers on knowledge-enhanced multimodal learning
1989Ryan/paragon
[ICRA 2023] Differentiable parsing and visual grounding of natural language instructions for object placement
gorjanradevski/text2atlas
Codebase for "Learning to ground medical text in a 3D human atlas (CoNLL 2020)".
CompGuessWhat/comp_probing
Code used to train probing classifiers in the attribute prediction task
JHKim-snu/PGA
[IROS 2024] PGA: Personalizing Grasping Agents with Single Human-Robot Interaction
akskuchi/groovist
GROOViST: A Metric for Grounding Objects in Visual Storytelling – EMNLP 2023
bwittmann/TransformerRefer
Utilizing a transformer-based object detector for the task of 3D visual grounding.
ChenBarryHu/TransformerVG
TransformerVG - 3D Visual Grounding with Transformers
scofield7419/MUIE
MUIE: Multimodal Universal Information Extraction
3dlg-hcvc/ENet-ScanNet
Helper tools for extracting and projecting ENet features to ScanNet pointclouds.