vision-language-transformer
There are 16 repositories under the vision-language-transformer topic.
salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
IDEA-Research/GroundingDINO
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
salesforce/BLIP
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
AlibabaResearch/AdvancedLiterateMachinery
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
henghuiding/ReLA
[CVPR2023 Highlight] GRES: Generalized Referring Expression Segmentation
shenyunhang/APE
[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception
henghuiding/Vision-Language-Transformer
[ICCV2021 & TPAMI2023] Vision-Language Transformer and Query Generation for Referring Segmentation
sdc17/UPop
[ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers.
haoliuhl/instructrl
Instruction Following Agents with Multimodal Transformers
sdc17/CrossGET
[ICML 2024] CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers.
sMamooler/CLIP_Explainability
code for studying OpenAI's CLIP explainability
yiren-jian/BLIText
[NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
unitaryai/VTC
VTC: Improving Video-Text Retrieval with User Comments
marialymperaiou/knowledge-enhanced-multimodal-learning
A list of research papers on knowledge-enhanced multimodal learning
aurooj/VLM_SS
Mini-batch selective sampling for knowledge adaptation of VLMs for mammography.
atharva-naik/MMML-TermProject-VizWiz-VQA-Challenge
VizWiz Challenge term project for Multimodal Machine Learning @ CMU (11-777)