unolop's Stars
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
heliossun/SQ-LLaVA
Visual self-questioning for large vision-language assistants.
IemProg/CoFiMA
🔥 🔥 [ECCV 2024 Oral] Official code for "Weighted Ensemble Models Are Strong Continual Learners"
CSAILVision/places365
The Places365-CNNs for Scene Classification
zhoubolei/places_devkit
Development kit for the data of the Places365-Standard and Places365-Challenge
XuJiacong/PIDNet
This is the official repository for our recent work: PIDNet
mcordts/cityscapesScripts
README and scripts for the Cityscapes Dataset
bertjiazheng/awesome-scene-understanding
😎 A list of awesome scene understanding papers.
TUI-NICR/nicr-scene-analysis-datasets
Code to prepare and use common datasets for scene analysis tasks
opendilab/awesome-RLHF
A curated list of reinforcement learning with human feedback resources (continually updated)
isbhargav/SUN397-TF
Using an ImageNet-pretrained model to classify the SUN397 dataset
apple/ml-4m
4M: Massively Multimodal Masked Modeling
ZjjConan/Multi-Modal-Adapter
The official PyTorch implementation of our CVPR 2024 paper "MMA: Multi-Modal Adapter for Vision-Language Models".
shikiw/OPERA
[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
924973292/EDITOR
[CVPR 2024] Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification
Alexander-Yao/Multi-MaP
PyTorch implementation of the paper "Multi-Modal Proxy Learning Towards Personalized Visual Multiple Clustering" (CVPR 2024)
dvlab-research/Prompt-Highlighter
[CVPR 2024] Prompt Highlighter: Interactive Control for Multi-Modal LLMs
ProGamerGov/VLM-Captioning-Tools
Python scripts for captioning images with VLMs
joeyz0z/MeaCap
(CVPR 2024) MeaCap: Memory-Augmented Zero-shot Image Captioning
kijai/ComfyUI-Florence2
ComfyUI nodes for running inference with the Microsoft Florence2 VLM
michelecafagna26/HL-dataset
[INLG2023] The High-Level (HL) dataset is a Vision and Language (V&L) resource aligning object-centric descriptions from COCO with high-level descriptions crowdsourced along 3 axes: scene, action, rationale.
google/uncertainty-baselines
High-quality implementations of standard and SOTA methods on a variety of tasks.
berkeley-hipie/HIPIE
[NeurIPS 2023] Code release for "Hierarchical Open-vocabulary Universal Image Segmentation"
kingthreestones/RefCLIP
jyFengGoGo/InstructDet
Charles-Xie/awesome-described-object-detection
A curated list of papers and resources related to Described Object Detection, Open-Vocabulary/Open-World Object Detection, and Referring Expression Comprehension. Updated frequently; pull requests welcome.
anisha2102/docvqa
Document Visual Question Answering
Jingkang50/OpenOOD
Benchmarking Generalized Out-of-Distribution Detection
yaolinli/CapEnrich
modelscope/modelscope
ModelScope: bring the notion of Model-as-a-Service to life.