unolop's Stars
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
heliossun/SQ-LLaVA
Visual self-questioning for large vision-language assistants.
IemProg/CoFiMA
🔥 🔥 [ECCV 2024 Oral] Official code for "Weighted Ensemble Models Are Strong Continual Learners"
CSAILVision/places365
The Places365-CNNs for Scene Classification
zhoubolei/places_devkit
Development kit for the data of the Places365-Standard and Places365-Challenge
XuJiacong/PIDNet
This is the official repository for our recent work: PIDNet
mcordts/cityscapesScripts
README and scripts for the Cityscapes Dataset
bertjiazheng/awesome-scene-understanding
😎 A list of awesome scene understanding papers.
TUI-NICR/nicr-scene-analysis-datasets
Code to prepare and use common datasets for scene analysis tasks
opendilab/awesome-RLHF
A curated list of reinforcement learning with human feedback resources (continually updated)
isbhargav/SUN397-TF
Using an ImageNet-pretrained model to classify the SUN397 dataset
apple/ml-4m
4M: Massively Multimodal Masked Modeling
ZjjConan/Multi-Modal-Adapter
The official PyTorch implementation of our CVPR 2024 paper "MMA: Multi-Modal Adapter for Vision-Language Models".
shikiw/OPERA
[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
924973292/EDITOR
[CVPR 2024] Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification
Alexander-Yao/Multi-MaP
PyTorch implementation of the paper "Multi-Modal Proxy Learning Towards Personalized Visual Multiple Clustering" (CVPR 2024)
dvlab-research/Prompt-Highlighter
[CVPR 2024] Prompt Highlighter: Interactive Control for Multi-Modal LLMs
ProGamerGov/VLM-Captioning-Tools
Python scripts for captioning images with VLMs
joeyz0z/MeaCap
(CVPR 2024) MeaCap: Memory-Augmented Zero-shot Image Captioning
kijai/ComfyUI-Florence2
ComfyUI nodes for running inference with the Microsoft Florence2 VLM
michelecafagna26/HL-dataset
[INLG2023] The High-Level (HL) dataset is a Vision and Language (V&L) resource aligning object-centric descriptions from COCO with high-level descriptions crowdsourced along 3 axes: scene, action, rationale.
google/uncertainty-baselines
High-quality implementations of standard and SOTA methods on a variety of tasks.
berkeley-hipie/HIPIE
[NeurIPS 2023] Code release for "Hierarchical Open-vocabulary Universal Image Segmentation"
kingthreestones/RefCLIP
jyFengGoGo/InstructDet
Charles-Xie/awesome-described-object-detection
A curated list of papers and resources related to Described Object Detection, Open-Vocabulary/Open-World Object Detection, and Referring Expression Comprehension. Updated frequently; pull requests welcome.
anisha2102/docvqa
Document Visual Question Answering
Jingkang50/OpenOOD
Benchmarking Generalized Out-of-Distribution Detection
yaolinli/CapEnrich
modelscope/modelscope
ModelScope: bring the notion of Model-as-a-Service to life.