Arking1995's Stars
voxel51/fiftyone
The open-source tool for building high-quality datasets and computer vision models
facebookresearch/DiT
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
open-mmlab/mmpose
OpenMMLab Pose Estimation Toolbox and Benchmark.
OpenGVLab/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance
VAST-AI-Research/TripoSR
nerfies/nerfies.github.io
google-research/kubric
A data generation pipeline for creating semi-realistic synthetic multi-object videos with rich annotations such as instance segmentation masks, depth maps, and optical flow.
lllyasviel/LayerDiffuse
Transparent Image Layer Diffusion using Latent Transparency
Maks-s/sd-akashic
A compendium of information regarding Stable Diffusion (SD)
lxtGH/OMG-Seg
OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24]
BAAI-DCAI/Bunny
A family of lightweight multimodal models.
allenai/objaverse-xl
🪐 Objaverse-XL is a Universe of 10M+ 3D Objects. Contains API Scripts for Downloading and Processing!
facebookresearch/omni3d
Code release for "Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild"
hzxie/CityDreamer
The official implementation of "CityDreamer: Compositional Generative Model of Unbounded 3D Cities". (Xie et al., CVPR 2024)
google-research-datasets/conceptual-captions
Conceptual Captions is a dataset containing (image-URL, caption) pairs designed for the training and evaluation of machine learned image captioning systems.
CSAILVision/ADE20K
ADE20K Dataset
OSU-NLP-Group/MagicBrush
[NeurIPS'23] "MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing".
bcmi/libcom
Image composition toolbox: everything you want to know about image composition or object insertion
mertyg/vision-language-models-are-bows
Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR 2023
Karine-Huang/T2I-CompBench
[NeurIPS 2023] T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation
CVMI-Lab/SyntheticData
Is synthetic data from generative models ready for image recognition?
facebookresearch/unibench
Python library to evaluate the robustness of VLMs across diverse benchmarks
HaozheZhao/UltraEdit
happy-fish-01/National_interest_waiver_waittime
USCIS Employment-based-2 national interest waiver wait time
ChenyanWu/MEBOW
Code for "MEBOW: Monocular Estimation of Body Orientation In the Wild", CVPR 2020
wufeim/DST3D
Official implementation of "Generating images with 3D annotations using diffusion models".
arielnlee/LLaVA-1.6-ft
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Lizw14/Super-CLEVR
Code for paper "Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning"
allenai/object-edit
wufeim/imagenet3d
ImageNet3D: Towards General-Purpose Object-Level 3D Understanding