TzRain's Stars
karpathy/LLM101n
LLM101n: Let's build a Storyteller
facebookresearch/dino
PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO
naver/dust3r
DUSt3R: Geometric 3D Vision Made Easy
amazon-science/mm-cot
Official implementation of "Multimodal Chain-of-Thought Reasoning in Language Models" (stay tuned; more will be added)
InternLM/InternLM-XComposer
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
LLaVA-VL/LLaVA-NeXT
THUDM/AgentBench
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
open-compass/VLMEvalKit
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting ~100 VLMs and 40+ benchmarks
52CV/CVPR-2024-Papers
apchenstu/mvsnerf
[ICCV 2021] Our work presents a novel neural rendering approach that can efficiently reconstruct geometric and neural radiance fields for view synthesis.
haonan-li/CMMLU
CMMLU: Measuring massive multitask language understanding in Chinese
lupantech/ScienceQA
Data and code for NeurIPS 2022 Paper "Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering".
justimyhxu/GRM
Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation
google-research-datasets/conceptual-captions
Conceptual Captions is a dataset of (image URL, caption) pairs designed for training and evaluating machine-learned image captioning systems.
jiawei-ren/dreamgaussian4d
[arXiv 2023] DreamGaussian4D: Generative 4D Gaussian Splatting
microsoft/voxelpose-pytorch
Official implementation of "VoxelPose: Towards Multi-Camera 3D Human Pose Estimation in Wild Environment"
AILab-CVC/SEED-Bench
(CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions.
yuweihao/MM-Vet
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024)
OpenGVLab/OmniCorpus
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
computational-imaging/GSM
Gaussian Shell Maps for Efficient 3D Human Generation (CVPR 2024)
kai422/IART
[CVPR 2024 Highlight] Enhancing Video Super-Resolution via Implicit Resampling-based Alignment.
open-compass/MMBench
Official Repo of "MMBench: Is Your Multi-modal Model an All-around Player?"
PhoenixZ810/MG-LLaVA
Official repository for the paper "MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning" (https://arxiv.org/abs/2406.17770).
opendatalab/laion5b-downloader
TideDra/VL-RLHF
An RLHF Infrastructure for Vision-Language Models
OpenGVLab/MMT-Bench
ICML'2024 | MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
nttmdlab-nlp/SlideVQA
SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images (AAAI2023)
Liuziyu77/MMDU
Official repository of MMDU dataset
OpenGVLab/LCL
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
XunshanMan/MVGFormer
Official implementation of the CVPR 2024 paper "Multiple View Geometry Transformers for 3D Human Pose Estimation" (MVGFormer).