shengyuhao's Stars
microsoft/graphrag
A modular graph-based Retrieval-Augmented Generation (RAG) system
facebookresearch/segment-anything-2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
lllyasviel/Omost
Your image is almost there!
MasterBin-IIAU/UNINEXT
[CVPR'23] Universal Instance Perception as Object Discovery and Retrieval
colmap/glomap
GLOMAP - Global Structure-from-Motion Revisited
openvla/openvla
OpenVLA: An open-source vision-language-action model for robotic manipulation.
MasterBin-IIAU/Unicorn
[ECCV'22 Oral] Towards Grand Unification of Object Tracking
UMass-Foundation-Model/3D-LLM
Code for 3D-LLM: Injecting the 3D World into Large Language Models
nianticlabs/acezero
[ECCV 2024 - Oral] ACE0 is a learning-based structure-from-motion approach that estimates camera parameters of sets of images by learning a multi-view consistent, implicit scene representation.
microsoft/psi
Platform for Situated Intelligence
NVlabs/EAGLE
EAGLE: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
OpenDriveLab/Vista
[NeurIPS 2024] A Generalizable World Model for Autonomous Driving
OpenRobotLab/GRUtopia
GRUtopia: Dream General Robots in a City at Scale
StanfordVL/OmniGibson
OmniGibson: a platform for accelerating Embodied AI research built upon NVIDIA's Omniverse engine. Join our Discord for support: https://discord.gg/bccR5vGFEx
UMass-Foundation-Model/3D-VLA
[ICML 2024] 3D-VLA: A 3D Vision-Language-Action Generative World Model
facebookresearch/open-eqa
OpenEQA: Embodied Question Answering in the Era of Foundation Models
scene-verse/SceneVerse
Official implementation of ECCV24 paper "SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding"
Chat-3D/Chat-Scene
A multi-modal large language model for 3D scene understanding, excelling in tasks such as 3D grounding, captioning, and question answering.
invictus717/MiCo
Explore the Limits of Omni-modal Pretraining at Scale
OpenRobotLab/Grounded_3D-LLM
Code & Data for Grounded 3D-LLM with Referent Tokens
clorislili/ManipLLM
The official codebase for ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation (CVPR 2024)
ZCMax/ScanReason
[ECCV 2024] Empowering 3D Visual Grounding with Reasoning Capabilities
JudyYe/diffhoi
alanaai/EVUD
Egocentric Video Understanding Dataset (EVUD)
eric-ai-lab/MMWorld
Official repo of the paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"
showlab/videogui
Official repo of "VideoGUI: A Benchmark for GUI Automation from Instructional Videos"
Nathan-Li123/LaMOT
BolinLai/LEGO
[ECCV 2024, Oral] This is the official implementation of the paper "LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning".
PKU-ICST-MIPL/FineSports_CVPR2024
taeinkwon/PyHoloAssist