shengyuhao's Stars
microsoft/graphrag
A modular graph-based Retrieval-Augmented Generation (RAG) system
facebookresearch/segment-anything-2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
lllyasviel/Omost
Your image is almost there!
MasterBin-IIAU/UNINEXT
[CVPR'23] Universal Instance Perception as Object Discovery and Retrieval
colmap/glomap
GLOMAP - Global Structure-from-Motion Revisited
openvla/openvla
OpenVLA: An open-source vision-language-action model for robotic manipulation.
MasterBin-IIAU/Unicorn
[ECCV'22 Oral] Towards Grand Unification of Object Tracking
UMass-Foundation-Model/3D-LLM
Code for 3D-LLM: Injecting the 3D World into Large Language Models
nianticlabs/acezero
[ECCV 2024 - Oral] ACE0 is a learning-based structure-from-motion approach that estimates camera parameters of sets of images by learning a multi-view consistent, implicit scene representation.
microsoft/psi
Platform for Situated Intelligence
NVlabs/EAGLE
EAGLE: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
OpenDriveLab/Vista
[NeurIPS 2024] A Generalizable World Model for Autonomous Driving
OpenRobotLab/GRUtopia
GRUtopia: Dream General Robots in a City at Scale
StanfordVL/OmniGibson
OmniGibson: a platform for accelerating Embodied AI research built upon NVIDIA's Omniverse engine. Join our Discord for support: https://discord.gg/bccR5vGFEx
UMass-Foundation-Model/3D-VLA
[ICML 2024] 3D-VLA: A 3D Vision-Language-Action Generative World Model
facebookresearch/open-eqa
OpenEQA: Embodied Question Answering in the Era of Foundation Models
scene-verse/SceneVerse
Official implementation of ECCV24 paper "SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding"
Chat-3D/Chat-Scene
A multi-modal large language model for 3D scene understanding, excelling in tasks such as 3D grounding, captioning, and question answering.
invictus717/MiCo
Explore the Limits of Omni-modal Pretraining at Scale
OpenRobotLab/Grounded_3D-LLM
Code & Data for Grounded 3D-LLM with Referent Tokens
clorislili/ManipLLM
The official codebase for ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation (CVPR 2024)
ZCMax/ScanReason
[ECCV 2024] Empowering 3D Visual Grounding with Reasoning Capabilities
JudyYe/diffhoi
alanaai/EVUD
Egocentric Video Understanding Dataset (EVUD)
eric-ai-lab/MMWorld
Official repo of the paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"
showlab/videogui
Official repo of "VideoGUI: A Benchmark for GUI Automation from Instructional Videos"
Nathan-Li123/LaMOT
BolinLai/LEGO
[ECCV 2024, Oral] This is the official implementation of the paper "LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning".
PKU-ICST-MIPL/FineSports_CVPR2024
taeinkwon/PyHoloAssist