wyddmw's Stars
OpenBMB/MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
fudan-generative-vision/hallo
Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation
timothybrooks/instruct-pix2pix
X-PLUG/MobileAgent
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
ActiveVisionLab/Awesome-LLM-3D
Awesome-LLM-3D: a curated list of resources on Multi-modal Large Language Models in the 3D world
zchoi/Awesome-Embodied-Agent-with-LLMs
This is a curated list of "Embodied AI or robot with Large Language Models" research. Watch this repository for the latest updates! 🔥
DAMO-NLP-SG/VideoLLaMA2
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
mbzuai-oryx/LLaVA-pp
🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)
ayaanzhaque/instruct-nerf2nerf
Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions (ICCV 2023)
henry123-boy/SpaTracker
[CVPR 2024 Highlight] Official PyTorch implementation of SpatialTracker: Tracking Any 2D Pixels in 3D Space
shalfun/DrivingDiffusion
Layout-Guided multi-view driving scene video generation with latent diffusion model
nianticlabs/mickey
[CVPR 2024 - Oral] Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences
maitrix-org/Pandora
Pandora: Towards General World Model with Natural Language Actions and Video States
SkyworkAI/Vitron
NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
swc-17/SparseDrive
SparseDrive: End-to-End Autonomous Driving via Sparse Scene Representation
autonomousvision/navsim
[NeurIPS 2024] NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking
fudan-zvg/PVG
Periodic Vibration Gaussian: Dynamic Urban Scene Reconstruction and Real-time Rendering
alfredgu001324/MapUncertaintyPrediction
[CVPR 2024 Award Candidate] Producing and Leveraging Online Map Uncertainty in Trajectory Prediction
wayveai/wayve_scenes
Codebase for the WayveScenes101 Dataset
LostXine/LLaRA
LLaRA: Large Language and Robotics Assistant
zd11024/NaviLLM
[CVPR 2024] The code for the paper 'Towards Learning a Generalist Model for Embodied Navigation'
microsoft/Everything-of-Thoughts-XoT
An implementation of Everything of Thoughts (XoT).
autodriving-heart/ECCV-2024-Papers-Autonomous-Driving
ECCV 2024 Paper List about Autonomous Driving
javyduck/ChatScene
[CVPR 2024] ChatScene: Knowledge-Enabled Safety-Critical Scenario Generation for Autonomous Vehicles https://arxiv.org/abs/2405.14062
yangxiaofeng/rectified_flow_prior
Official code for paper: Text-to-Image Rectified Flow as Plug-and-Play Priors
wzcai99/Pixel-Navigator
Official GitHub Repository for Paper "Bridging Zero-shot Object Navigation and Foundation Models through Pixel-Guided Navigation Skill", ICRA 2024
lbaa2022/LLMTaskPlanning
LoTa-Bench: Benchmarking Language-oriented Task Planners for Embodied Agents (ICLR 2024)
ZCMax/ScanReason
[ECCV 2024] Empowering 3D Visual Grounding with Reasoning Capabilities
aim-uofa/GeoBench
A toolbox for benchmarking SOTA discriminative and generative geometry estimation models.
itl-ed/llm-dp
LLM Dynamic Planner - Combining LLM with PDDL Planners to solve an embodied task