jonyzhang2023's Stars
deepseek-ai/DeepSeek-V3
deepseek-ai/DeepSeek-Coder
DeepSeek Coder: Let the Code Write Itself
NVIDIA/Cosmos
Cosmos is a world model development platform that consists of world foundation models, tokenizers, and a video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. Cosmos is purpose-built for physical AI. The Cosmos repository will enable end users to run the Cosmos models, run inference scripts, and generate videos.
FoundationVision/VAR
[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
NVlabs/VILA
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
atong01/conditional-flow-matching
TorchCFM: a Conditional Flow Matching library
OpenDriveLab/AgiBot-World
World's First Large-scale High-quality Robotic Manipulation Benchmark
zchoi/Awesome-Embodied-Agent-with-LLMs
This is a curated list of "Embodied AI or robot with Large Language Models" research. Watch this repository for the latest updates! 🔥
orangeduck/Motion-Matching
Learned Motion Matching example implementation and source code for the article "Code vs Data Driven Displacement"
facebookresearch/metamotivo
The first behavioral foundation model to control a virtual physics-based humanoid agent for a wide range of whole-body tasks.
alexsax/2D-3D-Semantics
The data skeleton from Joint 2D-3D-Semantic Data for Indoor Scene Understanding
allenzren/open-pi-zero
Re-implementation of pi0 vision-language-action (VLA) model from Physical Intelligence
UMass-Foundation-Model/3D-VLA
[ICML 2024] 3D-VLA: A 3D Vision-Language-Action Generative World Model
geng-haoran/Simulately
A universal summary of current robotics simulators
Yzichen/FlashOCC
vision-x-nyu/thinking-in-space
Official repo and evaluation implementation of VSI-Bench
RL4VLM/RL4VLM
Official Repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning
xuxw98/ESAM
EmbodiedSAM: Online Segment Any 3D Thing in Real Time
Robot-VLAs/RoboVLMs
mbodiai/embodied-agents
Seamlessly integrate state-of-the-art transformer models into robotics stacks
mlzxy/arp
Autoregressive Policy for Robot Learning
HRI-EU/flow_matching
Affordance-based Robot Manipulation with Flow Matching
jamycheung/360BEV
Repository of 360BEV
LiewFeng/RayDN
[ECCV 2024] Ray Denoising (RayDN): Depth-aware Hard Negative Sampling for Multi-view 3D Object Detection
Stanford-ILIAD/openvla-mini
OpenVLA: An open-source vision-language-action model for robotic manipulation.
facebookresearch/humenv
HumEnv is an SMPL humanoid environment enabling systematic model comparison and reproducibility
ir-lab/bimanual-imitation
Code for the paper "A Comparison of Imitation Learning Algorithms for Bimanual Manipulation" (Drolet et al., 2024)
TEA-Lab/Robo-ABC
[ECCV 2024] 🎉 Official repository of "Robo-ABC: Affordance Generalization Beyond Categories via Semantic Correspondence for Robot Manipulation"
yuxuanxienova/UnityTerrainConvertor
MyRepositories-hub/DML-RL