1349949's Stars
meta-llama/llama3
The official Meta Llama 3 GitHub site
LargeWorldModel/LWM
Large World Model -- Modeling Text and Video with Million-Length Context
DepthAnything/Depth-Anything-V2
[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
isaac-sim/IsaacLab
Unified framework for robot learning built on NVIDIA Isaac Sim
NVlabs/VILA
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
baaivision/Emu3
Next-Token Prediction is All You Need
openvla/openvla
OpenVLA: An open-source vision-language-action model for robotic manipulation.
openreasoner/openr
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models
opendilab/LightZero
[NeurIPS 2023 Spotlight] LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios (awesome MCTS)
DAMO-NLP-SG/VideoLLaMA2
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
kakaobrain/rq-vae-transformer
The official implementation of Autoregressive Image Generation using Residual Quantization (CVPR '22)
leggedrobotics/rsl_rl
Fast and simple implementation of RL algorithms, designed to run fully on GPU.
mlfoundations/datacomp
DataComp: In search of the next generation of multimodal datasets
unitreerobotics/unitree_ros
OpenDriveLab/Vista
[NeurIPS 2024] A Generalizable World Model for Autonomous Driving
robocasa/robocasa
RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots
huangwl18/ReKep
ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation
lucidrains/magvit2-pytorch
Implementation of MagViT2 Tokenizer in PyTorch
NVlabs/EAGLE
EAGLE: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
OpenRobotLab/GRUtopia
GRUtopia: Dream General Robots in a City at Scale
1x-technologies/1xgpt
World modeling challenge for humanoid robots
NVlabs/ProtoMotions
mira-space/MiraData
Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions"
LeCAR-Lab/human2humanoid
[IROS 2024] Learning Human-to-Humanoid Real-Time Whole-Body Teleoperation. [CoRL 2024] OmniH2O: Universal and Dexterous Human-to-Humanoid Whole-Body Teleoperation and Learning
tsb0601/MMVP
Beckschen/ViTamin
[CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era"
microsoft/MoCapAct
A Multi-Task Dataset for Simulated Humanoid Control
wang-fujin/PINN4SOH
A physics-informed neural network for battery SOH estimation
TencentARC/ST-LLM
[ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners"
vivym/OmniGen