WayneMao's Stars
openai/CLIP
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
deepseek-ai/Janus
Janus-Series: Unified Multimodal Understanding and Generation Models
faressoft/terminalizer
🦄 Record your terminal and generate animated gif images or share a web player
datawhalechina/self-llm
"A Practical Guide to Open-Source LLMs": beginner-friendly tutorials for quickly fine-tuning (full-parameter/LoRA) and deploying open-source large language models (LLMs) and multimodal large language models (MLLMs) in a Linux environment
Jiayi-Pan/TinyZero
Clean, minimal, accessible reproduction of DeepSeek R1-Zero
NVIDIA/Cosmos
New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos
DepthAnything/Depth-Anything-V2
[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
WangRongsheng/awesome-LLM-resourses
🧑🚀 Summary of the world's best LLM resources (data processing, model training, model deployment, o1 models, MCP, small language models, vision-language models)
om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
TianxingChen/Embodied-AI-Guide
[Lumina Embodied AI Community] A beginner's guide to embodied AI (Embodied-AI-Guide)
Deep-Agent/R1-V
Witness the aha moment of a VLM for less than $3.
Physical-Intelligence/openpi
facebookresearch/chameleon
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
ActiveVisionLab/Awesome-LLM-3D
Awesome-LLM-3D: a curated list of Multi-modal Large Language Model in 3D world Resources
facebookresearch/MetaCLIP
ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering
zzli2022/Awesome-System2-Reasoning-LLM
Latest Advances on System-2 Reasoning
allenzren/open-pi-zero
Re-implementation of pi0 vision-language-action (VLA) model from Physical Intelligence
graspnet/graspnet-baseline
Baseline model for "GraspNet-1Billion: A Large-Scale Benchmark for General Object Grasping" (CVPR 2020)
StarCycle/Awesome-Embodied-AI-Job
Lumina Robotics Talent Call | A list of Embodied AI / Robotics jobs (PhD, RA, intern, full-time, etc.)
Westlake-AGI-Lab/Distill-Any-Depth
The repo for "Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator"
LMM101/Awesome-Multimodal-Next-Token-Prediction
[Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey
jonyzhang2023/awesome-embodied-vla-va-vln
Robot-VLAs/RoboVLMs
moojink/openvla-oft
Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success
qizekun/SoFar
SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation
Ucas-HaoranWei/Slow-Perception
Official code implementation of Slow Perception: Let's Perceive Geometric Figures Step-by-step
linkangheng/Video-UTR
[ICLR2025] Official code implementation of Video-UTR: Unhackable Temporal Rewarding for Scalable Video MLLMs
mlzoo/BaoZaoAI
An ill-tempered AI that unexpectedly speaks the truth, built by DPO fine-tuning Qwen-2.5-1.5B
thkkk/manibox
ManiBox: Enhancing Spatial Grasping Generalization via Scalable Simulation Data Generation
JCZ404/Awesome-Visual-Autoregressive
Curated list of recent visual autoregressive (VAR) modeling works