WayneMao's Stars
openai/CLIP
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
deepseek-ai/Janus
Janus-Series: Unified Multimodal Understanding and Generation Models
faressoft/terminalizer
🦄 Record your terminal and generate animated gif images or share a web player
datawhalechina/self-llm
"A Practical Guide to Open-Source LLMs": beginner-friendly tutorials for quickly fine-tuning (full-parameter/LoRA) and deploying open-source large language models (LLMs) and multimodal large language models (MLLMs) in a Linux environment
Jiayi-Pan/TinyZero
Clean, minimal, accessible reproduction of DeepSeek R1-Zero
NVIDIA/Cosmos
New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos
DepthAnything/Depth-Anything-V2
[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
WangRongsheng/awesome-LLM-resourses
🧑🚀 Summary of the world's best LLM resources (data processing, model training, model deployment, o1 models, MCP, small language models, vision-language models)
om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
TianxingChen/Embodied-AI-Guide
[Lumina Embodied AI Community] A beginner's guide to embodied AI (Embodied-AI-Guide)
Deep-Agent/R1-V
Witness the aha moment of a VLM for less than $3.
Physical-Intelligence/openpi
facebookresearch/chameleon
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
ActiveVisionLab/Awesome-LLM-3D
Awesome-LLM-3D: a curated list of Multi-modal Large Language Model in 3D world Resources
facebookresearch/MetaCLIP
ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering
zzli2022/Awesome-System2-Reasoning-LLM
Latest Advances on System-2 Reasoning
allenzren/open-pi-zero
Re-implementation of pi0 vision-language-action (VLA) model from Physical Intelligence
graspnet/graspnet-baseline
Baseline model for "GraspNet-1Billion: A Large-Scale Benchmark for General Object Grasping" (CVPR 2020)
StarCycle/Awesome-Embodied-AI-Job
Lumina Robotics Talent Call | A list of Embodied AI / Robotics jobs (PhD, RA, intern, full-time, etc.)
Westlake-AGI-Lab/Distill-Any-Depth
The repo for "Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator"
LMM101/Awesome-Multimodal-Next-Token-Prediction
[Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey
jonyzhang2023/awesome-embodied-vla-va-vln
Robot-VLAs/RoboVLMs
moojink/openvla-oft
Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success
qizekun/SoFar
SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation
Ucas-HaoranWei/Slow-Perception
Official code implementation of Slow Perception: Let's Perceive Geometric Figures Step-by-step
linkangheng/Video-UTR
[ICLR2025] Official code implementation of Video-UTR: Unhackable Temporal Rewarding for Scalable Video MLLMs
mlzoo/BaoZaoAI
An ill-tempered AI that unexpectedly speaks the truth, built by DPO fine-tuning Qwen-2.5-1.5B
thkkk/manibox
ManiBox: Enhancing Spatial Grasping Generalization via Scalable Simulation Data Generation
JCZ404/Awesome-Visual-Autoregressive
Curated list of recent visual autoregressive (VAR) modeling works