michaelyuancb
Now Intern@MoonshotAI; Graduate@IIIS, Tsinghua University; EmbodiedAI & Agent; Simple+Elegant leads to AGI
Tsinghua University, Beijing, China
michaelyuancb's Stars
Junyi42/monst3r
Official Implementation of paper "MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion"
yatengLG/ISAT_with_segment_anything
Labeling tool built on SAM (Segment Anything Model); supports SAM, SAM2, sam-hq, MobileSAM, EdgeSAM, etc. An interactive, semi-automatic image annotation tool.
xiongyiheng/ARKit-Scanner
The scanner app acquires RGB-D scans using iPhone LiDAR sensor and ARKit API, stores color, depth and IMU data on local memory and then uploads to PC for processing.
YanjieZe/Paper-List
A list of papers from my reading history. Robotics, Learning, Vision.
suyukun666/UFO
Official PyTorch implementation of “A Unified Transformer Framework for Co-Segmentation, Co-Saliency Detection and Video Salient Object Detection” (TMM 2023).
TEN-framework/TEN-Agent
TEN Agent is the world’s first real-time multimodal agent integrated with the OpenAI Realtime API, RTC, and features weather checks, web search, vision, and RAG capabilities.
michaelyuancb/ego_hoi_model
A model combining 100DoH, Semantic-SAM, and EgoHOS for hand-object state classification, detection, and segmentation.
lucidrains/vit-pytorch
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch
THUDM/GLM-4
GLM-4 series: Open Multilingual Multimodal Chat LMs
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
lyhue1991/torchkeras
Pytorch❤️ Keras 😋😋
tejpshah/interview-pilot-ai
Ace interviews with AI practice. Our agent role-plays personalized interviews tailored to your background, listening and replying like a real interviewer. Train across personas for any situation.
jaidevshriram/realmdreamer
Code for RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion [arXiv 2024]
stitionai/devika
Devika is an Agentic AI Software Engineer that can understand high-level human instructions, break them down into steps, research relevant information, and write code to achieve the given objective. Devika aims to be a competitive open-source alternative to Devin by Cognition AI.
lavague-ai/LaVague
Large Action Model framework to develop AI Web Agents
RoboFlamingo/RoboFlamingo
Code for RoboFlamingo
prejudice666/whu-thesis-latex-template
LaTeX template for the Wuhan University undergraduate thesis (2019 cohort)
leptonai/search_with_lepton
Building a quick conversation-based search demo with Lepton AI.
michaelyuancb/general_flow
Repository for "General Flow as Foundation Affordance for Scalable Robot Learning"
GengYiran/GengYiran.github.io
my blog
jonbarron/website
voxposer/voxposer.github.io
ddshan/hand_object_detector
Project and dataset webpage:
idejie/ego_hand_detecor
Pretrained model from Shan et al., “Understanding Human Hands in Contact at Internet Scale” (CVPR 2020, Oral).
real-stanford/diffusion_policy
[RSS 2023] Diffusion Policy: Visuomotor Policy Learning via Action Diffusion
ematvey/pybacktest
Vectorized backtesting framework in Python / pandas, designed to make your backtesting easier — compact, simple and fast
jingyi0000/VLM_survey
Collection of AWESOME vision-language models for vision tasks
hassony2/useful-computer-vision-phd-resources
Lists of resources useful for my PhD in computer vision
luca-medeiros/lang-segment-anything
SAM with text prompt
xlang-ai/xlang-paper-reading
Paper collection on building and evaluating language model agents via executable language grounding