MASKlll's Stars
real-stanford/scalingup
[CoRL 2023] This repository contains data generation and training code for Scaling Up & Distilling Down
clorislili/ManipLLM
The official codebase for ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation (CVPR 2024)
GeWu-Lab/DepthHelps-IROS2024
UMass-Foundation-Model/3D-VLA
[ICML 2024] 3D-VLA: A 3D Vision-Language-Action Generative World Model
bytedance/GR-MG
Official implementation of GR-MG
hkchengrex/Cutie
[CVPR 2024 Highlight] Putting the Object Back Into Video Object Segmentation
huangwl18/ReKep
ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation
openvla/openvla
OpenVLA: An open-source vision-language-action model for robotic manipulation.
QwenLM/Qwen2-VL
Qwen2-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud.
cfeng16/UniTouch
Binding Touch to Everything: Learning Unified Multimodal Tactile Representations
mkt1412/GraspGPT_public
Code implementation of GraspGPT and FoundationGrasp
BAAI-DCAI/Bunny
A family of lightweight multimodal models.
YvanYin/Metric3D
The repo for "Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image" and "Metric3Dv2: A Versatile Monocular Geometric Foundation Model..."
isl-org/ZoeDepth
Metric depth estimation from a single image
remyxai/VQASynth
Compose multimodal datasets 🎹
epic-kitchens/epic-kitchens-100-annotations
🍽️ Annotations for the public release of the EPIC-KITCHENS-100 dataset
bdaiinstitute/theia
Theia: Distilling Diverse Vision Foundation Models for Robot Learning
BAAI-DCAI/SpatialBot
The official repo for "SpatialBot: Precise Spatial Understanding with Vision Language Models.
graspnet/anygrasp_sdk
rail-berkeley/fmb
HCPLab-SYSU/Embodied_AI_Paper_List
[Embodied-AI-Survey-2024] Paper list and projects for Embodied AI
lucidrains/vit-pytorch
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
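A minimal usage sketch following the pattern in the vit-pytorch README; the model hyperparameters and input size below are illustrative placeholders, not a recommended configuration:

```python
import torch
from vit_pytorch import ViT

# instantiate a small ViT; all dimensions below are illustrative placeholders
model = ViT(
    image_size=256,
    patch_size=32,
    num_classes=1000,
    dim=1024,
    depth=6,
    heads=16,
    mlp_dim=2048,
)

img = torch.randn(1, 3, 256, 256)  # dummy batch of one RGB image
logits = model(img)                # shape: (1, 1000)
```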
openai/CLIP
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
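A minimal zero-shot matching sketch, assuming the `clip` package from this repo is installed; the image path and candidate captions are hypothetical placeholders:

```python
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# "cat.png" is a placeholder path; the captions are arbitrary candidates
image = preprocess(Image.open("cat.png")).unsqueeze(0).to(device)
text = clip.tokenize(["a photo of a cat", "a photo of a dog"]).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1)  # probability of each caption matching the image
```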
facebookresearch/Ego4d
Ego4D dataset repository: download the dataset, visualize it, extract features, and see example usage of the dataset
vimalabs/VIMABench
Official Task Suite Implementation of ICML'23 Paper "VIMA: General Robot Manipulation with Multimodal Prompts"
intuitive-robots/mdt_policy
[RSS 2024] Code for "Multimodal Diffusion Transformer: Learning Versatile Behavior from Multimodal Goals" for CALVIN experiments with pre-trained weights
DepthAnything/Depth-Anything-V2
Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
huggingface/pytorch-image-models
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more
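A minimal sketch of loading a pretrained backbone with timm; "resnet50" is just one example name, and any entry from `timm.list_models(pretrained=True)` can be substituted:

```python
import timm
import torch

# create a pretrained ImageNet classifier by name
model = timm.create_model("resnet50", pretrained=True)
model.eval()

x = torch.randn(1, 3, 224, 224)  # dummy input batch
with torch.no_grad():
    logits = model(x)  # shape: (1, 1000) ImageNet class logits
```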
LostXine/LLaRA
LLaRA: Large Language and Robotics Assistant
changhaonan/A3VLM
[CoRL 2024] Official repo of `A3VLM: Actionable Articulation-Aware Vision Language Model`