ponimatkin's Stars
Delgan/loguru
Python logging made (stupidly) simple
facebookresearch/segment-anything-2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
google-deepmind/mujoco
Multi-Joint dynamics with Contact. A general-purpose physics simulator.
sczhou/ProPainter
[ICCV 2023] ProPainter: Improving Propagation and Transformer for Video Inpainting
naver/dust3r
DUSt3R: Geometric 3D Vision Made Easy
facebookresearch/co-tracker
CoTracker is a model for tracking any point (pixel) in a video.
ali-vilab/VGen
Official repo for VGen: a holistic video generation ecosystem built on diffusion models
facebookresearch/jepa
PyTorch code and models for V-JEPA self-supervised learning from video.
Doubiiu/DynamiCrafter
[ECCV 2024, Oral] DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
apple/axlearn
An Extensible Deep Learning Library
real-stanford/diffusion_policy
[RSS 2023] Diffusion Policy Visuomotor Policy Learning via Action Diffusion
openvla/openvla
OpenVLA: An open-source vision-language-action model for robotic manipulation.
google-deepmind/mujoco_menagerie
A collection of high-quality models for the MuJoCo physics engine, curated by Google DeepMind.
OpenGVLab/InternVideo
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
criteo/autofaiss
Automatically create Faiss KNN indices with optimal similarity search parameters.
hkchengrex/Cutie
[CVPR 2024 Highlight] Putting the Object Back Into Video Object Segmentation
jonbarron/camp_zipnerf
OpenGVLab/VideoMAEv2
[CVPR 2023] VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
RaymondWang987/NVDS
ICCV 2023 "Neural Video Depth Stabilizer" (NVDS) & TPAMI 2024 "NVDS+: Towards Efficient and Versatile Neural Stabilizer for Video Depth Estimation" (NVDS+)
geopavlakos/hamer
HaMeR: Reconstructing Hands in 3D with Transformers
TencentARC/UMT
UMT is a unified, flexible framework that handles different combinations of input modalities and outputs video moment retrieval and/or highlight detection results.
PKU-EPIC/GAPartNet
[CVPR 2023 Highlight] GAPartNet: Cross-Category Domain-Generalizable Object Perception and Manipulation via Generalizable and Actionable Parts.
shikharbahl/vrb
schmidtdominik/LAPO
Code for the ICLR 2024 spotlight paper: "Learning to Act without Actions" (introducing Latent Action Policies)
yxKryptonite/RAM_code
Official implementation of RAM: Retrieval-Based Affordance Transfer for Generalizable Zero-Shot Robotic Manipulation
JeanElsner/panda_mujoco
MuJoCo model of the Franka Emika Robot System
cvlab-columbia/dreamitate
Dreamitate: Real-World Visuomotor Policy Learning via Video Generation (CoRL 2024)
JeanElsner/dm_robotics_panda
Panda model for dm_robotics
andvg3/LGD
Dataset and Code for CVPR 2024 paper "Language-driven Grasp Detection."