jasper0314-huang's Stars
zhihou7/dit_policy_vla
Psi-Robot/DexGraspVLA
DexGraspVLA: A Vision-Language-Action Framework Towards General Dexterous Grasping
juruobenruo/DexVLA
hiyouga/EasyR1
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
b09902097/motionmatcher
The implementation of MotionMatcher, a feature-level fine-tuning framework for motion customization.
EmbodiedBench/EmbodiedBench
Official repo of EmbodiedBench, a comprehensive benchmark designed to evaluate MLLMs as embodied agents.
OpenMOSS/VLABench
Official repo of VLABench, a large-scale benchmark designed for fairly evaluating VLAs, embodied agents, and VLMs.
microsoft/CogACT
A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation
2U1/Qwen2-VL-Finetune
An open-source implementation for fine-tuning the Qwen2-VL and Qwen2.5-VL series by Alibaba Cloud.
QwenLM/Qwen2.5-VL
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Robot-VLAs/RoboVLMs
genforce/freecontrol
Official implementation of CVPR 2024 paper: "FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition"
jasper0314-huang/Receler
[ECCV 2024] "Receler: Reliable Concept Erasing of Text-to-Image Diffusion Models via Lightweight Erasers" (Official Implementation)
voxel51/fiftyone-brain
Open source AI/ML capabilities for the FiftyOne ecosystem
voxel51/fiftyone
Refine high-quality datasets and visual AI models
AFeng-x/PixWizard
[ICLR 2025]
LPengYang/MotionClone
[ICLR 2025] Official implementation of MotionClone: Training-Free Motion Cloning for Controllable Video Generation
microsoft/WindowsAgentArena
Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of multi-modal AI agents.
MiuLab/VisualDialog
Visualizing Dialogues: Enhancing Image Selection through Dialogue Understanding with Large Language Models
Jack24658735/FedLGT
[AAAI 2024] Official Implementation of Language-Guided Transformer for Federated Multi-Label Classification
TimChou-ntu/GSNeRF
[CVPR 2024] GSNeRF: Generalizable Semantic Neural Radiance Fields with Enhanced 3D Scene Understanding
jjihwan/FIFO-Diffusion_public
Official implementation of FIFO-Diffusion: Generating Infinite Videos from Text without Training (NeurIPS 2024)
chu0802/SnD
This is an official implementation of our work, Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models, accepted to ECCV'24
showlab/MotionDirector
[ECCV 2024 Oral] MotionDirector: Motion Customization of Text-to-Video Diffusion Models.
NUS-HPC-AI-Lab/VideoSys
VideoSys: An easy and efficient system for video generation
ntucllab/libcll
Complementary-label learning in PyTorch
THUDM/CogVideo
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
diffusion-motion-transfer/diffusion-motion-transfer
Official PyTorch implementation of "Space-Time Diffusion Features for Zero-Shot Text-Driven Motion Transfer"
jianzongwu/MotionBooth
[NeurIPS 2024 Spotlight] The official implementation of the research paper "MotionBooth: Motion-Aware Customized Text-to-Video Generation"
agwmon/MuDI
MuDI: Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models (NeurIPS 2024)