Pinned Repositories
Awesome-GUI-Agent
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
Awesome-MLLM-Hallucination
📖 A curated list of resources dedicated to hallucination in multimodal large language models (MLLMs).
Awesome-Video-Diffusion
A curated list of recent diffusion models for video generation, editing, and various other applications.
computer_use_ootb
Out-of-the-box (OOTB) GUI Agent for Windows and macOS
Image2Paragraph
[Image 2 Text Para] Transform Image into Unique Paragraph with ChatGPT, BLIP2, OFA, GRIT, Segment Anything, ControlNet.
MotionDirector
[ECCV 2024 Oral] MotionDirector: Motion Customization of Text-to-Video Diffusion Models.
Show-1
[IJCV] Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
Show-o
[ICLR 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.
ShowUI
[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
Tune-A-Video
[ICCV 2023] Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
Show Lab's Repositories
showlab/Awesome-Video-Diffusion
A curated list of recent diffusion models for video generation, editing, and various other applications.
showlab/Show-o
[ICLR 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.
showlab/computer_use_ootb
Out-of-the-box (OOTB) GUI Agent for Windows and macOS
showlab/ShowUI
[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
showlab/Show-1
[IJCV] Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
showlab/Awesome-GUI-Agent
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
showlab/Awesome-MLLM-Hallucination
📖 A curated list of resources dedicated to hallucination in multimodal large language models (MLLMs).
showlab/Awesome-Unified-Multimodal-Models
📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.
showlab/videollm-online
VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)
showlab/OmniConsistency
The official code implementation of the paper "OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data."
showlab/livecc
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale (CVPR 2025)
showlab/Awesome-Robotics-Diffusion
A curated list of recent robot learning papers incorporating diffusion models for robotics tasks.
showlab/D-AR
The official repo for "D-AR: Diffusion via Autoregressive Models"
showlab/WorldGUI
Enable AI to control your PC. This repo includes the WorldGUI Benchmark and GUI-Thinker Agent Framework.
showlab/Impossible-Videos
[ICML 2025] Impossible Videos
showlab/SMS
[ICCV 2025] Balanced Image Stylization with Style Matching Score
showlab/Exo2Ego-V
showlab/Multi-human-Talking-Video-Dataset
Official repository for the Multi-human Interactive Talking Dataset
showlab/VideoGUI
[NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos
showlab/UniRL
The code repository of UniRL
showlab/DoraCycle
[CVPR 2025] DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles
showlab/SAM-I2V
[CVPR 2025] SAM-I2V
showlab/Q2A
[ECCV 2022] AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant
showlab/DiffSim
[ICCV 2025] Official repository of DiffSim: Taming Diffusion Models for Evaluating Visual Similarity
showlab/IDProtector
The code implementation of "IDProtector: An Adversarial Noise Encoder to Protect Against ID-Preserving Image Generation".
showlab/DIM
The official implementation of the paper "Draw-In-Mind: Learning Precise Image Editing via Chain-of-Thought Imagination"
showlab/macosworld
showlab/TrustScorer
[ACM MM 2025] Can I Trust You? Advancing GUI Task Automation with Action Trust Score
showlab/omg
Open Multimodal Gathering workshop @ NUS
showlab/WMAdapter
A watermark plugin for latent diffusion models.