Pinned Repositories
Awesome-MLLM-Hallucination
📖 A curated list of resources dedicated to the hallucination of multimodal large language models (MLLMs).
Awesome-Video-Diffusion
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
computer_use_ootb
Out-of-the-box (OOTB) GUI Agent for Windows and macOS
Image2Paragraph
[A toolbox for fun.] Transform Image into Unique Paragraph with ChatGPT, BLIP2, OFA, GRIT, Segment Anything, ControlNet.
MotionDirector
[ECCV 2024 Oral] MotionDirector: Motion Customization of Text-to-Video Diffusion Models.
Show-1
[IJCV] Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
Show-o
[ICLR 2025] Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
ShowUI
Open-source, end-to-end vision-language-action model for GUI agents & computer use.
Tune-A-Video
[ICCV 2023] Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
X-Adapter
[CVPR 2024] X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model
Show Lab's Repositories
showlab/Awesome-Video-Diffusion
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
showlab/computer_use_ootb
Out-of-the-box (OOTB) GUI Agent for Windows and macOS
showlab/Show-o
[ICLR 2025] Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
showlab/Show-1
[IJCV] Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
showlab/ShowUI
Open-source, end-to-end vision-language-action model for GUI agents & computer use.
showlab/Awesome-MLLM-Hallucination
📖 A curated list of resources dedicated to the hallucination of multimodal large language models (MLLMs).
showlab/Awesome-GUI-Agent
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
showlab/VideoSwap
Code for [CVPR 2024] VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
showlab/Awesome-Unified-Multimodal-Models
📖 A repository for organizing papers, code, and other resources related to unified multimodal models.
showlab/BoxDiff
[ICCV 2023] BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion
showlab/VideoLISA
[NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
showlab/ROICtrl
Code for ROICtrl: Boosting Instance Control for Visual Generation
showlab/LOVA3
(NeurIPS 2024) Learning to Visual Question Answering, Asking and Assessment
showlab/sparseformer
(ICLR 2024, CVPR 2024) SparseFormer
showlab/MakeAnything
Official code of "MakeAnything: Harnessing Diffusion Transformers for Multi-Domain Procedural Sequence Generation"
showlab/EvolveDirector
[NeurIPS 2024] EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models.
showlab/FQGAN
FQGAN: Factorized Visual Tokenization and Generation
showlab/MovieBench
showlab/MovieSeq
[ECCV2024] Learning Video Context as Interleaved Multimodal Sequences
showlab/Awesome-Robotics-Diffusion
(In progress) A curated list of recent robot learning papers incorporating diffusion models for robotics tasks.
showlab/videogui
[NeurIPS2024] VideoGUI: A Benchmark for GUI Automation from Instructional Videos
showlab/LayerTracer
Official code of "LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer"
showlab/RingID
showlab/VisInContext
Official implementation of Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning
showlab/Exo2Ego-V
showlab/DiffSim
Official repository of DiffSim: Taming Diffusion Models for Evaluating Visual Similarity
showlab/Tune-An-Ellipse
[CVPR 2024] Tune-An-Ellipse: CLIP Has Potential to Find What You Want
showlab/GUI-Narrator
Repository of GUI Action Narrator
showlab/IDProtector
Code implementation of "IDProtector: An Adversarial Noise Encoder to Protect Against ID-Preserving Image Generation".
showlab/watermark-steganalysis