Pinned Repositories
Awesome-MLLM-Hallucination
📖 A curated list of resources dedicated to the hallucination of multimodal large language models (MLLMs).
Awesome-Video-Diffusion
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
computer_use_ootb
Out-of-the-box (OOTB) GUI Agent for Windows and macOS
Image2Paragraph
[A toolbox for fun.] Transform Image into Unique Paragraph with ChatGPT, BLIP2, OFA, GRIT, Segment Anything, ControlNet.
MotionDirector
[ECCV 2024 Oral] MotionDirector: Motion Customization of Text-to-Video Diffusion Models.
Show-1
[IJCV] Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
Show-o
[ICLR 2025] Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
ShowUI
Open-source, end-to-end vision-language-action model for GUI agents & computer use.
Tune-A-Video
[ICCV 2023] Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
X-Adapter
[CVPR 2024] X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model
Show Lab's Repositories
showlab/Awesome-Video-Diffusion
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
showlab/computer_use_ootb
Out-of-the-box (OOTB) GUI Agent for Windows and macOS
showlab/Show-o
[ICLR 2025] Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
showlab/Show-1
[IJCV] Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
showlab/ShowUI
Open-source, end-to-end vision-language-action model for GUI agents & computer use.
showlab/Awesome-MLLM-Hallucination
📖 A curated list of resources dedicated to the hallucination of multimodal large language models (MLLMs).
showlab/Awesome-GUI-Agent
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
showlab/VideoSwap
Code for [CVPR 2024] VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
showlab/Awesome-Unified-Multimodal-Models
📖 A repository for organizing papers, code, and other resources related to unified multimodal models.
showlab/BoxDiff
[ICCV 2023] BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion
showlab/VideoLISA
[NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
showlab/ROICtrl
Code for ROICtrl: Boosting Instance Control for Visual Generation
showlab/LOVA3
(NeurIPS 2024) Learning to Visual Question Answering, Asking and Assessment
showlab/sparseformer
(ICLR 2024, CVPR 2024) SparseFormer
showlab/MakeAnything
Official code of "MakeAnything: Harnessing Diffusion Transformers for Multi-Domain Procedural Sequence Generation"
showlab/EvolveDirector
[NeurIPS 2024] EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models.
showlab/FQGAN
FQGAN: Factorized Visual Tokenization and Generation
showlab/MovieBench
showlab/MovieSeq
[ECCV2024] Learning Video Context as Interleaved Multimodal Sequences
showlab/Awesome-Robotics-Diffusion
(In progress) A curated list of recent robot learning papers incorporating diffusion models for robotics tasks.
showlab/videogui
[NeurIPS2024] VideoGUI: A Benchmark for GUI Automation from Instructional Videos
showlab/LayerTracer
Official code of "LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer"
showlab/RingID
showlab/VisInContext
Official implementation of Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning
showlab/Exo2Ego-V
showlab/DiffSim
Official repository of DiffSim: Taming Diffusion Models for Evaluating Visual Similarity
showlab/Tune-An-Ellipse
[CVPR 2024] Tune-An-Ellipse: CLIP Has Potential to Find What You Want
showlab/GUI-Narrator
Repository of GUI Action Narrator
showlab/IDProtector
Code implementation of "IDProtector: An Adversarial Noise Encoder to Protect Against ID-Preserving Image Generation".
showlab/watermark-steganalysis