ModestYjx's Stars
CompVis/stable-diffusion
A latent text-to-image diffusion model
Stability-AI/stablediffusion
High-Resolution Image Synthesis with Latent Diffusion Models
lllyasviel/ControlNet
Let us control diffusion models!
openai/CLIP
CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
google/mediapipe
Cross-platform, customizable ML solutions for live and streaming media.
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
modelscope/facechain
FaceChain is a deep-learning toolchain for generating your Digital-Twin.
facebookresearch/xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.
facebookresearch/SlowFast
PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.
kohya-ss/sd-scripts
Akegarasu/lora-scripts
SD-Trainer. LoRA & Dreambooth training scripts and GUI for diffusion models, built on kohya-ss's trainer.
PKU-YuanGroup/Video-LLaVA
[EMNLP 2024🔥] Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
facebookresearch/Mask2Former
Code release for "Masked-attention Mask Transformer for Universal Image Segmentation"
baaivision/Painter
Painter & SegGPT Series: Vision Foundation Models from BAAI
AIGCDesignGroup/ReplaceAnything
IceClear/StableSR
[IJCV2024] Exploiting Diffusion Prior for Real-World Image Super-Resolution
dreamoving/dreamoving-project
Official implementation of DreaMoving
ttengwang/Caption-Anything
Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/spaces/TencentARC/Caption-Anything https://huggingface.co/spaces/VIPLab/Caption-Anything
baaivision/Emu
Emu Series: Generative Multimodal Models from BAAI
Sense-X/Co-DETR
[ICCV 2023] DETRs with Collaborative Hybrid Assignments Training
PKU-YuanGroup/Chat-UniVi
[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
shenyunhang/APE
[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception
e4s2022/e4s
(CVPR 2023) E4S: Fine-grained Face Swapping via Regional GAN Inversion
UX-Decoder/LLaVA-Grounding
sail-sg/CLoT
CVPR'24, Official Codebase of our Paper: "Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation".
VPGTrans/VPGTrans
Code for VPGTrans: Transfer Visual Prompt Generator across LLMs. VL-LLaMA, VL-Vicuna.
facebookresearch/DCI
Densely Captioned Images (DCI) dataset repository.
bytedance/FreeSeg
shan-mx/Video-CLIP-Indexer