OhGreat's Stars
AUTOMATIC1111/stable-diffusion-webui
Stable Diffusion web UI
deepfakes/faceswap
Deepfakes Software For All
hpcaitech/Open-Sora
Open-Sora: Democratizing Efficient Video Production for All
facefusion/facefusion
Industry leading face manipulation platform
serengil/deepface
A Lightweight Face Recognition and Facial Attribute Analysis (Age, Gender, Emotion and Race) Library for Python
BradyFU/Awesome-Multimodal-Large-Language-Models
✨✨ Latest Advances on Multimodal Large Language Models
Stability-AI/StableStudio
Community interface for generative AI
deep-floyd/IF
rom1504/img2dataset
Easily turn large sets of image URLs into an image dataset. Can download, resize and package 100M URLs in 20h on one machine.
mlfoundations/open_flamingo
An open-source framework for training large multimodal models.
ai-forever/Kandinsky-2
Kandinsky 2 — multilingual text2image latent diffusion model
pharmapsychotic/clip-interrogator
Image to prompt with BLIP and CLIP
facebookresearch/jepa
PyTorch code and models for V-JEPA self-supervised learning from video.
luca-medeiros/lang-segment-anything
SAM with text prompt
facebookresearch/MetaCLIP
ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering
hila-chefer/Transformer-MM-Explainability
[ICCV 2021 - Oral] Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network. Including examples for DETR, VQA.
glucauze/sd-webui-faceswaplab
Extended faceswap extension for StableDiffusion web-ui with multiple faceswaps, inpainting, checkpoints, ....
richard-peng-xia/awesome-multimodal-in-medical-imaging
A collection of resources on applications of multi-modal learning in medical imaging.
microsoft/StyleSwin
[CVPR 2022] StyleSwin: Transformer-based GAN for High-resolution Image Generation
sangminwoo/awesome-vision-and-language
A curated list of awesome vision and language resources (still under construction... stay tuned!)
facebookresearch/flip
Official Open Source code for "Scaling Language-Image Pre-training via Masking"
GenImage-Dataset/GenImage
mahmoudnafifi/HistoGAN
Reference code for the paper HistoGAN: Controlling Colors of GAN-Generated and Real Images via Color Histograms (CVPR 2021).
muzairkhattak/ViFi-CLIP
[CVPR 2023] Official repository of paper titled "Fine-tuned CLIP models are efficient video learners".
universome/alis
[ICCV 2021] Aligning Latent and Image Spaces to Connect the Unconnectable
NVIDIA-AI-IOT/clip-distillation
Zero-label image classification via OpenCLIP knowledge distillation
awsaf49/artifact
[ICIP 2023] ArtiFact: A Large-Scale Dataset with Artificial (Fake) and Factual (Real) Images for Generalizable and Robust Synthetic Image Detection
tzktz/face-swap
Face Swap Finetuned Model
JonathanCollu/Slot-Structured-World-Models
Repository containing the code for the paper "Slot Structured World Models".
riccardomajellaro/disentangled-slot-attention
Repository containing the code for the paper "Explicitly Disentangled Representations in Object-Centric Learning".