iamxiaoyubei's Stars
AUTOMATIC1111/stable-diffusion-webui
Stable Diffusion web UI
google-research/tuning_playbook
A playbook for systematically maximizing the performance of deep learning models.
openai/CLIP
CLIP (Contrastive Language-Image Pretraining): predicts the most relevant text snippet given an image
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
NVIDIA/NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
CASIA-IVA-Lab/FastSAM
Fast Segment Anything
vladmandic/automatic
SD.Next: All-in-one for AI generative image
mnotgod96/AppAgent
AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.
OFA-Sys/Chinese-CLIP
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
openai/glide-text2im
GLIDE: a diffusion-based text-conditional image synthesis model
Luodian/Otter
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
Tencent/FaceDetection-DSFD
Tencent YouTu's high-accuracy dual-branch face detector
pharmapsychotic/clip-interrogator
Image to prompt with BLIP and CLIP
Daisy-Zhang/Awesome-Deepfakes-Detection
A list of tools, papers and code related to Deepfake Detection.
DirtyHarryLYL/LLM-in-Vision
Recent LLM-based CV and related works. Welcome to comment/contribute!
CircleRadon/Osprey
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
penghao-wu/vstar
PyTorch Implementation of "V* : Guided Visual Search as a Core Mechanism in Multimodal LLMs"
shansongliu/M2UGen
This is the official repository for M2UGen
jbohnslav/opencv_transforms
OpenCV implementation of Torchvision's image augmentations
phellonchen/X-LLM
X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages
brandontrabucco/da-fusion
Effective Data Augmentation With Diffusion Models
hityzy1122/opencv_transforms_torchvision
OpenCV reimplementation of the transforms in torchvision
CVMI-Lab/SyntheticData
Is synthetic data from generative models ready for image recognition?
guozix/TaI-DPT
yossigandelsman/clip_prs
Official implementation of "Interpreting CLIP's Image Representation via Text-Based Decomposition"
Yuheng-Li/PACGen
sunxm2357/DualCoOp
Implementation for "DualCoOp: Fast Adaptation to Multi-Label Recognition with Limited Annotations" (NeurIPS 2022)
kodenii/ImaginaryNet
ImaginaryNet: Learning Object Detectors without Real Images and Annotations
Chen94yue/Torchvision.TransformsbyOpencv
OpenCV-based implementation of Torchvision.Transforms
Tma2333/StableDiffusionProject
Multiple Stable Diffusion Projects.