xiaojieli0903
Ph.D. candidate at the School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen).
HIT (Shenzhen), Shenzhen
xiaojieli0903's Stars
CompVis/stable-diffusion
A latent text-to-image diffusion model
Stability-AI/stablediffusion
High-Resolution Image Synthesis with Latent Diffusion Models
lllyasviel/ControlNet
Let us control diffusion models!
BradyFU/Awesome-Multimodal-Large-Language-Models
✨✨ Latest Advances on Multimodal Large Language Models
CompVis/latent-diffusion
High-Resolution Image Synthesis with Latent Diffusion Models
mlfoundations/open_clip
An open source implementation of CLIP.
lucidrains/DALLE2-pytorch
Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in Pytorch
lucidrains/denoising-diffusion-pytorch
Implementation of Denoising Diffusion Probabilistic Model in Pytorch
lucidrains/imagen-pytorch
Implementation of Imagen, Google's Text-to-Image Neural Network, in Pytorch
open-mmlab/mmdetection3d
OpenMMLab's next-generation platform for general 3D object detection.
lucidrains/DALLE-pytorch
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch
open-mmlab/mmpretrain
OpenMMLab Pre-training Toolbox and Benchmark
Luodian/Otter
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
OpenGVLab/Ask-Anything
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
baaivision/Painter
Painter & SegGPT Series: Vision Foundation Models from BAAI
X-PLUG/mPLUG-Owl
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
ttengwang/Caption-Anything
Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/spaces/TencentARC/Caption-Anything https://huggingface.co/spaces/VIPLab/Caption-Anything
justinpinkney/stable-diffusion
198808xc/Pangu-Weather
An official implementation of Pangu-Weather
chq1155/A-Survey-on-Generative-Diffusion-Model
EdisonLeeeee/Awesome-Masked-Autoencoders
A collection of literature after or concurrent with Masked Autoencoder (MAE) (Kaiming He et al.).
yxuansu/PandaGPT
[TLLM'23] PandaGPT: One Model To Instruction-Follow Them All
muzairkhattak/multimodal-prompt-learning
[CVPR 2023] Official repository of paper titled "MaPLe: Multi-modal Prompt Learning".
HighCWu/ControlLoRA
ControlLoRA: A Lightweight Neural Network To Control Stable Diffusion Spatial Information
Lupin1998/Awesome-MIM
[Survey] Masked Modeling for Self-supervised Representation Learning on Vision and Beyond (https://arxiv.org/abs/2401.00897)
yzd-v/cls_KD
'NKD and USKD' (ICCV 2023) and 'ViTKD' (CVPRW 2024)
X-PLUG/mPLUG-2
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)
ju-chen/Efficient-Prompt
adobe-research/affordance-insertion
wudongming97/RMOT
[CVPR2023] Referring Multi-Object Tracking