chenqi1126's Stars
comfyanonymous/ComfyUI
The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.
lm-sys/FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
meta-llama/llama3
The official Meta Llama 3 GitHub site
huggingface/diffusers
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.
hpcaitech/Open-Sora
Open-Sora: Democratizing Efficient Video Production for All
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
dair-ai/ml-visuals
🎨 ML Visuals contains figures and templates which you can reuse and customize to improve your scientific writing.
BradyFU/Awesome-Multimodal-Large-Language-Models
✨✨ Latest Advances on Multimodal Large Language Models
guoyww/AnimateDiff
Official implementation of AnimateDiff.
mlfoundations/open_clip
An open source implementation of CLIP.
salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
tencent-ailab/IP-Adapter
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.
QwenLM/Qwen2-VL
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
showlab/Awesome-Video-Diffusion
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
lllyasviel/Paints-UNDO
Understand Human Behavior to Align True Needs
jingyi0000/VLM_survey
Collection of AWESOME vision-language models for vision tasks
lllyasviel/LayerDiffuse
Transparent Image Layer Diffusion using Latent Transparency
aigc-apps/EasyAnimate
📺 An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion
TencentARC/MotionCtrl
Official Code for MotionCtrl [SIGGRAPH 2024]
DirtyHarryLYL/LLM-in-Vision
Recent LLM-based CV and related works. Welcome to comment/contribute!
SunzeY/AlphaCLIP
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Bujiazi/MotionClone
Official implementation of MotionClone: Training-Free Motion Cloning for Controllable Video Generation
JiuTian-VL/JiuTian-LION
[CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge
kevin-ssy/CLIP_as_RNN
Official Implementation for CVPR 2024 paper: CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor
PangzeCheung/SingDiffusion
[CVPR 2024] Tackling the Singularities at the Endpoints of Time Intervals in Diffusion Models
vpulab/ovam
Code for the paper Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models @ CVPR 2024
CodeGoat24/DreamText
Official implementation of High Fidelity Scene Text Synthesis.
LinlyAC/VDT-AGPReID
View-decoupled Transformer for Person Re-identification under Aerial-ground Camera Network (CVPR'24)
ccccwb/Multimodal-Detection-and-Tracking-UAV
A Multimodal Detection and Tracking System based on DJI Payload SDK and Mobile SDK.
WondrousWisdomcard/DiffuseQR
A Progressive Optimization Method for Text-Guided Aesthetic QR Code Generation