Donghao-Li's Stars
yt-dlp/yt-dlp
A feature-rich command-line audio/video downloader
haoningwu3639/StoryGen
[CVPR 2024] Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models
THUDM/GLM-4
GLM-4 series: Open Multilingual Multimodal Chat LMs
THUDM/ChatGLM3
ChatGLM3 series: Open Bilingual Chat LLMs
CrazyBoyM/llama3-Chinese-chat
Chinese-language repository for Llama3 and Llama3.1 (companion to a book in progress... interesting fine-tuned and modified weights from community members and vendors, plus tutorial videos & docs for training, inference, evaluation, and deployment)
huggingface/peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
VQAssessment/DOVER
[ICCV 2023] Official code for the paper "Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives". Official weights and demos provided.
tgxs002/HPSv2
Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis
zer0int/CLIP-fine-tune
Fine-tuning code for CLIP models
NeoVertex1/SuperPrompt
SuperPrompt is an attempt to engineer prompts that might help us understand AI agents.
XLabs-AI/x-flux-comfyui
ostris/ai-toolkit
Various AI scripts. Mostly Stable Diffusion stuff.
XLabs-AI/x-flux
hzwer/Awesome-Optical-Flow
A curated list of awesome papers on optical flow and related work.
facebookresearch/xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.
tqdm/tqdm
:zap: A Fast, Extensible Progress Bar for Python and CLI
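A minimal sketch of the tqdm API the description refers to: wrapping any iterable in `tqdm(...)` yields its items unchanged while rendering a progress bar (the `desc` label here is an arbitrary example value).

```python
from tqdm import tqdm

# Sum 0..99 while tqdm displays progress; tqdm passes each item through untouched.
total = 0
for i in tqdm(range(100), desc="summing"):
    total += i

print(total)  # the loop result is unaffected by the progress bar
```

The same wrapper works on the command line (`python -c ... | tqdm`) and with any iterator whose length tqdm can infer or that you pass via the `total=` argument.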
pkuliyi2015/multidiffusion-upscaler-for-automatic1111
Tiled Diffusion and Tiled VAE optimizations, licensed under CC BY-NC-SA 4.0
instantX-research/InstantID
InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥
bytedance/1d-tokenizer
This repo contains the code for our paper "An Image is Worth 32 Tokens for Reconstruction and Generation"
sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
modelscope/DiffSynth-Studio
Enjoy the magic of Diffusion models!
showlab/Show-o
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
YingqingHe/Awesome-LLMs-meet-Multimodal-Generation
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
THUDM/CogVLM2
GPT4V-level open-source multi-modal model based on Llama3-8B
THUDM/CogVideo
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
PixArt-alpha/PixArt-alpha
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
ai-forever/Kandinsky-3
PixArt-alpha/PixArt-sigma
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
dvlab-research/ControlNeXt
Controllable video and image generation: SVD, Animate Anyone, ControlNet, ControlNeXt, LoRA
Kwai-Kolors/Kolors
Kolors Team