songjin321's Stars
lllyasviel/LuminaBrush
Illumination Drawing Tools for Text-to-Image Diffusion Models
facebookresearch/flow_matching
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.
YangLing0818/RPG-DiffusionMaster
[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)
microsoft/TRELLIS
Official repo for paper "Structured 3D Latents for Scalable and Versatile 3D Generation".
Shakker-Labs/ComfyUI-IPAdapter-Flux
NVlabs/addit
instantX-research/Regional-Prompting-FLUX
Training-free Regional Prompting for Diffusion Transformers 🔥
gokayfem/ComfyUI_VLM_nodes
Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation
kornia/kornia
Geometric Computer Vision Library for Spatial AI
bghira/SimpleTuner
A general fine-tuning kit geared toward diffusion models.
facebookresearch/sapiens
High-resolution models for human tasks.
tyxsspa/AnyText
Official implementation code of the paper <AnyText: Multilingual Visual Text Generation And Editing>
FudanVI/benchmarking-chinese-text-recognition
This repository contains datasets and baselines for benchmarking Chinese text recognition.
ostris/ai-toolkit
Various AI scripts. Mostly Stable Diffusion stuff.
OpenBMB/MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
AIGText/Glyph-ByT5
[ECCV2024] This is an official inference code of the paper "Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering" and "Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering""
black-forest-labs/flux
Official inference repo for FLUX.1 models
yk7333/d3po
[CVPR 2024] Code for the paper "Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model"
Haian-Jin/Neural_Gaffer
[NeurIPS 2024] Official code for "Neural Gaffer: Relighting Any Object via Diffusion"
Chanzhaoyu/chatgpt-web
用 Express 和 Vue3 搭建的 ChatGPT 演示网页
THUDM/GLM-4
GLM-4 series: Open Multilingual Multimodal Chat LMs | 开源多语言多模态对话模型
lllyasviel/Omost
Your image is almost there!
instantX-research/InstantID
InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥
PixArt-alpha/PixArt-alpha
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
ViTAE-Transformer/ViTPose
The official repo for [NeurIPS'22] "ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation" and [TPAMI'23] "ViTPose++: Vision Transformer for Generic Body Pose Estimation"
Q-Future/Q-Align
③[ICML2024] [IQA, IAA, VQA] All-in-one Foundation Model for visual scoring. Can efficiently fine-tune to downstream datasets.
ziqihuangg/Awesome-Evaluation-of-Visual-Generation
A list of works on evaluation of visual generation models, including evaluation metrics, models, and systems
nullquant/ComfyUI-BrushNet
ComfyUI BrushNet nodes
lllyasviel/IC-Light
More relighting!
HVision-NKU/StoryDiffusion
Accepted as [NeurIPS 2024] Spotlight Presentation Paper