yakunpku's Stars
wangjiangshan0725/RF-Solver-Edit
Taming FLUX for Image Inversion & Editing; OpenSora for Video Inversion & Editing! (Official implementation for Taming Rectified Flow for Inversion and Editing.)
DmitryUlyanov/deep-image-prior
Image restoration with neural networks but without learning.
FireRedTeam/FireRedTTS
An Open-Sourced LLM-empowered Foundation TTS System
HelloVision/ComfyUI_HelloMeme
Official ComfyUI repository of HelloMeme
HelloVision/HelloMeme
The official HelloMeme GitHub site
TIGER-AI-Lab/AnyV2V
Code and data for "AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks" (TMLR 2024)
instantX-research/Regional-Prompting-FLUX
Training-free Regional Prompting for Diffusion Transformers 🔥
yisol/IDM-VTON
[ECCV 2024] IDM-VTON: Improving Diffusion Models for Authentic Virtual Try-on in the Wild
kleinlee/MiniMates
The fastest digital human algorithm, now on your desktop.
alimama-creative/FLUX-Controlnet-Inpainting
facebookresearch/sapiens
High-resolution models for human tasks.
open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
coqui-ai/TTS
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
THUDM/GLM-4-Voice
GLM-4-Voice | End-to-end Chinese-English spoken dialogue model
TachibanaYoshino/AnimeGAN
A TensorFlow implementation of AnimeGAN for fast photo animation! This is the open-source code for the paper 「AnimeGAN: a novel lightweight GAN for photo animation」, which uses the GAN framework to transform real-world photos into anime images.
kyutai-labs/moshi
VectorSpaceLab/OmniGen
OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340
serengil/deepface
A Lightweight Face Recognition and Facial Attribute Analysis (Age, Gender, Emotion and Race) Library for Python
gpt-omni/mini-omni2
Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.
OpenGVLab/MUTR
[AAAI 2024] Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation
sihyun-yu/REPA
Official PyTorch Implementation of Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
conradry/copy-paste-aug
Copy-paste augmentation for segmentation and detection tasks
OpenDriveLab/Vista
[NeurIPS 2024] A Generalizable World Model for Autonomous Driving
zchuz/CoT-Reasoning-Survey
[ACL 2024] A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future
THUDM/CogVideo
text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Atten4Vis/CAE
This is a PyTorch implementation of “Context AutoEncoder for Self-Supervised Representation Learning”
MingXiangL/DEVIL
Evaluation of Text-to-Video Generation Models: A Dynamics Perspective [NeurIPS 2024].
HyperGAI/HPT
HPT - Open Multimodal LLMs from HyperGAI
xinntao/ESRGAN
ECCV18 Workshops - Enhanced SRGAN. Champion PIRM Challenge on Perceptual Super-Resolution. The training codes are in BasicSR.
xinntao/facexlib
FaceXlib aims at providing ready-to-use face-related functions based on current SOTA open-source methods.