Pinned Repositories
bark
🔊 Text-Prompted Generative Audio Model
cog-RMBG
Fork of https://huggingface.co/briaai/RMBG-1.4
cog-sd-txt2imghd
Stable-diffusion with Real-ESRGAN for upsampling
cog-themed-diffusion
cog-whisper
insanely-fast-whisper
Incredibly fast Whisper-large-v3
Kandinsky-2
Kandinsky 2 — multilingual text2image latent diffusion model
rudalle-sr
A Cog implementation of the Real-ESRGAN super-resolution model from ruDALL-E.
SUPIR
SUPIR aims at developing Practical Algorithms for Photo-Realistic Image Restoration In the Wild
video-retalking
[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
chenxwh's Repositories
chenxwh/insanely-fast-whisper
Incredibly fast Whisper-large-v3
chenxwh/SUPIR
SUPIR aims at developing Practical Algorithms for Photo-Realistic Image Restoration In the Wild
chenxwh/Kandinsky-2
Kandinsky 2 — multilingual text2image latent diffusion model
chenxwh/video-retalking
[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
chenxwh/SadTalker
(CVPR 2023) SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
chenxwh/OpenVoice
Instant voice cloning by MyShell.
chenxwh/Sana
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
chenxwh/AniPortrait
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
chenxwh/CosyVoice
Multilingual large voice generation model with full-stack support for inference, training, and deployment.
chenxwh/Omost
Your image is almost there!
chenxwh/Lotus
Official Implementation of Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction
chenxwh/MeloTTS
High-quality multilingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean.
chenxwh/DeepSeek-VL2
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
chenxwh/PixArt-sigma
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
chenxwh/CogVideo
Text-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
chenxwh/ml-depth-pro
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second.
chenxwh/chenxwh.github.io
A beautiful, simple, clean, and responsive Jekyll theme for academics
chenxwh/DiffSynth-Studio
Enjoy the magic of Diffusion models!
chenxwh/Florence-VL
chenxwh/NOVA
NOVA: Autoregressive Video Generation without Vector Quantization
chenxwh/OminiControl
A minimal and universal controller for FLUX.1.
chenxwh/OmniParser
A simple screen-parsing tool towards a pure vision-based GUI agent
chenxwh/CogView3
Text-to-image generation: CogView3-Plus and CogView3 (ECCV 2024)
chenxwh/CogVLM2
GPT4V-level open-source multi-modal model based on Llama3-8B
chenxwh/Depth-Anything-V2
Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
chenxwh/DepthCrafter
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos
chenxwh/echomimic
EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning
chenxwh/hart
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
chenxwh/LTX-Video
Official repository for LTX-Video
chenxwh/OneDiffusion