Pinned Repositories
bark
🔊 Text-Prompted Generative Audio Model
cog-RMBG
Fork of https://huggingface.co/briaai/RMBG-1.4
cog-sd-txt2imghd
Stable-diffusion with Real-ESRGAN for upsampling
cog-themed-diffusion
cog-whisper
insanely-fast-whisper
Incredibly fast Whisper-large-v3
Kandinsky-2
Kandinsky 2 — multilingual text2image latent diffusion model
SadTalker
(CVPR 2023) SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
SUPIR
SUPIR aims at developing Practical Algorithms for Photo-Realistic Image Restoration In the Wild
video-retalking
[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
chenxwh's Repositories
chenxwh/bark
🔊 Text-Prompted Generative Audio Model
chenxwh/cog-RMBG
Fork of https://huggingface.co/briaai/RMBG-1.4
chenxwh/i2vgen-xl
Official repo for VGen: a holistic video generation ecosystem building on diffusion models
chenxwh/Semantic-Segment-Anything
Automated dense category annotation engine that serves as the initial semantic labeling for the Segment Anything dataset (SA-1B).
chenxwh/cog-deforum-stable-diffusion
chenxwh/Grounded-Segment-Anything
Marrying Grounding DINO with Segment Anything & Stable Diffusion & Tag2Text & BLIP & Whisper & ChatBot - Automatically Detect, Segment and Generate Anything with Image, Text, and Audio Inputs
chenxwh/VideoCrafter
VideoCrafter1: Open Diffusion Models for High-Quality Video Generation
chenxwh/seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
chenxwh/CogVLM
A state-of-the-art open visual language model | multimodal pretrained model
chenxwh/ControlVideo
Official PyTorch implementation of "ControlVideo: Training-free Controllable Text-to-Video Generation"
chenxwh/replicate-sd-textual-inversion
chenxwh/AudioSep
Official implementation of "Separate Anything You Describe"
chenxwh/demucs
Code for the paper Hybrid Spectrogram and Waveform Source Separation
chenxwh/Prompt-Free-Diffusion
Prompt-Free Diffusion: Taking "Text" out of Text-to-Image Diffusion Models
chenxwh/UnIVAL
Official implementation of UnIVAL: Unified Model for Image, Video, Audio and Language Tasks.
chenxwh/cog-I2VGen-XL
chenxwh/Cutie
[arXiv 2023] Putting the Object Back Into Video Object Segmentation
chenxwh/InternLM-XComposer
chenxwh/shap-e
Generate 3D objects conditioned on text or images
chenxwh/StableSR
Exploiting Diffusion Prior for Real-World Image Super-Resolution
chenxwh/StyleDrop-PyTorch
Unofficial implementation of [StyleDrop](https://arxiv.org/abs/2306.00983)
chenxwh/T2I-Adapter
T2I-Adapter
chenxwh/cog-ledits
chenxwh/Depth-Anything
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
chenxwh/TokenFlow
Official PyTorch Implementation for "TokenFlow: Consistent Diffusion Features for Consistent Video Editing" presenting "TokenFlow" (ICLR 2024)
chenxwh/Wuerstchen
Official implementation of Würstchen: Efficient Pretraining of Text-to-Image Models
chenxwh/FastSAM
Fast Segment Anything
chenxwh/LISA
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
chenxwh/ResShift
ResShift: Efficient Diffusion Model for Image Super-resolution by Residual Shifting (PyTorch)
chenxwh/webie
Dataset for web-scale information extraction.