SiyeolJung's Stars
CompVis/stable-diffusion
A latent text-to-image diffusion model
KindXiaoming/pykan
Kolmogorov Arnold Networks
XPixelGroup/BasicSR
Open Source Image and Video Restoration Toolbox for Super-resolution, Denoising, Deblurring, etc. Currently, it includes EDSR, RCAN, SRResNet, SRGAN, ESRGAN, EDVR, BasicVSR, SwinIR, ECBSR, etc. It also supports StyleGAN2 and DFDNet.
facebookresearch/DiT
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
yl4579/StyleTTS2
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
resemble-ai/Resemblyzer
A python package to analyze and compare voices with deep learning
lucidrains/vector-quantize-pytorch
Vector (and Scalar) Quantization, in Pytorch
haoheliu/AudioLDM
AudioLDM: Generate speech, sound effects, music and beyond, with text.
haoheliu/AudioLDM2
Text-to-Audio/Music Generation
chaofengc/IQA-PyTorch
👁️ 🖼️ 🔥PyTorch Toolbox for Image Quality Assessment, including LPIPS, FID, NIQE, NRQM(Ma), MUSIQ, TOPIQ, NIMA, DBCNN, BRISQUE, PI and more...
ZiqiaoPeng/SyncTalk
[CVPR 2024] This is the official source for our paper "SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis"
bytedance/SALMONN
SALMONN: Speech Audio Language Music Open Neural Network
chaofengc/Awesome-Image-Quality-Assessment
A comprehensive collection of IQA papers
lucidrains/mixture-of-experts
A Pytorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models
researchmm/MM-Diffusion
[CVPR'23] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
wesbz/SoundStream
This repository is an implementation of this article: https://arxiv.org/pdf/2107.03312.pdf
FoundationVision/OmniTokenizer
OmniTokenizer: one model and one weight for image-video joint tokenization.
iffsid/mmvae
Multimodal Mixture-of-Experts VAE
sony/sqvae
Pytorch implementation of stochastically quantized variational autoencoder (SQ-VAE)
AlphacatPlus/VmambaIR
This is the official implementation of "VmambaIR: Visual State Space Model for Image Restoration"
lyndonzheng/CVQ-VAE
[ICCV 2023] Online Clustered Codebook
yangdongchao/LLM-Codec
The open source code for LLM-Codec
thuhcsi/S2G-MDDiffusion
facebookresearch/Qinco
Residual Quantization with Implicit Neural Codebooks
YingqianWang/DistgASR
[TPAMI 2023] DistgASR: Disentangling Mechanism for Light Field Angular Super-Resolution
Boese0601/Dyadic-Interaction-Modeling
[ECCV 2024] Dyadic Interaction Modeling for Social Behavior Generation
taegyeong-lee/Grid-Diffusion-Models-for-Text-to-Video-Generation
Official Code Repository for the paper "Grid Diffusion Models for Text-to-Video Generation", CVPR 2024
kaistmm/VoxMM
taegyeong-lee/Generating-Realistic-Images-from-In-the-wild-Sounds
Official Code Repository for the paper "Generating Realistic Images from In-the-wild Sounds", ICCV 2023
sangmin-git/MMSI
Code for "Modeling Multimodal Social Interactions: New Challenges and Baselines with Densely Aligned Representations" (CVPR 2024 Oral)