aFewThings
I'm a Ph.D. student in Electrical Engineering at Korea University and am interested in various ML/DL tasks.
Multimedia Information Lab. · South Korea
aFewThings's Stars
lllyasviel/ControlNet
Let us control diffusion models!
facebookresearch/audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
xiph/rnnoise
Recurrent neural network for audio noise reduction
Stability-AI/stable-audio-tools
Generative models for conditional audio generation
lucidrains/vector-quantize-pytorch
Vector (and Scalar) Quantization, in Pytorch
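As a quick orientation to this library, here is a minimal usage sketch based on its documented API; the dimensions and hyperparameters below are illustrative values, not recommendations.

```python
import torch
from vector_quantize_pytorch import VectorQuantize

# A codebook of 512 entries over 256-dim vectors; values are illustrative.
vq = VectorQuantize(
    dim=256,
    codebook_size=512,
    decay=0.8,             # EMA decay for codebook updates
    commitment_weight=1.0  # weight of the commitment loss
)

x = torch.randn(1, 1024, 256)            # (batch, sequence, dim)
quantized, indices, commit_loss = vq(x)  # quantized: (1, 1024, 256)
```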
crowsonkb/k-diffusion
Karras et al. (2022) diffusion models for PyTorch
LAION-AI/CLAP
Contrastive Language-Audio Pretraining
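For context, a minimal sketch of the symmetric contrastive objective behind CLAP-style training, written in plain PyTorch rather than with the repo's own API; the random tensors stand in for real audio- and text-encoder outputs.

```python
import torch
import torch.nn.functional as F

# Stand-ins for encoder outputs; in CLAP these come from an audio
# encoder and a text encoder, respectively.
audio_embed = torch.randn(4, 512)  # batch of 4 audio clips
text_embed = torch.randn(4, 512)   # batch of 4 matching captions

# L2-normalize so the dot product is cosine similarity.
audio_embed = F.normalize(audio_embed, dim=-1)
text_embed = F.normalize(text_embed, dim=-1)

# Pairwise similarity logits, scaled by a temperature (learned in practice).
logit_scale = torch.tensor(100.0)
logits = logit_scale * audio_embed @ text_embed.t()

# Symmetric InfoNCE loss: matched audio-text pairs lie on the diagonal.
labels = torch.arange(4)
loss = (F.cross_entropy(logits, labels) +
        F.cross_entropy(logits.t(), labels)) / 2
```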
NVlabs/edm
Elucidating the Design Space of Diffusion-Based Generative Models (EDM)
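For reference, a minimal sketch of the sampling noise schedule proposed in Karras et al. (2022), implemented from the paper's formula rather than taken from this repo; the defaults mirror the paper's image configuration.

```python
import torch

def karras_sigmas(n: int, sigma_min: float = 0.002, sigma_max: float = 80.0,
                  rho: float = 7.0) -> torch.Tensor:
    """Noise levels from Karras et al. (2022): interpolate uniformly in
    sigma^(1/rho) space, then append sigma = 0 for the final step."""
    ramp = torch.linspace(0, 1, n)
    min_inv_rho = sigma_min ** (1 / rho)
    max_inv_rho = sigma_max ** (1 / rho)
    sigmas = (max_inv_rho + ramp * (min_inv_rho - max_inv_rho)) ** rho
    return torch.cat([sigmas, torch.zeros(1)])

sigmas = karras_sigmas(18)  # 18 sampling steps, decreasing from sigma_max
```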
Text-to-Audio/AudioLCM
PyTorch implementation of AudioLCM (ACM-MM'24): efficient and high-quality text-to-audio generation with a latent consistency model.
declare-lab/tango
A family of diffusion models for text-to-audio generation.
ShihaoZhaoZSH/Uni-ControlNet
[NeurIPS 2023] Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models
facebookresearch/AudioMAE
This repo hosts the code and models of "Masked Autoencoders that Listen".
sail-sg/MDT
Masked Diffusion Transformer is the SOTA for image synthesis. (ICCV 2023)
google-research/leaf-audio
LEAF is a learnable alternative to audio features such as mel-filterbanks: it can be initialized as an approximation of mel-filterbanks and then trained for the task at hand, while using a very small number of parameters.
ivcylc/qa-mdt
OpenMusic: SOTA Text-to-Music (TTM) Generation
Anima-Lab/MaskDiT
Code for Fast Training of Diffusion Models with Masked Transformers
haoheliu/AudioLDM-training-finetuning
AudioLDM training, finetuning, evaluation and inference.
Stability-AI/stable-audio-metrics
Metrics for evaluating music and audio generative models – with a focus on long-form, full-band, and stereo generations.
yukara-ikemiya/friendly-stable-audio-tools
Refactored / updated version of `stable-audio-tools`, an open-source codebase for audio/music generative models originally by Stability AI.
cwx-worst-one/EAT
[IJCAI 2024] EAT: Self-Supervised Pre-Training with Efficient Audio Transformer
JishengBai/AudioSetCaps
A 6-Million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline
MorenoLaQuatra/audioset-download
This package aims to simplify downloading the AudioSet dataset.
ZjjConan/Multi-Modal-Adapter
The official PyTorch implementation of our CVPR 2024 paper "MMA: Multi-Modal Adapter for Vision-Language Models".
yuanzhi-zhu/mini_edm
Minimal implementation of EDM (Elucidating the Design Space of Diffusion-Based Generative Models) on CIFAR-10 and MNIST
wsntxxn/TextToAudioGrounding
The dataset and baseline code for Text-to-Audio Grounding (TAG)
zeyuxie29/PicoAudio
zeyuxie29/AudioTime
blingcho/VFLIP-esorics24