dariadiatlova's Stars
wenet-e2e/speech-synthesis-paper
List of speech synthesis papers.
enhuiz/vall-e
An unofficial PyTorch implementation of the audio LM VALL-E
borisshapa/inception-v3-numpy
Implementation of the popular Inception v3 network in NumPy. Implementation of the AdaSmooth optimizer. Comparison of optimizers on the Cars dataset.
HumanSignal/label-studio
Label Studio is a multi-type data labeling and annotation tool with standardized output format
microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
keonlee9420/Comprehensive-Transformer-TTS
A non-autoregressive Transformer-based text-to-speech system, supporting a family of SOTA transformers with supervised and unsupervised duration modeling. This project grows with the research community, aiming to achieve the ultimate TTS.
salute-developers/golos
microsoft/CLAP
Learning audio concepts from natural language supervision
lingjzhu/CharsiuG2P
Multilingual G2P in 100 languages
deepvk/vitrina
👀 VITRina: VIsual Token Representations
MasayaKawamura/MB-iSTFT-VITS
Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform
diff-usion/Awesome-Diffusion-Models
A collection of resources and papers on Diffusion Models
audeering/opensmile
The Munich Open-Source Large-Scale Multimedia Feature Extractor
chrisdonahue/sheetsage
Transcribe music into lead sheets!
Howuhh/sac-n-jax
Single-file SAC-N implementation in JAX with Flax and Equinox. 10x faster than PyTorch.
huggingface/diffusion-models-class
Materials for the Hugging Face Diffusion Models Course
tsurumeso/vocal-remover
Vocal Remover using Deep Neural Networks
magenta/music-spectrogram-diffusion
POZAlabs/ComMU-code
[NeurIPS'22] Official code of "ComMU: Dataset for Combinatorial Music Generation"
YatingMusic/remi
"Pop Music Transformer: Beat-based Modeling and Generation of Expressive Pop Piano Compositions", ACM Multimedia 2020
keonlee9420/Expressive-FastSpeech2
PyTorch Implementation of Non-autoregressive Expressive (emotional, conversational) TTS based on FastSpeech2, supporting English, Korean, and your own languages.
hujinsen/pytorch-StarGAN-VC
Full reproduction of the StarGAN-VC paper, with stable training and better audio quality.
suzuki256/dog-dataset
tinkoff-ai/palbert
Code for the paper "PALBERT: Teaching ALBERT to Ponder", NeurIPS 2022 Spotlight
microsoft/muzic
Muzic: Music Understanding and Generation with Artificial Intelligence
openai/jukebox
Code for the paper "Jukebox: A Generative Model for Music"
maum-ai/phaseaug
ICASSP 2023 Accepted
pyannote/pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
CompVis/latent-diffusion
High-Resolution Image Synthesis with Latent Diffusion Models
openai/guided-diffusion