choiHkk's Stars
allenai/longformer
Longformer: The Long-Document Transformer
SHI-Labs/Neighborhood-Attention-Transformer
Neighborhood Attention Transformer, arXiv 2022 / CVPR 2023. Dilated Neighborhood Attention Transformer, arXiv 2022
fkodom/dilated-attention-pytorch
(Unofficial) Implementation of dilated attention from "LongNet: Scaling Transformers to 1,000,000,000 Tokens" (https://arxiv.org/abs/2307.02486)
fishaudio/Bert-VITS2
VITS2 backbone with multilingual BERT
jsyoon0823/TimeGAN
Codebase for Time-series Generative Adversarial Networks (TimeGAN) - NeurIPS 2019
toinsson/pysdtw
Torch implementation of Soft-DTW, supports CUDA.
quanghuyn94/moe-tts-webui
A better web UI for MOE-TTS
microsoft/UniSpeech
UniSpeech - Large Scale Self-Supervised Learning for Speech
facebookresearch/xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.
voicepaw/so-vits-svc-fork
so-vits-svc fork with real-time support, an improved interface, and more features.
openai/glow
Code for reproducing results in "Glow: Generative Flow with Invertible 1x1 Convolutions"
p0p4k/pflowtts_pytorch
Unofficial implementation of NVIDIA P-Flow TTS paper
Stability-AI/generative-models
Generative Models by Stability AI
VinAIResearch/XPhoneBERT
XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech (INTERSPEECH 2023)
QwenLM/Qwen-Audio
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
ChenyangSi/FreeU
FreeU: Free Lunch in Diffusion U-Net (CVPR 2024 Oral)
csteinmetz1/auraloss
Collection of audio-focused loss functions in PyTorch
r9y9/wavenet_vocoder
WaveNet vocoder
open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
regeirk/pycwt
A Python module for continuous wavelet spectral analysis. It includes a collection of routines for wavelet transforms and statistical analysis via the FFT algorithm, as well as cross-wavelet transforms, wavelet coherence tests, and sample scripts.
Anjok07/ultimatevocalremovergui
GUI for a Vocal Remover that uses Deep Neural Networks.
yl4579/StyleTTS2
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
ray-project/ray
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
openvpi/vocoders
DiffSinger community vocoders release page
yxlllc/DDSP-SVC
Real-time end-to-end singing voice conversion system based on DDSP (Differentiable Digital Signal Processing)
sony/bigvsan
PyTorch implementation of BigVSAN
advimman/lama
🦙 LaMa Image Inpainting, Resolution-robust Large Mask Inpainting with Fourier Convolutions, WACV 2022
interactiveaudiolab/penn
Pitch Estimating Neural Networks (PENN)
lessw2020/Ranger-Deep-Learning-Optimizer
Ranger - a synergistic optimizer using RAdam (Rectified Adam), Gradient Centralization and LookAhead in one codebase
teticio/audio-diffusion
Apply diffusion models using the new Hugging Face diffusers package to synthesize music instead of images.