choiHkk's Stars
ray-project/ray
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Stability-AI/generative-models
Generative Models by Stability AI
Anjok07/ultimatevocalremovergui
GUI for a Vocal Remover that uses Deep Neural Networks.
voicepaw/so-vits-svc-fork
so-vits-svc fork with realtime support, improved interface and more features.
facebookresearch/xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.
advimman/lama
🦙 LaMa Image Inpainting, Resolution-robust Large Mask Inpainting with Fourier Convolutions, WACV 2022
fishaudio/Bert-VITS2
vits2 backbone with multilingual-bert
yl4579/StyleTTS2
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
openai/glow
Code for reproducing results in "Glow: Generative Flow with Invertible 1x1 Convolutions"
r9y9/wavenet_vocoder
WaveNet vocoder
allenai/longformer
Longformer: The Long-Document Transformer
yxlllc/DDSP-SVC
Real-time end-to-end singing voice conversion system based on DDSP (Differentiable Digital Signal Processing)
ChenyangSi/FreeU
FreeU: Free Lunch in Diffusion U-Net (CVPR2024 Oral)
QwenLM/Qwen-Audio
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
lessw2020/Ranger-Deep-Learning-Optimizer
Ranger - a synergistic optimizer using RAdam (Rectified Adam), Gradient Centralization and LookAhead in one codebase
SHI-Labs/Neighborhood-Attention-Transformer
Neighborhood Attention Transformer, arXiv 2022 / CVPR 2023. Dilated Neighborhood Attention Transformer, arXiv 2022
jsyoon0823/TimeGAN
Codebase for Time-series Generative Adversarial Networks (TimeGAN) - NeurIPS 2019
csteinmetz1/auraloss
Collection of audio-focused loss functions in PyTorch
teticio/audio-diffusion
Apply diffusion models using the new Hugging Face diffusers package to synthesize music instead of images.
microsoft/UniSpeech
UniSpeech - Large Scale Self-Supervised Learning for Speech
VinAIResearch/XPhoneBERT
XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech (INTERSPEECH 2023)
regeirk/pycwt
A Python module for continuous wavelet spectral analysis. It includes a collection of routines for wavelet transform and statistical analysis via the FFT algorithm. The module also includes cross-wavelet transforms, wavelet coherence tests, and sample scripts.
openvpi/vocoders
DiffSinger community vocoders release page
interactiveaudiolab/penn
Pitch Estimating Neural Networks (PENN)
p0p4k/pflowtts_pytorch
Unofficial implementation of NVIDIA P-Flow TTS paper
sony/bigvsan
PyTorch implementation of BigVSAN
fkodom/dilated-attention-pytorch
(Unofficial) Implementation of dilated attention from "LongNet: Scaling Transformers to 1,000,000,000 Tokens" (https://arxiv.org/abs/2307.02486)
toinsson/pysdtw
Torch implementation of Soft-DTW, supports CUDA.
quanghuyn94/moe-tts-webui
A better web UI for MOE-TTS