choiHkk

SpeechSynthesis

Seoul

choiHkk's Stars

ray-project/ray
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Language: Python 33.1k 476 18.4k 5.6k
Stability-AI/generative-models
Generative Models by Stability AI
Language: Python 24.1k 255 3022.7k
Anjok07/ultimatevocalremovergui
GUI for a Vocal Remover that uses Deep Neural Networks.
Language: Python 17.5k 154 1.3k 1.3k
voicepaw/so-vits-svc-fork
so-vits-svc fork with realtime support, improved interface and more features.
Language: Python 8.7k 67 3601.2k
facebookresearch/xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.
Language: Python 8.4k 76 520593
advimman/lama
🦙 LaMa Image Inpainting, Resolution-robust Large Mask Inpainting with Fourier Convolutions, WACV 2022
Language: Jupyter Notebook 7.8k 84 252832
fishaudio/Bert-VITS2
vits2 backbone with multilingual-bert
Language: Python 7.8k 47 01.1k
yl4579/StyleTTS2
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Language: Python 4.7k 78 189384
open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
Language: Python 4.5k 58 151384
openai/glow
Code for reproducing results in "Glow: Generative Flow with Invertible 1x1 Convolutions"
Language: Python 3.1k 232 97515
r9y9/wavenet_vocoder
WaveNet vocoder
Language: Python 2.3k 96 193499
allenai/longformer
Longformer: The Long-Document Transformer
Language: Python 2k 42 228271
yxlllc/DDSP-SVC
Real-time end-to-end singing voice conversion system based on DDSP (Differentiable Digital Signal Processing)
Language: Python 1.8k 22 61241
ChenyangSi/FreeU
FreeU: Free Lunch in Diffusion U-Net (CVPR2024 Oral)
1.7k 43 3059
QwenLM/Qwen-Audio
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
Language: Python 1.4k 25 66104
lessw2020/Ranger-Deep-Learning-Optimizer
Ranger - a synergistic optimizer using RAdam (Rectified Adam), Gradient Centralization and LookAhead in one codebase
Language: Python 1.2k 36 43176
SHI-Labs/Neighborhood-Attention-Transformer
Neighborhood Attention Transformer, arxiv 2022 / CVPR 2023. Dilated Neighborhood Attention Transformer, arxiv 2022
Language: Python 1k 16 7585
jsyoon0823/TimeGAN
Codebase for Time-series Generative Adversarial Networks (TimeGAN) - NeurIPS 2019
Language: Jupyter Notebook 832 10 83259
csteinmetz1/auraloss
Collection of audio-focused loss functions in PyTorch
Language: Python 719 17 3566
teticio/audio-diffusion
Apply diffusion models using the new Hugging Face diffusers package to synthesize music instead of images.
Language: Jupyter Notebook 701 17 4369
microsoft/UniSpeech
UniSpeech - Large Scale Self-Supervised Learning for Speech
Language: Python 419 20 4473
VinAIResearch/XPhoneBERT
XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech (INTERSPEECH 2023)
Language: Python 294 10 2135
regeirk/pycwt
A Python module for continuous wavelet spectral analysis. It includes a collection of routines for wavelet transform and statistical analysis via FFT algorithm. In addition, the module also includes cross-wavelet transforms, wavelet coherence tests and sample scripts.
Language: Python 291 24 34104
openvpi/vocoders
DiffSinger community vocoders release page
Language: HTML 269 6 326
interactiveaudiolab/penn
Pitch Estimating Neural Networks (PENN)
Language: Python 228 9 1121
p0p4k/pflowtts_pytorch
Unofficial implementation of NVIDIA P-Flow TTS paper
Language: Python 210 13 4230
sony/bigvsan
Pytorch implementation of BigVSAN
Language: Python 196 29 616
fkodom/dilated-attention-pytorch
(Unofficial) Implementation of dilated attention from "LongNet: Scaling Transformers to 1,000,000,000 Tokens" (https://arxiv.org/abs/2307.02486)
Language: Python 49 5 79
toinsson/pysdtw
Torch implementation of Soft-DTW, supports CUDA.
Language: Python 31 2 42
quanghuyn94/moe-tts-webui
The better web ui for MOE-TTS
Language: Python 234
