5Hyeons's Stars
yl4579/StyleTTS-ZS
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
openai/whisper
Robust Speech Recognition via Large-Scale Weak Supervision
sh-lee-prml/PeriodWave
The official implementation of PeriodWave and PeriodWave-Turbo
thuhcsi/mm2022-conversational-tts
FunAudioLLM/CosyVoice
Multilingual large voice generation model with full-stack support for inference, training, and deployment.
walker-hyf/ECSS
Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling (Accepted by AAAI'2024)
Camb-ai/MARS5-TTS
MARS5 speech model (TTS) from CAMB.AI
yl4579/StyleTTS2
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
hongleizhang/RSPapers
A Curated List of Must-read Papers on Recommender Systems.
p0p4k/vits2_pytorch
Unofficial VITS2 TTS implementation in PyTorch
myshell-ai/MeloTTS
High-quality multilingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean.
daniilrobnikov/vits2
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design
shivammehta25/Matcha-TTS
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
kwonminki/One-sentence_Diffusion_summary
The repo for studying and sharing diffusion models.
microsoft/SpeechT5
Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
yistLin/dvector
Speaker embedding (d-vector) trained with GE2E loss
jaywalnut310/vits
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
sony/ai-research-code
WegraLee/deep-learning-from-scratch-3
"Deep Learning from Scratch ❸" (Hanbit Media, 2020)
microsoft/DNS-Challenge
This repo contains the scripts, models, and required files for the Deep Noise Suppression (DNS) Challenge.
OlaWod/FreeVC
FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion
kkoutini/PaSST
Efficient Training of Audio Transformers with Patchout
fschmid56/EfficientAT
This repository aims at providing efficient CNNs for Audio Tagging. We provide AudioSet pre-trained models ready for downstream training and extraction of audio embeddings.
dhchoi99/NANSY
lRomul/argus-freesound
Kaggle | 1st place solution for Freesound Audio Tagging 2019
hash2430/pitchtron
TTS for pitch-accented language. Korean dialect DB.
KinglittleQ/GST-Tacotron
A PyTorch implementation of Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
Kyubyong/g2pK
g2pK: g2p module for Korean
boostcampaitech2/object-detection-level2-cv-17
object-detection-level2-cv-17 created by GitHub Classroom