gmltmd789
Ph.D. candidate at Seoul National University, Republic of Korea. Interested in spoken language models, speech synthesis, and generative models.
Seoul National University
Seoul, Republic of Korea
gmltmd789's Stars
SWivid/F5-TTS
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
VITA-MLLM/VITA
✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
jianfch/stable-ts
Transcription, forced alignment, and audio indexing with OpenAI's Whisper
BytedanceSpeech/seed-tts-eval
ictnlp/StreamSpeech
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
ga642381/speech-trident
Awesome speech/audio LLMs, representation learning, and codec models
gemelo-ai/vocos
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
facebookresearch/spiritlm
Inference code for the paper "Spirit-LM: Interleaved Spoken and Written Language Model".
src-d/kmcuda
Large scale K-means and K-nn implementation on NVIDIA GPU / CUDA
MrYxJ/calculate-flops.pytorch
calflops is designed to calculate FLOPs, MACs, and parameters for a wide range of neural networks, such as Linear, CNN, RNN, GCN, and Transformer models (BERT, LLaMA, and other large language models)
yangdongchao/UniAudio
The Open Source Code of UniAudio
AudioLLMs/Awesome-Audio-LLM
Audio Large Language Models
Stability-AI/stable-codec
A family of state-of-the-art Transformer-based audio codecs for low-bitrate high-quality audio coding.
VideoVerses/VideoVAEPlus
VideoVAE+: Large Motion Video Autoencoding with Cross-modal Video VAE
zhenye234/X-Codec-2.0
Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis
EricLBuehler/xlora
X-LoRA: Mixture of LoRA Experts
yangdongchao/RSTnet
Real-time Speech-Text Foundation Model Toolkit (wip)
YuchuanTian/U-DiT
[NeurIPS 2024] The official code of "U-DiTs: Downsample Tokens in U-Shaped Diffusion Transformers"
descriptinc/cargan
Official repository for the paper "Chunked Autoregressive GAN for Conditional Waveform Synthesis"
sarulab-speech/UTMOSv2
UTokyo-SaruLab MOS Prediction System
HKUNLP/diffusion-of-thoughts
[NeurIPS 2024] Code for the paper "Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models"
JishengBai/AudioSetCaps
A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline
line/open-universe
Open implementation of UNIVERSE and UNIVERSE++ diffusion-based speech enhancement models.
naver-ai/usdm
Official PyTorch implementation of "Paralinguistics-Aware Speech-Empowered LLMs for Natural Conversation" (NeurIPS 2024)
vivian556123/NeurIPS2024-CoVoMix
Official repo for CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations
JongyoonSong/K-StereoSet
pengzhendong/streaming-vocos
Streaming Vocos
Saehyung-Lee/PlugIR
Official repository of "Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach" (ACL 2024 Oral)
YoonhyungLee94/TadaStride
Official PyTorch implementation of the paper "AdaStride: Using Adaptive Strides in Sequential Data for Effective Downsampling"
12kimih/SAELens
Training Sparse Autoencoders on Language Models