gmltmd789

Ph.D. candidate at Seoul National University, Republic of Korea. Interested in spoken language models, speech synthesis, and generative models.

Seoul National University, Seoul, Republic of Korea

gmltmd789's Stars

SWivid/F5-TTS
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Language: Python | 10.8k 100 5601.5k
VITA-MLLM/VITA
✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
Language: Python | 2.2k 48 110165
jianfch/stable-ts
Transcription, forced alignment, and audio indexing with OpenAI's Whisper
Language: Python | 1.8k 32 291195
BytedanceSpeech/seed-tts-eval
Language: Python | 1.2k 13 16114
ictnlp/StreamSpeech
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
Language: Python | 1k 13 1680
ga642381/speech-trident
Awesome speech/audio LLMs, representation learning, and codec models
938 50 358
gemelo-ai/vocos
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
Language: Python | 903 31 57108
facebookresearch/spiritlm
Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model".
Language: Python | 891 18 2158
src-d/kmcuda
Large scale K-means and K-nn implementation on NVIDIA GPU / CUDA
Language: Jupyter Notebook | 826 27 103146
MrYxJ/calculate-flops.pytorch
calflops is designed to calculate FLOPs, MACs, and parameters for a wide range of neural networks, such as Linear, CNN, RNN, GCN, and Transformer models (BERT, LLaMA, and other large language models)
Language: Python | 737 3 4728
yangdongchao/UniAudio
The Open Source Code of UniAudio
Language: Python | 551 37 3334
AudioLLMs/Awesome-Audio-LLM
Audio Large Language Models
Language: Python | 480 25 2229
Stability-AI/stable-codec
A family of state-of-the-art Transformer-based audio codecs for low-bitrate high-quality audio coding.
Language: Python | 345 29 1223
VideoVerses/VideoVAEPlus
VideoVAE+: Large Motion Video Autoencoding with Cross-modal Video VAE
Language: Python | 303 5 117
zhenye234/X-Codec-2.0
Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis
Language: Python | 237 13 2527
EricLBuehler/xlora
X-LoRA: Mixture of LoRA Experts
Language: Python | 215 5 1712
yangdongchao/RSTnet
Real-time Speech-Text Foundation Model Toolkit (wip)
Language: Python | 209 15 518
YuchuanTian/U-DiT
[NeurIPS 2024] The official code of "U-DiTs: Downsample Tokens in U-Shaped Diffusion Transformers"
Language: Python | 192 5 169
descriptinc/cargan
Official repository for the paper "Chunked Autoregressive GAN for Conditional Waveform Synthesis"
Language: Python | 187 22 1430
sarulab-speech/UTMOSv2
UTokyo-SaruLab MOS Prediction System
Language: Python | 162 6 615
HKUNLP/diffusion-of-thoughts
[NeurIPS 2024] Code for the paper "Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models"
Language: Python | 135 6 49
JishengBai/AudioSetCaps
A 6-million Audio-Caption Paired Dataset Built with a LLMs and ALMs-based Automatic Pipeline
Language: Python | 123 3 12
line/open-universe
Open implementation of UNIVERSE and UNIVERSE++ diffusion-based speech enhancement models.
Language: Python | 91 5 310
naver-ai/usdm
Official PyTorch implementation of "Paralinguistics-Aware Speech-Empowered LLMs for Natural Conversation" (NeurIPS 2024)
Language: Python | 83 8 03
vivian556123/NeurIPS2024-CoVoMix
Official repo for CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations
Language: Python | 48 3 73
JongyoonSong/K-StereoSet
32 1 02
pengzhendong/streaming-vocos
Streaming Vocos
Language: Python | 21 3 23
Saehyung-Lee/PlugIR
Official repository of "Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach" (ACL 2024 Oral)
Language: Python | 21 3 13
YoonhyungLee94/TadaStride
Official PyTorch implementation of the paper "AdaStride: Using Adaptive Strides in Sequential Data for Effective Downsampling"
Language: Python | 8 1 00
12kimih/SAELens
Training Sparse Autoencoders on Language Models
Language: Jupyter Notebook | 1 0 00
