Labmem-Zhouyx's Stars
huggingface/diffusers
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
wenet-e2e/WeTextProcessing
Text Normalization & Inverse Text Normalization
k2-fsa/libriheavy
Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context
ga642381/speech-trident
Awesome speech/audio LLMs, representation learning, and codec models
BytedanceSpeech/seed-tts-eval
X-LANCE/SLAM-LLM
Speech, Language, Audio, Music Processing with Large Language Model
DanielLin94144/StyleTalk
Official release of StyleTalk dataset.
Tencent/HunyuanDiT
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
CNChTu/Diffusion-SVC
facebookresearch/AudioMAE
This repo hosts the code and models of "Masked Autoencoders that Listen".
OpenNSP/Hifi-vaegan
X-LANCE/StoryTTS
[ICASSP 2024] StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations
meta-llama/llama3
The official Meta Llama 3 GitHub site
huggingface/parler-tts
Inference and training library for high-quality TTS models.
nachifur/RDDM
CVPR 2024: Residual Denoising Diffusion Models
thuhcsi/SECap
facebookresearch/seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
lonePatient/awesome-pretrained-chinese-nlp-models
Awesome Pretrained Chinese NLP Models: a collection of high-quality Chinese pretrained models, large models, multimodal models, and large language models
open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
bytedance/SALMONN
SALMONN: Speech Audio Language Music Open Neural Network
AMAAI-Lab/mustango
Mustango: Toward Controllable Text-to-Music Generation
descriptinc/descript-audio-codec
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
keonlee9420/Parallel-Tacotron2
PyTorch Implementation of Google's Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling
fighting41love/zhvoice
Chinese voice corpus with clearer, more natural speech, comprising 8 open-source datasets, 3,200 speakers, 900 hours of speech, and 13 million characters.
daniilrobnikov/vits2
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design
neonbjb/DL-Art-School
DLAS - A configuration-driven trainer for generative models
152334H/DL-Art-School
TorToiSe fine-tuning with DLAS
keonlee9420/Comprehensive-Transformer-TTS
A Non-Autoregressive Transformer-based Text-to-Speech, supporting a family of SOTA transformers with supervised and unsupervised duration modeling. This project grows with the research community, aiming to achieve the ultimate TTS.
DmitryRyumin/INTERSPEECH-2023-Papers
INTERSPEECH 2023 Papers: A complete collection of influential and exciting research papers from the INTERSPEECH 2023 conference. Explore the latest advances in speech and language processing. Code included. Star the repository to support the advancement of speech technology!
declare-lab/adapter-mix