dariadiatlova's Stars
Textualize/rich
Rich is a Python library for rich text and beautiful formatting in the terminal.
facebookresearch/seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
netease-youdao/EmotiVoice
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
metavoiceio/metavoice-src
Foundational model for human-like, expressive TTS
Stability-AI/stable-audio-tools
Generative models for conditional audio generation
ddlBoJack/emotion2vec
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
lucidrains/voicebox-pytorch
Implementation of Voicebox, the new SOTA text-to-speech network from Meta AI, in PyTorch
TaoRuijie/ECAPA-TDNN
Unofficial reimplementation of ECAPA-TDNN for speaker recognition (EER=0.86 on Vox1_O when trained only on Vox2)
facebookresearch/textlesslib
Library for Textless Spoken Language Processing
audeering/w2v2-how-to
How to use our public wav2vec2 dimensional emotion model
facebookresearch/SONAR
SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders.
X-LANCE/VoiceFlow-TTS
[ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching"
p0p4k/pflowtts_pytorch
Unofficial implementation of NVIDIA P-Flow TTS paper
jishengpeng/Languagecodec
Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models
keonlee9420/DailyTalk
Official repository of DailyTalk: Spoken Dialogue Dataset for Conversational Text-to-Speech, ICASSP 2023
google-research-datasets/cvss
CVSS: A Massively Multilingual Speech-to-Speech Translation Corpus
corl-team/rebased
Official implementation of the paper "Linear Transformers with Learnable Kernel Functions are Better In-Context Models"
theodorblackbird/lina-speech
lina-speech: linear-attention-based text-to-speech
X-LANCE/UniCATS-CTX-vec2wav
[AAAI 2024] Code for CTX-vec2wav in UniCATS
nii-yamagishilab/ZMM-TTS
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
shang0712/HierTTS
deepvk/NISQA-s
ECNU-Cross-Innovation-Lab/ShiftSER
[ICASSP 2023] Mingling or Misalignment? Temporal Shift for Speech Emotion Recognition with Pre-trained Representations
nivibilla/efficient-vits-finetuning
Finetuning VITS Efficiently
Lallapallooza/fast-audiomentations
⚡ Blazing fast audio augmentation in Python, powered by GPU for high-efficiency processing in machine learning and audio analysis tasks.
EMOsuperb/EMO-SUPERB-submission
EMO-SUPERB submission
HappyColor/Vesper
A Compact and Effective Pretrained Model for Speech Emotion Recognition
msplabresearch/MSP-Podcast_Challenge
MSP-Podcast Challenge Baseline Code
deepvk/muse
🎵 muse: Music Separation