Labmem-Zhouyx's Stars
lifeiteng/vall-e
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html
enhuiz/vall-e
An unofficial PyTorch implementation of the audio LM VALL-E
lucidrains/audiolm-pytorch
Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch
roedoejet/FastSpeech2_ACL2022_reproducibility
facebookresearch/libri-light
Dataset for lightly supervised training using the LibriVox audiobook recordings (https://librivox.org/).
Edresson/YourTTS
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
facebookresearch/encodec
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
archinetai/audio-diffusion-pytorch
Audio generation using diffusion models, in PyTorch.
tts-tutorial/book
NVIDIA/NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
microsoft/GLIP
Grounded Language-Image Pre-training
floodsung/Deep-Learning-Papers-Reading-Roadmap
Deep Learning papers reading roadmap for anyone who is eager to learn this amazing tech!
zzw922cn/awesome-speech-recognition-speech-synthesis-papers
Automatic Speech Recognition (ASR), Speaker Verification, Speech Synthesis, Text-to-Speech (TTS), Language Modelling, Singing Voice Synthesis (SVS), Voice Conversion (VC)
jbmouret/matplotlib_for_papers
Handout for the tutorial "Creating publication-quality figures with matplotlib"
CompVis/stable-diffusion
A latent text-to-image diffusion model
geekjuruo/ProbExpan
SIGIR 2022: Contrastive Learning with Hard Negative Entities for Entity Set Expansion
facebookresearch/mae
PyTorch implementation of MAE https://arxiv.org/abs/2111.06377
cnlinxi/book-text-to-speech
A book about Text-to-Speech (TTS) in Chinese.
kan-bayashi/LibriTTSLabel
Alignment files of LibriTTS.
ivy-llc/ivy
Convert Machine Learning Code Between Frameworks
jasminsternkopf/mel_cepstral_distance
Computes the Mel-Cepstral Distance of two WAV files based on the paper "Mel-Cepstral Distance Measure for Objective Speech Quality Assessment" by Robert F. Kubichek.
openai/CLIP
CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
microsoft/LoRA
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
microsoft/Graphormer
Graphormer is a general-purpose deep learning backbone for molecular modeling.
HLTSingapore/Emotional-Speech-Data
This is the GitHub page for publicly available emotional speech data.
neonbjb/tortoise-tts
A multi-voice TTS system trained with an emphasis on quality
NVIDIA/BigVGAN
Official PyTorch implementation of BigVGAN (ICLR 2023)
afatcoder/LeetcodeTop
A collection of high-frequency LeetCode problems commonly asked by major internet companies 🔥
tuanh123789/AdaSpeech
An implementation of Microsoft's "AdaSpeech: Adaptive Text to Speech for Custom Voice"
TencentGameMate/chinese_speech_pretrain
Chinese speech pretrained models