WenzheLiu-Speech
Hi, I am Wenzhe Liu. I work at Kuaishou and was previously employed by Tencent, focusing on generalized speech enhancement, audio codecs, and speech synthesis.
Tencent, Beijing, China
Pinned Repositories
aac-datasets
Audio Captioning datasets for PyTorch.
ADSP_Tutorials
Advanced Signal Processing Notebooks and Tutorials
ai-audio-datasets
AI Audio Datasets 🎵. A list of datasets consisting of speech, music, and sound effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio applications.
awesome-speech-enhancement
speech enhancement, speech separation, sound source localization
penguins-aicodec-demo
pyaec
Simple and efficient Python implementation of a series of adaptive filters (LMS, NLMS, RLS, Kalman, Frequency Domain Adaptive Filter, Partitioned-Block-Based Frequency Domain Adaptive Filter, Frequency Domain Kalman Filter, Partitioned-Block-Based Frequency Domain Kalman Filter) for acoustic echo cancellation.
Realtime_AudioDenoise_EchoCancellation
sound-source-localization-algorithm_DOA_estimation
Some traditional algorithms used for sound source localization and DOA estimation of speech signals
The-guidebook-of-speech-enhancement
wenzheliu-speech
WenzheLiu-Speech's Repositories
WenzheLiu-Speech/awesome-speech-enhancement
speech enhancement, speech separation, sound source localization
WenzheLiu-Speech/The-guidebook-of-speech-enhancement
WenzheLiu-Speech/penguins-aicodec-demo
WenzheLiu-Speech/ai-audio-datasets
AI Audio Datasets 🎵. A list of datasets consisting of speech, music, and sound effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio applications.
WenzheLiu-Speech/wenzheliu-speech
WenzheLiu-Speech/aac-datasets
Audio Captioning datasets for PyTorch.
WenzheLiu-Speech/awesome-large-audio-models
Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.
WenzheLiu-Speech/MP-SENet
MP-SENet: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra
WenzheLiu-Speech/Awesome-Singing-Voice-Synthesis-and-Singing-Voice-Conversion
A paper and project list about the cutting edge Speech Synthesis, Text-to-Speech (TTS), Singing Voice Synthesis (SVS), Voice Conversion (VC), Singing Voice Conversion (SVC), and related interesting works (such as Music Synthesis, Automatic Music Transcription, Automatic MOS Prediction, SSL-based ASR...etc).
WenzheLiu-Speech/speech-synthesis-paper
List of speech synthesis papers.
WenzheLiu-Speech/torchsubband
PyTorch implementation of subband decomposition
WenzheLiu-Speech/WenzheLiu-Speech.github.io
WenzheLiu-Speech/aero
Audio Super Resolution in the Spectral Domain
WenzheLiu-Speech/cutword
A simple and fast tool for word segmentation and named entity recognition
WenzheLiu-Speech/EasyRec
A framework for large scale recommendation algorithms.
WenzheLiu-Speech/effective_llm_alignment
Effective LLM Alignment Toolkit
WenzheLiu-Speech/gemma_pytorch
The official PyTorch implementation of Google's Gemma models
WenzheLiu-Speech/gpt-fast
Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
WenzheLiu-Speech/McNet
The official repo: "McNet: Fuse Multiple Cues for Multichannel Speech Enhancement", ICASSP 2023
WenzheLiu-Speech/minbpe
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
WenzheLiu-Speech/multi_quantization
WenzheLiu-Speech/nano-llama31
nanoGPT style version of Llama 3.1
WenzheLiu-Speech/OpenVoice
Instant voice cloning by MyShell.
WenzheLiu-Speech/SE-IFD
WenzheLiu-Speech/SoundStorm
The reproduced code for Google's SoundStorm
WenzheLiu-Speech/the-algorithm
Source code for Twitter's Recommendation Algorithm
WenzheLiu-Speech/tts-frontend-dataset
TTS FrontEnd DataSet: Polyphone / Prosody / TextNormalization
WenzheLiu-Speech/vall-e
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html
WenzheLiu-Speech/vocos
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
WenzheLiu-Speech/XPhoneBERT
XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech (INTERSPEECH 2023)