Yolanda-Gao
Speech synthesis, analysis and machine learning. Ph.D. in ECE from CMU.
Carnegie Mellon University, Pittsburgh, USA
Yolanda-Gao's Stars
langchain-ai/langchain
🦜🔗 Build context-aware reasoning applications
openai/whisper
Robust Speech Recognition via Large-Scale Weak Supervision
psf/black
The uncompromising Python code formatter
explosion/spaCy
💫 Industrial-strength Natural Language Processing (NLP) in Python
pytorch/examples
A set of examples around PyTorch in Vision, Text, Reinforcement Learning, etc.
facebookresearch/audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
yoheinakajima/babyagi
pytorch/vision
Datasets, Transforms and Models specific to Computer Vision
nltk/nltk
NLTK Source
Uberi/speech_recognition
Speech recognition module for Python, supporting several engines and APIs, online and offline.
facebookresearch/demucs
Code for the paper Hybrid Spectrogram and Waveform Source Separation
apache/beam
Apache Beam is a unified programming model for Batch and Streaming data processing.
NVIDIA/FasterTransformer
Transformer related optimization, including BERT, GPT
facebookresearch/encodec
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
riffusion/riffusion
Stable diffusion for real-time music generation
enhuiz/vall-e
An unofficial PyTorch implementation of the audio LM VALL-E
maxbachmann/RapidFuzz
Rapid fuzzy string matching in Python using various string metrics
facebookresearch/denoiser
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020). A PyTorch implementation of a causal speech enhancement model that operates on the raw waveform and runs in real time on a laptop CPU. The model uses an encoder-decoder architecture with skip connections and is optimized in both the time and frequency domains with multiple loss functions. Empirically, it removes various kinds of background noise, including stationary and non-stationary noise as well as room reverb. The repository also includes data augmentation techniques applied directly to the raw waveform, which further improve model performance and generalization.
microsoft/SpeechT5
Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
kuleshov/audio-super-res
Audio super resolution using neural networks
WenzheLiu-Speech/awesome-speech-enhancement
Speech enhancement / speech separation / sound source localization
haoheliu/voicefixer
General Speech Restoration
aliutkus/speechmetrics
A wrapper around speech quality metrics MOSNet, BSSEval, STOI, PESQ, SRMR, SISDR
DmitryRyumin/INTERSPEECH-2023-Papers
INTERSPEECH 2023 Papers: a collection of influential research papers from the INTERSPEECH 2023 conference, covering the latest advances in speech and language processing. Code included.
NVIDIA/NeMo-text-processing
NeMo text processing for ASR and TTS
haoheliu/voicefixer_main
General Speech Restoration
haoheliu/ssr_eval
Evaluation and Benchmarking of Speech Super-resolution Methods
sunits/rir_simulator_python
Room impulse response simulator using python
AI4Bharat/NPTEL2020-Indian-English-Speech-Dataset
NPTEL2020: Speech2Text dataset for Indian-English Accent
HarunoriKawano/BEST-RQ
Implementation of the paper "Self-supervised Learning with Random-projection Quantizer for Speech Recognition" in PyTorch.