Yolanda-Gao
Speech synthesis, analysis and machine learning. Ph.D. in ECE from CMU.
Carnegie Mellon University, Pittsburgh, USA
Yolanda-Gao's Stars
langchain-ai/langchain
🦜🔗 Build context-aware reasoning applications
openai/whisper
Robust Speech Recognition via Large-Scale Weak Supervision
psf/black
The uncompromising Python code formatter
explosion/spaCy
💫 Industrial-strength Natural Language Processing (NLP) in Python
pytorch/examples
A set of examples around PyTorch in Vision, Text, Reinforcement Learning, etc.
facebookresearch/audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
yoheinakajima/babyagi
pytorch/vision
Datasets, Transforms and Models specific to Computer Vision
nltk/nltk
NLTK Source
Uberi/speech_recognition
Speech recognition module for Python, supporting several engines and APIs, online and offline.
facebookresearch/demucs
Code for the paper Hybrid Spectrogram and Waveform Source Separation
apache/beam
Apache Beam is a unified programming model for Batch and Streaming data processing.
NVIDIA/FasterTransformer
Transformer related optimization, including BERT, GPT
facebookresearch/encodec
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
riffusion/riffusion
Stable diffusion for real-time music generation
enhuiz/vall-e
An unofficial PyTorch implementation of the audio LM VALL-E
maxbachmann/RapidFuzz
Rapid fuzzy string matching in Python using various string metrics
facebookresearch/denoiser
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020). A PyTorch implementation of a causal speech enhancement model that operates on the raw waveform and runs in real time on a laptop CPU. The model uses an encoder-decoder architecture with skip connections and is optimized in both the time and frequency domains with multiple loss functions. Empirically, it removes various kinds of background noise, including stationary and non-stationary noise as well as room reverb. The repository also includes data augmentation techniques applied directly to the raw waveform, which further improve model performance and generalization.
microsoft/SpeechT5
Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
kuleshov/audio-super-res
Audio super resolution using neural networks
WenzheLiu-Speech/awesome-speech-enhancement
Speech enhancement / speech separation / sound source localization
haoheliu/voicefixer
General Speech Restoration
aliutkus/speechmetrics
A wrapper around speech quality metrics MOSNet, BSSEval, STOI, PESQ, SRMR, SISDR
DmitryRyumin/INTERSPEECH-2023-Papers
INTERSPEECH 2023 Papers: a collection of influential research papers from the INTERSPEECH 2023 conference, covering the latest advances in speech and language processing. Code included.
NVIDIA/NeMo-text-processing
NeMo text processing for ASR and TTS
haoheliu/voicefixer_main
General Speech Restoration
haoheliu/ssr_eval
Evaluation and Benchmarking of Speech Super-resolution Methods
sunits/rir_simulator_python
Room impulse response simulator using python
AI4Bharat/NPTEL2020-Indian-English-Speech-Dataset
NPTEL2020: Speech2Text dataset for Indian-English Accent
HarunoriKawano/BEST-RQ
Implementation of the paper "Self-supervised Learning with Random-projection Quantizer for Speech Recognition" in PyTorch.