karkirowle's Stars
openai/whisper
Robust Speech Recognition via Large-Scale Weak Supervision
AIGC-Audio/AudioGPT
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
speechbrain/speechbrain
A PyTorch-based Speech Toolkit
mamba-org/mamba
The Fast Cross-Platform Package Manager
asteroid-team/asteroid
The PyTorch-based audio source separation toolkit for researchers
julius-speech/julius
Open-Source Large Vocabulary Continuous Speech Recognition Engine
dexplo/bar_chart_race
Create animated bar chart races in Python with matplotlib
kaegi/alass
"Automatic Language-Agnostic Subtitle Synchronization"
wenet-e2e/wespeaker
Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit
NVIDIA/radtts
Provides training, inference and voice conversion recipes for RADTTS and RADTTS++: Flow-based TTS models with Robust Alignment Learning, Diverse Synthesis, and Generative Modeling and Fine-Grained Control over Low Dimensional (F0 and Energy) Speech Attributes.
jhpoelen/zenodo-upload
upload big files to Zenodo using cURL, jq and bash
jxzhanggg/nonparaSeq2seqVC_code
Implementation code of non-parallel sequence-to-sequence VC
microsoft/P.808
This is an open-source implementation of the ITU P.808 standard for "Subjective evaluation of speech quality with a crowdsourcing approach" (see https://www.itu.int/rec/T-REC-P.808/en). It uses Amazon Mechanical Turk as the crowdsourcing platform. It includes implementations for Absolute Category Rating (ACR), Degradation Category Rating (DCR), and Comparison Category Rating (CCR).
k2kobayashi/crank
A toolkit for non-parallel voice conversion based on vector-quantized variational autoencoder
talhanai/speech-nlp-datasets
Contains links to publicly available datasets for modeling health outcomes using speech and language.
brentspell/torch-yin
Yin pitch estimator in PyTorch
idiap/acoustic-simulator
Implementation of audio degradation processes
tarepan/VoiceConversionLab
A collection of voice conversion research
KunZhou9646/seq2seq-EVC
This is the implementation of our Interspeech 2021 paper: Limited data emotional voice conversion leveraging text-to-speech: two-stage sequence-to-sequence training.
LeoniusChen/Attentions-in-Tacotron
articulatory/articulatory
Deep Articulatory Synthesis and Inversion
r9y9/jsut-lab
HTS-style full-context labels for JSUT v1.1
nils-werner/pymushra
pyMUSHRA is a Python web application which hosts webMUSHRA experiments and collects the data with Python.
KunZhou9646/controllable_evc_code
This is the code for a controllable EVC framework for seen and unseen emotion generation.
maclandrol/FisherExact
Fisher exact test for m×n contingency tables in Python
MingjieChen/LowResourceVC
Voice conversion training with 109 speakers with limited training samples
6gsn/marine
jdvala/zoom_audio_transcribe
Zoom Audio Transcription offline
soumimaiti/speechlmscore_tool
stoneMo/ASVspoof