lbehringer's Stars
jasonppy/VoiceCraft
Zero-Shot Speech Editing and Text-to-Speech in the Wild
open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
huggingface/parler-tts
Inference and training library for high-quality TTS models.
yisol/IDM-VTON
[ECCV2024] IDM-VTON: Improving Diffusion Models for Authentic Virtual Try-on in the Wild
libAudioFlux/audioFlux
A library for audio and music analysis and feature extraction.
gedeck/practical-statistics-for-data-scientists
Code repository for O'Reilly book
Camb-ai/MARS5-TTS
MARS5 speech model (TTS) from CAMB.AI
daochenzha/data-centric-AI
A curated, but incomplete, list of data-centric AI resources.
oborchers/Fast_Sentence_Embeddings
Compute Sentence Embeddings Fast!
maxrmorrison/torchcrepe
PyTorch implementation of the CREPE pitch tracker
hubertsiuzdak/snac
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate
keonlee9420/Comprehensive-Transformer-TTS
A non-autoregressive Transformer-based text-to-speech system, supporting a family of SOTA transformers with supervised and unsupervised duration modeling. This project grows with the research community, aiming to achieve the ultimate TTS.
huggingface/dataspeech
k2-fsa/libriheavy
Libriheavy: a 50,000-hour ASR corpus with punctuation, casing, and context
nyrahealth/CrisperWhisper
Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection
marianne-m/brouhaha-vad
Predicts the level of noise and reverberation in your audio files
Wataru-Nakata/miipher
Unofficial implementation of miipher
SamsungLabs/SummaryMixing
This repository implements SummaryMixing, a simpler, faster and much cheaper replacement for self-attention in automatic speech recognition (see: https://arxiv.org/abs/2307.07421). The code is ready to be used with the SpeechBrain toolkit.
DataDome/sliceline
✂️ Fast slice finding for Machine Learning model debugging.
interactiveaudiolab/ppgs
High-Fidelity Neural Phonetic Posteriorgrams
k2-fsa/text_search
Some fast-ish algorithms for batch text search in moderate-sized collections, intended for data cleanup
dhimasryan/MOSA-Net-Cross-Domain
yeounoh/slicefinder
automatic data slicing
msalhab96/SNR-Estimation-Using-Deep-Learning
An implementation of frame-level speech signal-to-noise ratio estimation using deep learning
Fraunhofer-IIS/ODAQ
audeering/w2v2-age-gender-how-to
How to use our public wav2vec2 age and gender model
sarulab-speech/Coco-Nut
Coco-Nut (Corpus of connecting NIHONGO utterance and text) corpus
utter-project/mHuBERT-147-scripts
Collection of scripts from mHuBERT-147.
cisco/multilingual-speech-testing
Test software and data for evaluation of speech processing algorithms in multiple languages
SonyResearch/project_ethics_augmented_datasheets_for_speech_datasets
Public code repo for research paper