michael-kuhlmann
PhD Student at Paderborn University. Voice conversion, speech synthesis, voice profiling
Paderborn University, Paderborn
michael-kuhlmann's Stars
archinetai/a-unet
A toolbox that provides hackable building blocks for generic 1D/2D/3D UNets, in PyTorch.
archinetai/audio-diffusion-pytorch
Audio generation using diffusion models, in PyTorch.
NVIDIA/NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
audiolabs/webMUSHRA
A MUSHRA-compliant, Web Audio API-based experiment software
microsoft/P.808
This is an open-source implementation of the ITU P.808 standard for "Subjective evaluation of speech quality with a crowdsourcing approach" (see https://www.itu.int/rec/T-REC-P.808/en). It uses Amazon Mechanical Turk as the crowdsourcing platform. It includes implementations for Absolute Category Rating (ACR), Degradation Category Rating (DCR), and Comparison Category Rating (CCR).
s3prl/s3prl
Self-Supervised Speech Pre-training and Representation Learning Toolkit
auspicious3000/contentvec
speech self-supervised representations
Alexander-H-Liu/dinosr
DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning
RameenAbdal/StyleFlow
StyleFlow: Attribute-conditioned Exploration of StyleGAN-generated Images using Conditional Continuous Normalizing Flows (ACM TOG 2021)
facebookresearch/voxpopuli
A large-scale multilingual speech corpus for representation learning, semi-supervised learning and interpretation
phizaz/diffae
Official implementation of Diffusion Autoencoders
openai/guided-diffusion
DiffEqML/torchdyn
A PyTorch library entirely dedicated to neural differential equations, implicit models and related numerical methods
stefanwebb/flowtorch
This library would form a permanent home for reusable components for deep probabilistic programming. The library would form and harness a community of users and contributors by focusing initially on complete infra and documentation for how to use and create components.
suno-ai/bark
🔊 Text-Prompted Generative Audio Model
bshall/knn-vc
Voice Conversion With Just Nearest Neighbors
huggingface/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
huggingface/diffusers
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
fgnt/meeteval
MeetEval - A meeting transcription evaluation toolkit
nvidia-riva/riva-asrlib-decoder
Standalone implementation of the CUDA-accelerated WFST Decoder available in Riva
kan-bayashi/LibriTTSLabel
Alignment files of LibriTTS.
facebookresearch/encodec
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
bootphon/phonemizer
Simple text to phones converter for multiple languages
dmort27/panphon
Python package and data files for manipulating phonological segments (phones, phonemes) in terms of universal phonological features.
lingjzhu/CharsiuG2P
Multilingual G2P in 100 languages
jaywalnut310/vits
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
xinjli/allosaurus
Allosaurus is a pretrained universal phone recognizer for more than 2000 languages
IDRnD/VoxTube
The VoxTube dataset official repository
cvqluu/Angular-Penalty-Softmax-Losses-Pytorch
Angular penalty loss functions in PyTorch (ArcFace, SphereFace, Additive Margin, CosFace)
google/gin-config
Gin provides a lightweight configuration framework for Python