julius-richter
PhD student at Universität Hamburg working on deep generative models for speech enhancement.
Hamburg, Berlin
julius-richter's Stars
probml/pml-book
"Probabilistic Machine Learning" - a book series by Kevin Murphy
haoheliu/AudioLDM
AudioLDM: Generate speech, sound effects, music and beyond, with text.
state-spaces/s4
Structured state space sequence models
gnobitab/RectifiedFlow
Official Implementation of Rectified Flow (ICLR2023 Spotlight)
baofff/U-ViT
A PyTorch implementation of the paper "All are Worth Words: A ViT Backbone for Diffusion Models".
facebookresearch/av_hubert
A self-supervised learning framework for audio-visual speech
shivammehta25/Matcha-TTS
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
krantiparida/awesome-audio-visual
A curated list of different papers and datasets in various areas of audio-visual processing
Yuan-ManX/ai-audio-datasets
AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio applications.
sp-uhh/sgmse
Score-based Generative Models (Diffusion Models) for Speech Enhancement and Dereverberation
researchmm/MM-Diffusion
[CVPR'23] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
celebv-text/CelebV-Text
(CVPR 2023) CelebV-Text: A Large-Scale Facial Text-Video Dataset
ruizhecao96/CMGAN
Conformer-based Metric GAN for speech enhancement
sihyun-yu/PVDM
Official PyTorch implementation of Video Probabilistic Diffusion Models in Projected Latent Space (CVPR 2023).
NVIDIA/CleanUNet
Official PyTorch Implementation of CleanUNet (ICASSP 2022)
neillu23/CDiffuSE
Conditional Diffusion Probabilistic Model for Speech Enhancement
sp-uhh/storm
StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation
x4nth055/gender-recognition-by-voice
Building a deep learning model that predicts the gender of a speaker using TensorFlow 2
RoySheffer/im2wav
Official implementation of the pipeline presented in I hear your true colors: Image Guided Audio Generation
facebookresearch/EasyComDataset
The Easy Communications (EasyCom) dataset is a world-first dataset designed to help mitigate the *cocktail party effect* from an augmented-reality (AR)-motivated multi-sensor egocentric world view.
YUCHEN005/NASE
Code for paper "Noise-aware Speech Enhancement using Diffusion Probabilistic Model"
ahaliassos/raven
Official implementation of RAVEn (ICLR 2023) and BRAVEn (ICASSP 2024)
sukun1045/video-physics-sound-diffusion
sp-uhh/sgmse-bbed
TODO
YangangCao/Causal-U-Net
Unofficial PyTorch implementation of "A Causal U-net based Neural Beamforming Network for Real-Time Multi-Channel Speech Enhancement"
hmartelb/avlit
Official source code of the INTERSPEECH 2023 paper: "Audio-Visual Speech Separation in Noisy Environments with a Lightweight Iterative Model" (AVLIT)
taketakeseijin/HarmonicLowering
Implementation of Harmonic Convolution by Harmonic Lowering
sp-uhh/guided-vae-nmf
This is the repository of the paper