MandarGogate's Stars
JuanFMontesinos/VoViT
VoViT: Low Latency Graph-based Audio-Visual Voice Separation Transformer
binwiederhier/ntfy
Send push notifications to your phone or desktop using PUT/POST
TeaPoly/PLCPA-ASYM-Loss
The power-law compressed phase-aware asymmetric (PLCPA-ASYM) loss
ml-explore/mlx-swift-examples
Examples using MLX Swift
JusperLee/TDANet
An efficient speech separation method
keunwoochoi/kapre
kapre: Keras Audio Preprocessors
KinWaiCheuk/nnAudio
Audio processing using PyTorch 1D convolution networks
neuralmagic/deepsparse
Sparsity-aware deep learning inference runtime for CPUs
mosaicml/composer
Supercharge Your Model Training
jonashaag/pydct
Short-Time Discrete Cosine Transform (DCT) for Python. SciPy, TensorFlow and PyTorch implementations.
sungwon23/BSRNN
facebookresearch/facestar
Facestar dataset. High quality audio-visual recordings of human conversational speech.
facebook/docusaurus
Easy to maintain open source documentation websites.
microsoft/P.808
This is an open-source implementation of the ITU P.808 standard for "Subjective evaluation of speech quality with a crowdsourcing approach" (see https://www.itu.int/rec/T-REC-P.808/en). It uses Amazon Mechanical Turk as the crowdsourcing platform. It includes implementations for Absolute Category Rating (ACR), Degradation Category Rating (DCR), and Comparison Category Rating (CCR).
reflex-dev/reflex
🕸️ Web apps in pure Python 🐍
mamba-org/mamba
The Fast Cross-Platform Package Manager
microsoft/DNS-Challenge
This repo contains the scripts, models, and required files for the Deep Noise Suppression (DNS) Challenge.
Jungjee/RawNet
Official repository for RawNet, RawNet2, and RawNet3
churichard/notabase
A second brain for your knowledge, thoughts, and ideas.
openai/whisper
Robust Speech Recognition via Large-Scale Weak Supervision
cogmhear/avse_challenge
COG-MHEAR Audio-Visual Speech Enhancement Challenge
wichmann-lab/python-psignifit
Python clone of psignifit providing basic functionality
huyanxin/DeepComplexCRN
Rudrabha/Lip2Wav
This repository contains the code for our CVPR 2020 paper, "Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis"
cogmhear/Intelligibility-Oriented-Audio-Visual-Speech-Enhancement
Towards Intelligibility-Oriented Audio-Visual Speech Enhancement
microsoft/Deep3DFaceReconstruction
Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set (CVPRW 2019)
facebookresearch/denoiser
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020). We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain, in which we present a causal speech enhancement model that works on the raw waveform and runs in real time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip connections. It is optimized in both the time and frequency domains, using multiple loss functions. Empirical evidence shows that it is capable of removing various kinds of background noise, including stationary and non-stationary noise, as well as room reverb. Additionally, we suggest a set of data augmentation techniques, applied directly to the raw waveform, which further improve model performance and generalization.
apache/mxnet
Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
mlflow/mlflow
Open source platform for the machine learning lifecycle
lutzroeder/netron
Visualizer for neural network, deep learning and machine learning models