Pinned Repositories
AutoVowelDuration
Automatic Measurement of Vowel Duration for Consonant Vowel Consonant (CVC) sound files (JASA 2016)
DeepAnomaly
Recurrent Neural Networks for Anomaly Detection using Time Series Data
DeepSegmentor
Sequence Segmentation using Joint RNN and Structured Prediction Models (ICASSP 2017)
GCommandsPytorch
ConvNets for Audio Recognition using Google Commands Dataset
WatermarkNN
Watermarking Deep Neural Networks (USENIX 2018)
audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
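To give a sense of how the library is typically used, here is a minimal, hedged sketch of text-conditioned generation with MusicGen. It follows the public audiocraft documentation as I understand it; the checkpoint name "facebook/musicgen-small" and exact helper names are assumptions that may differ across versions.

```python
# Minimal sketch of text-conditioned music generation with audiocraft.
# Assumes the public MusicGen API (get_pretrained / set_generation_params /
# generate / audio_write); names may differ across audiocraft versions.
import torch
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained("facebook/musicgen-small")  # assumed checkpoint name
model.set_generation_params(duration=8)  # generate 8 seconds of audio

descriptions = ["lo-fi hip hop beat with warm piano", "upbeat folk guitar"]
with torch.no_grad():
    wavs = model.generate(descriptions)  # tensor of shape [batch, channels, samples]

for i, wav in enumerate(wavs):
    # Writes <i>.wav with loudness normalization.
    audio_write(str(i), wav.cpu(), model.sample_rate, strategy="loudness")
```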
denoiser
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020). We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain, in which we present a causal speech enhancement model that works on the raw waveform and runs in real time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip connections. It is optimized in both the time and frequency domains, using multiple loss functions. Empirical evidence shows that it is capable of removing various kinds of background noise, including stationary and non-stationary noise, as well as room reverb. Additionally, we suggest a set of data augmentation techniques applied directly to the raw waveform, which further improve the model's performance and its generalization ability.
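To make the architecture description concrete, the following is a small hypothetical sketch of a waveform encoder-decoder with skip connections in PyTorch. It only illustrates the general idea described above; the class name, layer sizes, and hyperparameters are invented and do not correspond to the actual model in this repository.

```python
# Hypothetical sketch of an encoder-decoder on raw waveform with skip
# connections, illustrating the idea described above (not the repo's model).
import torch
import torch.nn as nn

class TinyWaveformDenoiser(nn.Module):
    def __init__(self, hidden=48, depth=3, kernel=8, stride=4):
        super().__init__()
        self.encoder, self.decoder = nn.ModuleList(), nn.ModuleList()
        chin = 1
        for i in range(depth):
            chout = hidden * (2 ** i)
            self.encoder.append(nn.Sequential(
                nn.Conv1d(chin, chout, kernel, stride), nn.ReLU()))
            # decoder is built in reverse; the final layer has no ReLU
            self.decoder.insert(0, nn.Sequential(
                nn.ConvTranspose1d(chout, chin, kernel, stride),
                nn.ReLU() if i > 0 else nn.Identity()))
            chin = chout

    def forward(self, noisy):  # noisy: [batch, 1, samples]
        x, skips = noisy, []
        for enc in self.encoder:
            x = enc(x)
            skips.append(x)
        for dec in self.decoder:
            skip = skips.pop()
            x = x + skip[..., : x.shape[-1]]  # skip connection, trimmed to match lengths
            x = dec(x)
        return x  # estimated clean waveform (slightly shorter than the input without padding)

model = TinyWaveformDenoiser()
clean_est = model(torch.randn(1, 1, 16000))  # one second of audio at 16 kHz
```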
speech-resynthesis
An official reimplementation of the method described in the INTERSPEECH 2021 paper - Speech Resynthesis from Discrete Disentangled Self-Supervised Representations.
svoice
We provide a PyTorch implementation of the paper Voice Separation with an Unknown Number of Multiple Speakers, in which we present a new method for separating a mixed audio sequence in which multiple voices speak simultaneously. The method employs gated neural networks that are trained to separate the voices over multiple processing steps while keeping the speaker assigned to each output channel fixed. A separate model is trained for each possible number of speakers, and the model with the largest number of speakers is used to determine the actual number of speakers in a given sample. Our method greatly outperforms the current state of the art, which, as we show, is not competitive for more than two speakers.
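The selection step described above (running the largest model and counting the active output channels to pick the matching model) can be sketched as follows. This is a hypothetical illustration of that logic, not the repository's code; the energy threshold and the models_by_count mapping are invented for the example.

```python
# Hypothetical sketch of the speaker-count selection described above:
# run the model trained for the largest number of speakers, count the output
# channels that carry speech energy, then use the matching model.
import torch

def count_active_channels(separated, silence_db=-40.0):
    """separated: [num_speakers, samples]; a channel counts as active if its
    energy is above an (assumed) threshold relative to the loudest channel."""
    energy = separated.pow(2).mean(dim=-1)                      # per-channel power
    ref = energy.max().clamp(min=1e-12)
    rel_db = 10.0 * torch.log10(energy.clamp(min=1e-12) / ref)
    return int((rel_db > silence_db).sum())

def separate_unknown_num_speakers(mixture, models_by_count):
    """models_by_count: dict mapping speaker count -> separation model
    (each takes [1, samples] and returns [num_speakers, samples])."""
    largest = max(models_by_count)
    with torch.no_grad():
        rough = models_by_count[largest](mixture)               # over-separate first
    k = min(max(1, count_active_channels(rough)), largest)      # estimated speaker count
    with torch.no_grad():
        return models_by_count[k](mixture)                      # re-separate with the matching model
```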
textlesslib
Library for Textless Spoken Language Processing
adiyoss's Repositories
adiyoss/WatermarkNN
Watermarking Deep Neural Networks (USENIX 2018)
adiyoss/GCommandsPytorch
ConvNets for Audio Recognition using Google Commands Dataset
adiyoss/DeepAnomaly
Recurrent Neural Networks for Anomaly Detection using Time Series Data
adiyoss/DeepSegmentor
Sequence Segmentation using Joint RNN and Structured Prediction Models (ICASSP 2017)
adiyoss/AutoVowelDuration
Automatic Measurement of Vowel Duration for Consonant Vowel Consonant (CVC) sound files (JASA 2016)
adiyoss/StructED
Risk Minimization Algorithms in Structured Prediction (JMLR 2016)
adiyoss/Chroma
Pitch and chroma implementation in Java
adiyoss/DeepVOT
Automatic Measurement of Voice Onset Time (VOT) using Deep Recurrent Neural Networks (Interspeech 2016)
adiyoss/InDepth-Analysis
Sentence Representation Analysis
adiyoss/colman_ml
ML course @ colman
adiyoss/DeepWDM
Recurrent Neural Networks for Word Duration Measurement
adiyoss/denoiser
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020). We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain, in which we present a causal speech enhancement model that works on the raw waveform and runs in real time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip connections. It is optimized in both the time and frequency domains, using multiple loss functions. Empirical evidence shows that it is capable of removing various kinds of background noise, including stationary and non-stationary noise, as well as room reverb. Additionally, we suggest a set of data augmentation techniques applied directly to the raw waveform, which further improve the model's performance and its generalization ability.
adiyoss/diffq
DiffQ performs differentiable quantization using pseudo quantization noise. It can automatically tune the number of bits used per weight or group of weights in order to achieve a given trade-off between model size and accuracy.
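A minimal sketch of the pseudo-quantization-noise idea, under assumptions: during training, hard rounding is replaced by additive uniform noise of the same magnitude, so the bit width stays differentiable and can be traded off against a model-size penalty. The bit-width parameterization and penalty weight below are invented for illustration and are not DiffQ's actual implementation.

```python
# Hypothetical sketch of differentiable quantization with pseudo quantization
# noise: at train time, rounding is replaced by uniform noise of matching
# magnitude, so a learnable bit width can be optimized alongside a size penalty.
import torch

def pseudo_quantize(w, bits, training=True):
    """Uniform quantizer over the tensor's range with `bits` bits.
    During training, adds noise ~ U(-delta/2, delta/2) instead of rounding."""
    levels = 2.0 ** bits - 1.0
    lo, hi = w.min().detach(), w.max().detach()
    delta = (hi - lo).clamp(min=1e-8) / levels             # quantization step
    if training:
        noise = (torch.rand_like(w) - 0.5) * delta         # pseudo quantization noise
        return w + noise
    return torch.round((w - lo) / delta) * delta + lo      # hard quantization at inference

# Toy usage: learn a bit width that trades accuracy for model size.
w = torch.randn(256, 256)
log_bits = torch.tensor(3.0, requires_grad=True)           # assumed parameterization of the bit width
opt = torch.optim.Adam([log_bits], lr=0.05)
for _ in range(100):
    bits = torch.nn.functional.softplus(log_bits) + 2.0    # keep bits > 2
    wq = pseudo_quantize(w, bits)
    distortion = (wq - w).pow(2).mean()                    # stands in for the task loss
    size_penalty = 1e-2 * bits                             # proxy for bits per weight (assumed weight)
    loss = distortion + size_penalty
    opt.zero_grad()
    loss.backward()
    opt.step()
```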
adiyoss/Expresso
Expresso dataset demo page
adiyoss/Tools-to-Design-or-Visualize-Architecture-of-Neural-Network
Tools to Design or Visualize Architecture of Neural Network
adiyoss/adiyoss.github.io
Personal website
adiyoss/audio-cont
adiyoss/dataset
adiyoss/dotfiles
dotfiles for vim, tmux, etc.
adiyoss/dsVAE-NES
adiyoss/fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
adiyoss/griffin_lim
Implementation of the Griffin and Lim algorithm to recover an audio signal from a magnitude-only spectrogram.
adiyoss/iTerm2-Color-Schemes
Over 150 terminal color schemes/themes for iTerm/iTerm2 (with ports to Terminal, Konsole, PuTTY, Xresources, XRDB, and Terminator)
adiyoss/nltk_contrib
NLTK Contrib
adiyoss/OpenNMT
Open-Source Neural Machine Translation in Torch
adiyoss/py-webrtcvad
Python interface to the WebRTC Voice Activity Detector
adiyoss/pytorch-stft
An STFT/iSTFT for PyTorch.
adiyoss/StarGAN
PyTorch Implementation of StarGAN - CVPR 2018
adiyoss/turk
adiyoss/wav2letter
Facebook AI Research's Automatic Speech Recognition Toolkit