Pinned Repositories
AE_ts
Autoencoder for time series
CNN-for-single-channel-speech-enhancement
Convolutional neural nets for single channel speech enhancement
covarep
A Cooperative Voice Analysis Repository for Speech Technologies
Deep-Reinforcement-Learning-Hands-On
Hands-on Deep Reinforcement Learning, published by Packt
deepLearningForPython
Collection of examples adapted and modified from the popular Deep Learning with Python book from Manning Publications
EM_FA
Factor Analysis via Expectation Maximization
fast-wavenet
Speedy Wavenet generation using dynamic programming :zap:
FFTNet
PyTorch implementation of FFTNet
gaussianMLE
A basic implementation of a Maximum Likelihood Estimation of a multivariate Gaussian distribution
rockycamp's Repositories
rockycamp/deepLearningForPython
Collection of examples adapted and modified from the popular Deep Learning with Python book from Manning Publications
rockycamp/AE_ts
Autoencoder for time series
rockycamp/covarep
A Cooperative Voice Analysis Repository for Speech Technologies
rockycamp/Deep-Reinforcement-Learning-Hands-On
Hands-on Deep Reinforcement Learning, published by Packt
rockycamp/FFTNet
PyTorch implementation of FFTNet
rockycamp/general-purpose-audio-tagging
As it says
rockycamp/glow
Code for reproducing results in "Glow: Generative Flow with Invertible 1x1 Convolutions"
rockycamp/incubator-mxnet
Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
rockycamp/Isomap
Manifold learning algorithm
rockycamp/keras-squeezenet
SqueezeNet implementation with Keras Framework
rockycamp/neural-classifiers-with-few-audio
Training neural audio classifiers with few data − https://arxiv.org/abs/1810.10274
rockycamp/prototypical-networks
Code for the NIPS 2017 Paper "Prototypical Networks for Few-shot Learning"
rockycamp/PyTorch_Speaker_Verification
PyTorch implementation of "Generalized End-to-End Loss for Speaker Verification" by Wan, Li et al.
rockycamp/SampleRNN
Tensorflow implementation of SampleRNN
rockycamp/sampleRNN_ICLR2017
SampleRNN: An Unconditional End-to-End Neural Audio Generation Model
rockycamp/sednn
Deep-learning-based speech enhancement using Keras (Python), designed to be easy to use
rockycamp/SoundCard
A Pure-Python Real-Time Audio Library
rockycamp/Speaker-Recognition-System-using-GMM
System for identifying a speaker from a given speech signal using MFCC and LPC features and Gaussian Mixture Models
rockycamp/Speaker_Verification
Tensorflow implementation of generalized end-to-end loss for speaker verification
rockycamp/speech-denoising-wavenet
A neural network for end-to-end speech denoising
rockycamp/Speech_Enhancement_MMSE-STSA
Statistical model-based speech enhancement using MMSE-STSA
rockycamp/Speech_Signal_Processing_and_Classification
Front-end speech processing aims at extracting proper features from short-term segments of a speech utterance, known as frames. It is a prerequisite step toward any pattern recognition problem employing speech or audio (e.g., music). Here, we are interested in voice disorder classification: developing two-class classifiers that can discriminate between utterances of a subject suffering from, say, vocal fold paralysis and utterances of a healthy subject.

The mathematical modeling of the speech production system in humans suggests that an all-pole system function is justified [1-3]. As a consequence, linear prediction coefficients (LPCs) constitute a first choice for modeling the magnitude of the short-term spectrum of speech. LPC-derived cepstral coefficients are guaranteed to discriminate between the system (e.g., vocal tract) contribution and that of the excitation. Taking into account the characteristics of the human ear, the mel-frequency cepstral coefficients (MFCCs) emerged as descriptive features of the speech spectral envelope. Similarly to MFCCs, the perceptual linear prediction coefficients (PLPs) can also be derived. These traditional features will be tested against agnostic features extracted by convolutional neural networks (CNNs), e.g., auto-encoders [4].

The pattern recognition step will be based on Gaussian Mixture Model classifiers, K-nearest neighbor classifiers, Bayes classifiers, as well as deep neural networks. The Massachusetts Eye and Ear Infirmary Dataset (MEEI Dataset) [5] will be exploited. At the application level, a library for feature extraction and classification in Python will be developed. Credible publicly available resources, such as KALDI, will be used toward achieving our goal. Comparisons will be made against [6-8].
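The GMM-based two-class pipeline described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the repository's actual code: it substitutes random Gaussian samples for real per-frame MFCC features, fits one scikit-learn GaussianMixture per class, and classifies an utterance by comparing per-class average log-likelihoods.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Stand-ins for per-frame 13-dim MFCC feature matrices (rows = frames).
# In practice these would come from a feature extractor, not a random generator.
healthy_frames = rng.normal(loc=0.0, scale=1.0, size=(500, 13))
disordered_frames = rng.normal(loc=1.5, scale=1.2, size=(500, 13))

# Fit one GMM per class on that class's training frames.
gmm_healthy = GaussianMixture(n_components=4, random_state=0).fit(healthy_frames)
gmm_disordered = GaussianMixture(n_components=4, random_state=0).fit(disordered_frames)

def classify_utterance(frames: np.ndarray) -> str:
    """Pick the class whose GMM assigns the higher average log-likelihood
    to the utterance's frames (a maximum-likelihood decision rule)."""
    if gmm_healthy.score(frames) > gmm_disordered.score(frames):
        return "healthy"
    return "disordered"

# A test utterance drawn from the "disordered" distribution.
test_utt = rng.normal(loc=1.5, scale=1.2, size=(100, 13))
print(classify_utterance(test_utt))
```

Swapping the synthetic features for real MFCCs (and adding per-speaker normalization) is the main change needed to turn this sketch into a usable baseline.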
rockycamp/SWaveNet
rockycamp/tacotron
A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model
rockycamp/tech-interview-handbook
💯 Algorithms study materials, behavioral content and tips for rocking your coding interview
rockycamp/tensorflow-wavenet
A TensorFlow implementation of DeepMind's WaveNet paper
rockycamp/vqvae
Tensorflow implementation of VQVAE for voice conversion
rockycamp/vqvae-speech
Tensorflow implementation of the speech model described in Neural Discrete Representation Learning (a.k.a. VQ-VAE)
rockycamp/Wave-U-Net
Implementation of the Wave-U-Net for audio source separation
rockycamp/wavenet
Keras WaveNet implementation