mfcc

There are 293 repositories under the mfcc topic.

  • ddbourgin/numpy-ml

    Machine learning, in numpy

    Language: Python
  • aubio/aubio

    a library for audio and music analysis

    Language: C
  • libAudioFlux/audioFlux

    A library for audio and music analysis, feature extraction.

    Language: C
  • x4nth055/emotion-recognition-using-speech

    Building and training a Speech Emotion Recognizer that predicts human emotions, using Python, scikit-learn, and Keras

    Language: Python
  • ar1st0crat/NWaves

    .NET DSP library with a lot of audio processing functions

    Language: C#
  • SuperKogito/spafe

    :sound: spafe: Simplified Python Audio Features Extraction

    Language: Python
  • adamstark/Gist

    A C++ Library for Audio Analysis

    Language: C++
  • gionanide/Speech_Signal_Processing_and_Classification

    Front-end speech processing aims at extracting proper features from short-term segments of a speech utterance, known as frames. It is a prerequisite step toward any pattern recognition problem employing speech or audio (e.g., music). Here, we are interested in voice disorder classification: that is, developing two-class classifiers that can discriminate between utterances of a subject suffering from, say, vocal fold paralysis and utterances of a healthy subject.

    The mathematical modeling of the speech production system in humans suggests that an all-pole system function is justified [1-3]. As a consequence, linear prediction coefficients (LPCs) constitute a first choice for modeling the magnitude of the short-term spectrum of speech. LPC-derived cepstral coefficients are guaranteed to discriminate between the system (e.g., vocal tract) contribution and that of the excitation. Taking into account the characteristics of the human ear, the mel-frequency cepstral coefficients (MFCCs) emerged as descriptive features of the speech spectral envelope. Similarly to MFCCs, the perceptual linear prediction coefficients (PLPs) can also be derived. These traditional features will be tested against agnostic features extracted by convolutional neural networks (CNNs) (e.g., auto-encoders) [4].

    The pattern recognition step will be based on Gaussian mixture model classifiers, k-nearest neighbor classifiers, Bayes classifiers, as well as deep neural networks. The Massachusetts Eye and Ear Infirmary dataset (MEEI) [5] will be exploited. At the application level, a library for feature extraction and classification in Python will be developed. Credible publicly available resources will be used toward achieving our goal, such as KALDI. Comparisons will be made against [6-8].

    Language: Python
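
    As a rough illustration of the front-end described above, here is a minimal Python sketch (not this repository's code) that extracts the two classical feature types, MFCCs and LPCs, with librosa; the file path and frame sizes are placeholder assumptions.

      # Extract MFCC and frame-wise LPC features from one utterance.
      import librosa
      import numpy as np

      y, sr = librosa.load("utterance.wav", sr=16000)  # hypothetical file

      # MFCCs: perceptually motivated summary of the spectral envelope.
      mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # (13, n_frames)

      # LPCs: all-pole model fitted to each 25 ms frame (10 ms hop at 16 kHz).
      frames = librosa.util.frame(y, frame_length=400, hop_length=160)
      lpc = np.stack([librosa.lpc(np.ascontiguousarray(f), order=12)
                      for f in frames.T])                # (n_frames, 13)
      print(mfcc.shape, lpc.shape)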
  • sp-nitech/SPTK

    A suite of speech signal processing tools

    Language: C++
  • jsingh811/pyAudioProcessing

    Audio feature extraction and classification

    Language: Python
  • ewan-xu/LibrosaCpp

    LibrosaCpp is a C++ implementation of librosa for computing short-time Fourier transform (STFT) coefficients, mel spectrograms, or MFCCs

    Language: C++
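
    For reference, these are the librosa calls that LibrosaCpp mirrors; a Python snippet like this is a handy ground truth when checking the C++ output (the path and parameter values are illustrative).

      import librosa

      y, sr = librosa.load("example.wav", sr=22050)  # placeholder path

      # Complex STFT matrix of shape (1 + n_fft/2, n_frames).
      stft = librosa.stft(y, n_fft=2048, hop_length=512)

      # Mel spectrogram: mel filterbank applied to the power spectrogram.
      mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048,
                                           hop_length=512, n_mels=128)

      # MFCCs: DCT of the log-mel spectrogram, first 20 coefficients kept.
      mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
      print(stft.shape, mel.shape, mfcc.shape)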
  • SuperKogito/Voice-based-gender-recognition

    :sound: :boy: :girl: Voice-based gender recognition using mel-frequency cepstral coefficients (MFCC) and Gaussian mixture models (GMM)

    Language: Python
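
    A minimal sketch of the MFCC-plus-GMM approach this entry describes, assuming librosa and scikit-learn rather than the repository's own code; the training and test paths are hypothetical.

      import librosa
      import numpy as np
      from sklearn.mixture import GaussianMixture

      def utterance_mfcc(path):
          y, sr = librosa.load(path, sr=16000)
          return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T  # (n_frames, 13)

      # Fit one GMM per gender on pooled frame-level MFCCs.
      male = GaussianMixture(n_components=16).fit(
          np.vstack([utterance_mfcc(p) for p in ["m1.wav", "m2.wav"]]))
      female = GaussianMixture(n_components=16).fit(
          np.vstack([utterance_mfcc(p) for p in ["f1.wav", "f2.wav"]]))

      # Pick the model with the higher average per-frame log-likelihood.
      x = utterance_mfcc("test.wav")
      print("male" if male.score(x) > female.score(x) else "female")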
  • csukuangfj/kaldifeat

    Kaldi-compatible online & offline feature extraction with PyTorch, supporting CUDA, batch processing, chunk processing, and autograd; provides C++ & Python APIs

    Language: C++
  • sp-nitech/diffsptk

    A differentiable version of SPTK

    Language: Python
  • SuyashMore/MevonAI-Speech-Emotion-Recognition

    Identify the emotion of multiple speakers in an Audio Segment

    Language: C
  • tympanix/subsync

    Synchronize your subtitles using machine learning

    Language: Python
  • amanbasu/speech-emotion-recognition

    Detecting emotions from MFCC features of human speech using deep learning

    Language: Jupyter Notebook
  • ZhuoZhuoCrayon/AcousticKeyBoard-Web

    ❓ Acoustic Keyboard | A wild idea: build a "toy" that can recognize which keys are being typed from the sound of the keystrokes, while learning signal processing / deep learning / Android / Django.

    Language: Python
  • GauravWaghmare/Speaker-Identification

    A program for automatic speaker identification using deep learning techniques.

    Language: Python
  • MycroftAI/sonopy

    A simple audio feature extraction library

    Language: Python
  • mathquis/node-personal-wakeword

    Personal wake word detector

    Language: JavaScript
  • ZitengWang/python_kaldi_features

    Python code to extract MFCC and FBANK speech features for Kaldi

    Language: Python
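
    This fork derives from the upstream python_speech_features package; assuming it keeps the same API (an assumption, not confirmed by the entry), a typical extraction looks like this, with a placeholder wav path:

      import scipy.io.wavfile as wav
      from python_speech_features import mfcc, logfbank

      rate, sig = wav.read("utterance.wav")
      # 25 ms windows, 10 ms hop; 13 cepstra and 40 log mel filterbank energies.
      mfcc_feat = mfcc(sig, samplerate=rate, winlen=0.025, winstep=0.01, numcep=13)
      fbank_feat = logfbank(sig, samplerate=rate, nfilt=40)
      print(mfcc_feat.shape, fbank_feat.shape)  # (n_frames, 13), (n_frames, 40)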
  • k-farruh/speech-accent-detection

    Humans speak a language with an accent, and a particular accent reflects a person's linguistic background. This model detects a speaker's accent from an audio recording; its output could be used to identify accents and to help English-learning students reduce and improve their accent through training.

    Language: Python
  • georgid/AlignmentDuration

    Lyrics-to-audio alignment system based on machine learning: hidden Markov models with Viterbi forced alignment. The alignment is explicitly aware of the durations of musical notes. The phonetic models are classified with an MLP deep neural network.

    Language: Python
  • zafarrafii/Zaf-Python

    Zafar's Audio Functions in Python for audio signal analysis: STFT, inverse STFT, mel filterbank, mel spectrogram, MFCC, CQT kernel, CQT spectrogram, CQT chromagram, DCT, DST, MDCT, inverse MDCT.

    Language: Jupyter Notebook
  • stefantaubert/mel-cepstral-distance

    A Python library for computing the Mel-Cepstral Distance (Mel-Cepstral Distortion, MCD) between two inputs. This implementation is based on the method proposed by Robert F. Kubichek in "Mel-Cepstral Distance Measure for Objective Speech Quality Assessment".

    Language: Python
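
    Kubichek's per-frame measure has a standard closed form; here is a plain NumPy sketch (not this library's API) for frames that are already time-aligned, with the 0th (energy) coefficient excluded as is common:

      import numpy as np

      def mcd(mc1, mc2):
          # mc1, mc2: (n_frames, n_coeffs) mel-cepstral coefficient matrices.
          diff = mc1[:, 1:] - mc2[:, 1:]
          per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * (diff ** 2).sum(axis=1))
          return per_frame.mean()  # average distortion in dB over all frames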
  • alicex2020/Deep-Learning-Lie-Detection

    Use machine learning models to detect lies based solely on acoustic speech information

    Language: Jupyter Notebook
  • SuperKogito/Voice-based-speaker-identification

    :sound: :boy: :girl: :woman: :man: Speaker identification using voice MFCCs and GMM

    Language: Python
  • supikiti/PNCC

    An implementation of Power-Normalized Cepstral Coefficients (PNCC)

    Language: Python
  • aubio/vamp-aubio-plugins

    aubio plugins for Vamp

    Language: C++
  • zafarrafii/Zaf-Matlab

    Zafar's Audio Functions in Matlab for audio signal analysis: STFT, inverse STFT, mel filterbank, mel spectrogram, MFCC, CQT kernel, CQT spectrogram, CQT chromagram, DCT, DST, MDCT, inverse MDCT.

    Language: Jupyter Notebook
  • sheelabhadra/Emergency-Vehicle-Detection

    Python implementation of papers on emergency vehicle detection using audio signals

    Language: Jupyter Notebook
  • mechanicalsea/spectra

    Spectra extraction tutorials based on torch and torchaudio.

    Language: Jupyter Notebook
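
    The torchaudio side of such tutorials typically reduces to two transforms; a small sketch with illustrative parameters and a placeholder path:

      import torchaudio

      waveform, sr = torchaudio.load("example.wav")
      mel = torchaudio.transforms.MelSpectrogram(
          sample_rate=sr, n_fft=400, hop_length=160, n_mels=40)(waveform)
      mfcc = torchaudio.transforms.MFCC(
          sample_rate=sr, n_mfcc=13,
          melkwargs={"n_fft": 400, "hop_length": 160, "n_mels": 40})(waveform)
      print(mel.shape, mfcc.shape)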
  • pulakk/Live-Audio-MFCC

    Live Audio MFCC Visualization in the browser using Web Audio API - https://pulakk.github.io/Live-Audio-MFCC/tutorial

    Language: JavaScript
  • zhengyima/DTW_Digital_Voice_Recognition

    Digit (0-9) speech recognition based on DTW and MFCC features; covers DTW, MFCC, speech recognition, Chinese and English data, and endpoint detection (Digital Voice Recognition).

    Language: Python
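
    The core of this template-matching approach is a DTW distance between MFCC sequences: an utterance is assigned the digit of the nearest training template. A minimal NumPy sketch (not this repository's code):

      import numpy as np

      def dtw_distance(a, b):
          # a: (n, d) and b: (m, d) MFCC frame sequences.
          n, m = len(a), len(b)
          cost = np.full((n + 1, m + 1), np.inf)
          cost[0, 0] = 0.0
          for i in range(1, n + 1):
              for j in range(1, m + 1):
                  d = np.linalg.norm(a[i - 1] - b[j - 1])  # local frame distance
                  cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                       cost[i, j - 1],      # deletion
                                       cost[i - 1, j - 1])  # match
          return cost[n, m]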
  • FedericaPaoli1/stm32-speech-recognition-and-traduction

    stm32-speech-recognition-and-traduction is a project developed for the Advances in Operating Systems exam at the University of Milan (academic year 2020-2021). It implements a speech recognition and speech-to-text translation system using a pre-trained machine learning model running on the STM32F407VG microcontroller.

    Language: C