
Feature extraction from the speech signal is the first stage of any speech recognition system.


Speech Feature Extraction

This repository collects feature extraction methods for speech signals.

Free speech datasets

  • OpenSLR: a site devoted to hosting speech and language resources, such as training corpora for speech recognition and software related to speech recognition.
  • VoxForge: VoxForge mirrors the Open Speech Data Corpus for German from the LT and Telecooperation groups, with 35 hours of speech from about 180 speakers.
  • TIMIT: The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus.
  • Mozilla Speech: Mozilla released the world's second-largest public voice data set on Nov 29, 2017.
  • Open Data for Deep Learning

File description

  • feature_extraction_functions.py: a set of feature extraction functions from RDShi-SpeakerCount.
  • MFCC: Mel-frequency cepstral coefficients calculation.
    • MFCC.py, MFCCTest.py: Compute the MFCC feature.
    • FeatureExtraction.ipynb: Speech preprocessing, including loading data, pre-emphasis, framing, windowing, Fourier transform, power spectrum, filter banks, MFCCs, and mean normalization.
  • Volume: volume calculation.
  • ZeroCR: Zero-Crossing Rate calculation.
  • Pitch: Pitch calculation and pitch tracking.
  • Timbre: spectrogram drawing.
  • VAD: voice activity detection (VAD), also called end-point detection (EPD) or speech detection.
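
The preprocessing steps listed for FeatureExtraction.ipynb (pre-emphasis, framing, windowing, Fourier transform, power spectrum, filter banks, MFCCs, mean normalization) can be sketched with NumPy alone. This is a minimal illustration, not the code in MFCC.py or the notebook: the function name `mfcc_sketch` and all parameter defaults (25 ms frames, 10 ms step, 512-point FFT, 26 filters, 13 coefficients, pre-emphasis 0.97) are assumptions chosen because they are common in tutorials, and the DCT is left unnormalized for brevity.

```python
import numpy as np


def hz_to_mel(hz):
    # Common O'Shaughnessy-style mel scale formula.
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mfcc_sketch(signal, sample_rate, frame_len=0.025, frame_step=0.010,
                n_fft=512, n_filters=26, n_ceps=13, pre_emph=0.97):
    """Hypothetical helper: returns an array of shape (n_frames, n_ceps)."""
    signal = np.asarray(signal, dtype=float)

    # 1. Pre-emphasis: y[t] = x[t] - a * x[t-1] boosts high frequencies.
    emphasized = np.append(signal[0], signal[1:] - pre_emph * signal[:-1])

    # 2. Framing: overlapping frames, zero-padding the tail of the signal.
    flen = int(round(frame_len * sample_rate))
    fstep = int(round(frame_step * sample_rate))
    n_frames = 1 + max(0, int(np.ceil((len(emphasized) - flen) / fstep)))
    pad = (n_frames - 1) * fstep + flen - len(emphasized)
    padded = np.append(emphasized, np.zeros(pad))
    idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]
    frames = padded[idx]

    # 3. Windowing: a Hamming window reduces spectral leakage.
    frames = frames * np.hamming(flen)

    # 4. Fourier transform and power spectrum.
    #    Assumes flen <= n_fft; longer frames would be truncated by rfft.
    magnitude = np.abs(np.fft.rfft(frames, n_fft))
    power = (magnitude ** 2) / n_fft

    # 5. Triangular mel filter bank between 0 Hz and the Nyquist frequency.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                          n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / (right - center)
    energies = power @ fbank.T
    # Replace exact zeros so the logarithm stays finite.
    energies = np.where(energies == 0, np.finfo(float).eps, energies)
    log_energies = np.log(energies)

    # 6. MFCCs: unnormalized DCT-II of the log filter-bank energies,
    #    keeping the first n_ceps coefficients.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n + 1)
                   / (2 * n_filters))
    ceps = log_energies @ basis.T

    # 7. Mean normalization: subtract each coefficient's mean over frames.
    return ceps - ceps.mean(axis=0, keepdims=True)
```

As a usage example, a one-second 440 Hz test tone sampled at 16 kHz can be passed in as `mfcc_sketch(np.sin(2 * np.pi * 440 * np.arange(16000) / 16000), 16000)`. With the assumed defaults this yields one row of 13 coefficients per 10 ms frame. Real recordings would first be loaded into a 1-D array by whatever audio reader the project uses.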

Requirements

Anaconda3 (Python 3.x)

References & Code source