pydrobert-speech
This pure-python library allows for flexible computation of speech features.
For example, given feature configuration called fbanks.json
:
{
"name": "stft",
"bank": "fbank",
"frame_length_ms": 25,
"include_energy": true,
"pad_to_nearest_power_of_two": true,
"window_function": "hanning",
"use_power": true
}
You can compute triangular, overlapping filters like Kaldi or HTK with the commands
import json
from pydrobert.speech import *
# get the feature computer ready
params = json.load(open('fbank.json'))
computer = util.alias_factory_subclass_from_arg(compute.FrameComputer, params)
# assume "signal" is a numpy float array
feats = computer.compute_full(signal)
If you plan on using a PyTorch DataLoader
or Kaldi
tables in your ASR pipeline, you can compute all a corpus' features by
using the commands signals-to-torch-feat-dir
(requires pytorch package)
or compute-feats-from-kaldi-tables
(requires pydrobert-kaldi package).
This package can compute much more than f-banks, with many different permutations. Consult the documentation for a more in-depth discussion of how to use it.
Documentation
Installation
pydrobert-speech is available via both PyPI and Conda.
conda install -c sdrobert pydrobert-speech
pip install pydrobert-speech
pip install git+https://github.com/sdrobert/pydrobert-speech # bleeding edge
Licensing and How to Cite
Please see the pydrobert page for more details on how to cite this package.
util.read_signal
can read NIST SPHERE files. To do so, code was adapted from
NIST sph2pipe
program
and put into pydrobert.speech._sphere
. License information can be found in
LICENSE_sph2pipe
. Please note that the license only permits the use of their
code to decode the "shorten" file type, not encode it.