Unofficial implementation of PercepNet, "A Perceptually-Motivated Approach for Low-Complexity, Real-Time Enhancement of Fullband Speech" (https://arxiv.org/abs/2008.04259).
- Pitch estimation
- Comb filter (a minimal sketch follows this list)
- ERB band C++ implementation
- Feature (r, g, pitch, corr) generator (C++) for PyTorch
- DNN model (PyTorch)
- DNN model C++ implementation
- Pretrained model
- Postfiltering (done by @TeaPoly)
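The comb filter reinforces harmonics at multiples of the estimated pitch period before the per-band gains are applied. The following is a minimal two-tap sketch of the idea in Python, where `period` and `strength` stand in for the pitch lag (in samples) and the filter strength; it is an illustration only and does not reproduce this repository's C++ filter, which follows the paper's multi-tap design.

```python
import numpy as np

def comb_filter(x, period: int, strength: float = 0.5) -> np.ndarray:
    """Two-tap feed-forward comb filter: mix each sample with the one a
    pitch period earlier, reinforcing harmonics at multiples of 1/period."""
    x = np.asarray(x, dtype=np.float32)
    y = x.copy()
    y[period:] = (1.0 - strength) * x[period:] + strength * x[:-period]
    return y
```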
- CMake
- Sox
- Python >= 3.6
- PyTorch
This repository has been tested on Ubuntu 20.04 (WSL2).
- Set up the CMake build environment:
```
sudo apt-get install cmake
```
- Create a build directory and build:
```
mkdir bin && cd bin
cmake ..
make -j
cd ..
```
- Generate training features from the sample data:
```
bin/src/percepNet sampledata/speech/speech.pcm sampledata/noise/noise20db.raw 4000 test.output
```
- Convert the output binary to HDF5 (a conceptual sketch of this conversion follows):
```
python3 utils/bin2h5.py test.output training.h5
```
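For orientation, this is roughly what the conversion does, assuming the C++ feature generator writes a flat stream of float32 records. `FEATURE_DIM` is a placeholder, not the repository's actual value; it must match the per-frame feature width that `bin/src/percepNet` emits.

```python
import numpy as np
import h5py

FEATURE_DIM = 138  # hypothetical; check the feature generator's output width

def bin2h5(bin_path: str, h5_path: str) -> None:
    data = np.fromfile(bin_path, dtype=np.float32)
    frames = data.reshape(-1, FEATURE_DIM)  # one row per analysis frame
    with h5py.File(h5_path, "w") as f:
        f.create_dataset("data", data=frames)

bin2h5("test.output", "training.h5")
```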
- Training (a rough sketch of a PercepNet-style model follows):
```
python3 rnn_train.py
```
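The actual architecture is defined in `rnn_train.py`; the sketch below only illustrates the shape of a PercepNet-style network as described in the paper (convolutional layers followed by GRUs, with sigmoid outputs for the 34 per-band gains g and pitch-filter strengths r). The layer sizes here are assumptions, not the repository's exact values.

```python
import torch
import torch.nn as nn

class PercepNetSketch(nn.Module):
    def __init__(self, in_dim: int = 70, hidden: int = 256, bands: int = 34):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_dim, hidden, kernel_size=5, padding=4),  # left-padded conv
            nn.Tanh(),
        )
        self.gru = nn.GRU(hidden, hidden, num_layers=2, batch_first=True)
        self.gain = nn.Linear(hidden, bands)      # per-band gains g
        self.strength = nn.Linear(hidden, bands)  # per-band strengths r

    def forward(self, x):  # x: (batch, time, features)
        h = self.conv(x.transpose(1, 2))[..., : x.size(1)]  # trim to causal length
        h, _ = self.gru(h.transpose(1, 2))
        return torch.sigmoid(self.gain(h)), torch.sigmoid(self.strength(h))
```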
- Dump the trained weights from PyTorch to a C++ header:
```
python3 dump_percepnet.py model.pt
```
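Conceptually, the dump step serializes the trained weights into a C/C++ header so the C++ inference binary can be compiled without a runtime PyTorch dependency. A minimal sketch of that idea follows; the header name and array layout here are illustrative and do not match `dump_percepnet.py`'s real output format.

```python
import torch

def tensor_to_c_array(name: str, tensor: torch.Tensor) -> str:
    """Render one tensor as a static C float array."""
    flat = tensor.detach().cpu().float().flatten().tolist()
    rows = [", ".join(f"{v:.8f}f" for v in flat[i:i + 8])
            for i in range(0, len(flat), 8)]
    return f"static const float {name}[{len(flat)}] = {{\n  " + ",\n  ".join(rows) + "\n};\n"

obj = torch.load("model.pt", map_location="cpu")
state = obj.state_dict() if hasattr(obj, "state_dict") else obj  # module or bare state dict
with open("nnet_weights.h", "w") as f:  # hypothetical header name
    for pname, param in state.items():
        f.write(tensor_to_c_array(pname.replace(".", "_"), param))
```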
- Inference: rebuild so the freshly dumped weight header is compiled in, then run the enhancer:
```
cd bin
cmake ..
make -j1
cd ..
bin/src/percepNet_run test_input.pcm percepnet_output.pcm
```
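`percepNet_run` reads and writes headerless PCM. Assuming the output uses the same 48 kHz, 16-bit signed, mono format as the training data (check the C++ source if unsure), sox can wrap it in a wav header for listening:

```
sox -t raw -r 48000 -b 16 -c 1 -e signed-integer percepnet_output.pcm percepnet_output.wav
```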
- Clean speech: VCTK 48 kHz wav, https://datashare.is.ed.ac.uk/handle/10283/2791 (clean_train_set)
- Noise data: DEMAND 48 kHz wav, https://zenodo.org/record/1227121#__sid=js0 (*.48k.zip)
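Sox (listed in the requirements) can convert the downloaded 48 kHz wav files to the headerless raw/PCM input the feature generator expects; 16-bit signed mono is assumed here:

```
sox clean_speech.wav -r 48000 -b 16 -c 1 -e signed-integer clean_speech.raw
```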
@jasdasdf, @sTarAnna, @cookcodes, @xyx361100238, @zhangyutf, @TeaPoly, @rameshkunasi, @OscarLiau, @YangangCao, Jaeyoung Yang
- https://github.com/wil-j-wil/py_bank
- https://github.com/dgaspari/pyrapt