Avatar-based speech separation by Upload AI LLC

The model builds on the Dual-path Transformer. We add a modulator that incorporates speaker information, which yields personalized speech models. An overview of the model architecture is shown below.
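
This README does not spell out how the modulator works, so the following is only an illustrative sketch: it is neither the authors' implementation nor the architecture overview itself. It assumes one common way of injecting speaker information, a per-channel scale and shift (often called FiLM-style modulation). The function `modulate`, the table `SPEAKER_PARAMS`, and all numbers are hypothetical.

```python
def modulate(features, scale, shift):
    """Scale and shift each channel of every frame (FiLM-style modulation)."""
    return [
        [g * x + b for x, g, b in zip(frame, scale, shift)]
        for frame in features
    ]


# Hypothetical per-speaker parameters; in a trained model these would be learned.
SPEAKER_PARAMS = {
    "speaker_a": {"scale": [2.0, 0.5], "shift": [0.0, 1.0]},
    "speaker_b": {"scale": [1.0, 1.0], "shift": [0.0, 0.0]},  # identity
}

features = [[1.0, 2.0], [3.0, 4.0]]  # two frames, two channels
params = SPEAKER_PARAMS["speaker_a"]
modulated = modulate(features, params["scale"], params["shift"])
```

The point of the sketch is that the same shared features are transformed differently depending on which speaker's parameters are selected, which is one way a single network can be personalized.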

Prerequisites
How to run the code
Check results

Prerequisites

Download the Avatar10Mix2 dataset, which contains audio recorded from 10 speakers:

cd datasets
sh download_avatar10mix2.sh
cd ..

Install dependencies:

pip install -r requirements.txt
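
After installing, it can be useful to confirm that nothing listed in requirements.txt was skipped. The helper below is a minimal sketch using only the Python standard library. The function name `missing_requirements` is hypothetical, and the parsing is deliberately simple: it reads the leading package name on each line, does not compare versions, and does not handle URL-style requirements.

```python
import re
from importlib import metadata
from pathlib import Path


def missing_requirements(requirements_file):
    """Return the names listed in a requirements file that are not installed."""
    missing = []
    for line in Path(requirements_file).read_text().splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match is None:
            continue
        name = match.group(0)
        try:
            metadata.version(name)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

Calling `missing_requirements("requirements.txt")` from the repository root returns an empty list when every listed package is installed.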

How to run the code

The training and testing code for separating speech from ambient noise is provided in speech_vs_ambient. Change to the speech_vs_ambient directory and run the following commands:

Training

python train.py --exp_dir exp/speech_vs_ambient

Testing

python eval.py --exp_dir exp/speech_vs_ambient
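
To run both steps back to back, the two commands above can be wrapped in a small script. This is a sketch, not part of the repository: `build_commands` and `run_pipeline` are hypothetical names, and the script assumes it is launched from the repository root so that speech_vs_ambient is a valid working directory.

```python
import subprocess
import sys


def build_commands(exp_dir="exp/speech_vs_ambient"):
    """Return the training and testing commands as argument lists."""
    return [
        [sys.executable, "train.py", "--exp_dir", exp_dir],
        [sys.executable, "eval.py", "--exp_dir", exp_dir],
    ]


def run_pipeline(exp_dir="exp/speech_vs_ambient", workdir="speech_vs_ambient"):
    """Run training, then testing; stop at the first command that fails."""
    for command in build_commands(exp_dir):
        # check=True raises CalledProcessError on a non-zero exit code,
        # so testing never runs after a failed training job.
        subprocess.run(command, cwd=workdir, check=True)
```

Using `sys.executable` instead of a bare `python` keeps both steps on the interpreter that has the dependencies installed.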

Check results

We provide a simple webpage for reviewing good test examples, which can be found at:

exp/speech_vs_ambient/vis/examples/
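
If the experiments run on a remote machine, or the browser refuses to load audio from local files, one option is to serve that directory over HTTP. The sketch below uses only the Python standard library. The helper names `make_server` and `serve_in_background` are hypothetical, and the layout of the examples page itself is not assumed.

```python
import functools
import http.server
import threading


def make_server(directory, port=0):
    """Create an HTTP server for `directory`; port 0 lets the OS pick a free port."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


def serve_in_background(directory, port=0):
    """Start serving in a daemon thread and return (server, url)."""
    server = make_server(directory, port)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_address[1]}/"
```

For example, `server, url = serve_in_background("exp/speech_vs_ambient/vis/examples")` returns a URL to open in a browser; call `server.shutdown()` when finished.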

The training curves are logged with TensorBoard. To view them, run:

tensorboard --logdir exp/speech_vs_ambient/lightning_logs/
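
After several training runs, the log directory may hold more than one run. Assuming the default PyTorch Lightning layout, where each run is written to a `version_N` subdirectory of lightning_logs, the following sketch finds the most recent one. The function name `latest_version_dir` is hypothetical.

```python
import re
from pathlib import Path


def latest_version_dir(log_root):
    """Return the highest-numbered version_N directory under log_root, or None."""
    best, best_n = None, -1
    for child in Path(log_root).iterdir():
        match = re.fullmatch(r"version_(\d+)", child.name)
        if child.is_dir() and match and int(match.group(1)) > best_n:
            best, best_n = child, int(match.group(1))
    return best
```

The returned path can be passed to `tensorboard --logdir` to view only the latest run. Comparing the numeric suffix rather than sorting names avoids ranking `version_10` below `version_2`.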
