AvaTr

Official implementation of the paper "AvaTr: One-Shot Speaker Extraction with Transformers"

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

Avatar-based speech separation by Upload AI LLC


The idea builds on the Dual-Path Transformer. We add a modulator that incorporates speaker information into the separation network, which yields a personalized speech model. An overview of the model architecture is shown below.

(Figure: AvaDPT model architecture)
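To make the modulation idea concrete, here is a minimal sketch of one way speaker conditioning can be injected into transformer features, as a FiLM-style scale-and-shift. The class name, dimensions, and conditioning scheme are illustrative assumptions, not the repo's actual implementation:

import torch
import torch.nn as nn

class SpeakerModulator(nn.Module):
    """FiLM-style conditioning: scale and shift mixture features with a speaker embedding."""

    def __init__(self, feat_dim: int, spk_dim: int):
        super().__init__()
        self.to_scale = nn.Linear(spk_dim, feat_dim)
        self.to_shift = nn.Linear(spk_dim, feat_dim)

    def forward(self, feats: torch.Tensor, spk_emb: torch.Tensor) -> torch.Tensor:
        # feats: (batch, time, feat_dim); spk_emb: (batch, spk_dim)
        gamma = self.to_scale(spk_emb).unsqueeze(1)  # broadcast over the time axis
        beta = self.to_shift(spk_emb).unsqueeze(1)
        return gamma * feats + beta

# Modulate dual-path transformer features toward one target speaker.
mod = SpeakerModulator(feat_dim=64, spk_dim=128)
out = mod(torch.randn(2, 100, 64), torch.randn(2, 128))  # shape: (2, 100, 64)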

Contents

  - Prerequisites
  - How to run the code
  - Check results

Prerequisites

(↑up to contents)

  1. Download the Avatar10Mix2 dataset, which contains audio recordings from 10 speakers (a sanity check for the download follows these steps):
cd datasets
sh download_avatar10mix2.sh
cd ..
  2. Install the dependencies:
pip install -r requirements.txt
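After both steps, a quick check confirms the data landed where expected. The path below is an assumption inferred from the download commands; adjust it if the script unpacks elsewhere:

import os

data_root = "datasets/Avatar10Mix2"  # assumed location
if os.path.isdir(data_root):
    n_files = sum(len(files) for _, _, files in os.walk(data_root))
    print(f"OK: found {n_files} files under {data_root}")
else:
    print(f"Missing {data_root}; re-run datasets/download_avatar10mix2.sh")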

How to run the code

(↑up to contents) The training and testing code for separating speech from ambient noise is provided in speech_vs_ambient. Change into that directory and run the following commands (a sketch of the training flow follows them):

  1. Training
python train.py --exp_dir exp/speech_vs_ambient
  2. Testing
python eval.py --exp_dir exp/speech_vs_ambient
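For orientation: the lightning_logs/ directory under the experiment dir suggests training is built on PyTorch Lightning. The toy sketch below mirrors that flow, a Trainer writing checkpoints and TensorBoard logs under --exp_dir. The module and data are stand-ins, not the repo's actual train.py:

import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset

class ToySeparator(pl.LightningModule):
    """Stand-in for the AvaTr model; trains on random tensors just to show the flow."""

    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(16, 16)

    def training_step(self, batch, batch_idx):
        mixture, target = batch
        loss = torch.nn.functional.mse_loss(self.net(mixture), target)
        self.log("train_loss", loss)  # shows up as a TensorBoard curve
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

data = TensorDataset(torch.randn(64, 16), torch.randn(64, 16))
# Logs and checkpoints land under exp/speech_vs_ambient/lightning_logs/.
trainer = pl.Trainer(default_root_dir="exp/speech_vs_ambient", max_epochs=1)
trainer.fit(ToySeparator(), DataLoader(data, batch_size=8))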

Check results

(↑up to contents) We provide a simple webpage for reviewing well-separated test examples, which can be found at

exp/speech_vs_ambient/vis/examples/
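Assuming the examples page is static HTML, one convenient way to browse it locally is Python's built-in HTTP server:

cd exp/speech_vs_ambient/vis/examples
python -m http.server 8000

Then open http://localhost:8000 in a browser.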

The training curves are logged with TensorBoard. To view them, run

tensorboard --logdir exp/speech_vs_ambient/lightning_logs/