Official repository of NeXt-TDNN for speaker verification

NeXt-TDNN for Speaker Verification

This repository is the official implementation of "NeXt-TDNN: Modernizing Multi-Scale Temporal Convolution Backbone for Speaker Verification", accepted to ICASSP 2024.

News

🔥 December 2023: We have uploaded the pre-trained NeXt-TDNN models to the experiments folder!

🍀 February 2024: The NeXt-TDNN model was updated with cyclical learning rate scheduling, improving the EER from 0.79% to 0.72%. The update changes the LR schedule, the gradient clipping value, and the batch size; see configs/NeXt_TDNN_C256_B3_K65_7_cyclical_lr_step.py for details.
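For readers unfamiliar with cyclical learning rates, the sketch below computes the classic "triangular" cyclical schedule, in which the learning rate ramps linearly from a lower bound up to an upper bound and back down once per cycle. It is an illustration of the general technique only: the function name and the bounds used in the example (`base_lr`, `max_lr`, `step_size`) are assumptions, not the values or code in the updated config file.

```python
def triangular_lr(step: int, base_lr: float, max_lr: float, step_size: int) -> float:
    """Learning rate at a given optimizer step under a triangular cyclical schedule.

    One full cycle lasts 2 * step_size steps: the LR rises linearly from
    base_lr to max_lr over step_size steps, then falls back to base_lr.
    """
    if step_size <= 0:
        raise ValueError("step_size must be positive")
    cycle_pos = step % (2 * step_size)  # position within the current cycle
    # Normalized distance from the cycle peak: 1 at the cycle ends, 0 at the peak.
    x = abs(cycle_pos / step_size - 1.0)
    return base_lr + (max_lr - base_lr) * (1.0 - x)


if __name__ == "__main__":
    # Hypothetical bounds, for illustration only.
    for step in (0, 500, 1000, 1500, 2000):
        print(step, triangular_lr(step, base_lr=1e-5, max_lr=1e-3, step_size=1000))
```

Inside a real PyTorch training loop, the built-in `torch.optim.lr_scheduler.CyclicLR` with `mode="triangular"` implements the same policy and is usually preferable to a hand-rolled function.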

0. Getting Started

Prerequisites

This code requires the following:

  • lightning == 2.1.2

Installation

  • CUDA and PyTorch installation
# CUDA
conda install -c "nvidia/label/cuda-11.8.0" cuda

# PyTorch
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
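After installation, a quick sanity check like the one below can confirm that PyTorch was built against CUDA and can see a GPU, and that the pinned `lightning` version is installed. This is a convenience sketch, not a script shipped with the repository; the function name `environment_report` is hypothetical.

```python
import importlib.metadata
import importlib.util

REQUIRED_LIGHTNING = "2.1.2"  # version pinned in the Prerequisites list


def environment_report() -> dict:
    """Collect the versions relevant to this repo; missing packages are reported as None."""
    report = {
        "torch": None,
        "torch_cuda_build": None,  # CUDA version PyTorch was compiled against (None for CPU-only builds)
        "cuda_available": False,   # whether a CUDA device is usable at runtime
        "lightning": None,
        "lightning_ok": False,
    }
    if importlib.util.find_spec("torch") is not None:
        import torch
        report["torch"] = str(torch.__version__)
        report["torch_cuda_build"] = torch.version.cuda
        report["cuda_available"] = torch.cuda.is_available()
    try:
        # Read the installed distribution version without importing the package itself.
        report["lightning"] = importlib.metadata.version("lightning")
    except importlib.metadata.PackageNotFoundError:
        pass
    report["lightning_ok"] = report["lightning"] == REQUIRED_LIGHTNING
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

With the conda commands above, `torch_cuda_build` should report `11.8`; if `cuda_available` is `False` on a GPU machine, the driver or CUDA install is the first thing to check.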

Data preparation

1. Model Training

To train an ASV model, run the main script in train mode. Select the desired training configuration with the --config argument.

  • To train NeXt-TDNN (C=256, B=3):
python main.py --mode train --config configs/NeXt_TDNN_C256_B3_K65_7
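The repository's `main.py` is not reproduced here, so the sketch below only illustrates one common way a `--mode`/`--config` command line like the one above can be wired up: `argparse` restricts the mode to the three values used in this README, and the config path is converted into a dotted module name that `importlib` can import. Whether `main.py` actually loads configs this way is an assumption; `parse_cli`, `config_module_name`, and `load_config` are hypothetical names.

```python
import argparse
import importlib
from pathlib import PurePosixPath


def parse_cli(argv=None) -> argparse.Namespace:
    """Parse the --mode/--config options used in the README commands."""
    parser = argparse.ArgumentParser(description="NeXt-TDNN speaker verification (sketch)")
    parser.add_argument("--mode", required=True, choices=["train", "test", "test_all"])
    parser.add_argument("--config", required=True,
                        help="config path, e.g. configs/NeXt_TDNN_C256_B3_K65_7")
    return parser.parse_args(argv)


def config_module_name(config_path: str) -> str:
    """Map 'configs/NAME' or 'configs/NAME.py' to the dotted module name 'configs.NAME'."""
    path = PurePosixPath(config_path)
    if path.suffix == ".py":
        path = path.with_suffix("")
    return ".".join(path.parts)


def load_config(config_path: str):
    """Import the config file as a module (assumes the repo root is on sys.path)."""
    return importlib.import_module(config_module_name(config_path))


if __name__ == "__main__":
    # Same arguments as the training command above, passed explicitly.
    args = parse_cli(["--mode", "train", "--config", "configs/NeXt_TDNN_C256_B3_K65_7"])
    print(args.mode, config_module_name(args.config))
```

Keeping configurations as importable Python modules lets a single flag switch between variants such as the cyclical-LR config mentioned in the News section.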

2. Model Testing

To test on VoxCeleb1, run one of the commands below. As in training, select the desired test configuration with the --config argument.

# VoxCeleb1-O
python main.py --mode test --config configs/NeXt_TDNN_C256_B3_K65_7

# ⚡ VoxCeleb1-O, VoxCeleb1-E, VoxCeleb1-H
python main.py --mode test_all --config configs/NeXt_TDNN_C256_B3_K65_7
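Verification results such as the EER figures in the News section are computed from per-trial scores. As a minimal sketch, assuming each trial is scored by cosine similarity between two speaker embeddings (a common choice in speaker verification, not confirmed here for this repository's test code), the equal error rate is the operating point where the false-acceptance rate (FAR) and false-rejection rate (FRR) meet. The helper names `cosine_score` and `compute_eer` are hypothetical.

```python
import math
from typing import Sequence


def cosine_score(emb_a: Sequence[float], emb_b: Sequence[float]) -> float:
    """Cosine similarity between two speaker embeddings (higher = more likely same speaker)."""
    dot = sum(x * y for x, y in zip(emb_a, emb_b))
    norm = math.sqrt(sum(x * x for x in emb_a)) * math.sqrt(sum(y * y for y in emb_b))
    return dot / norm


def compute_eer(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Equal error rate for verification trials (label 1 = same speaker, 0 = different).

    A trial is accepted when its score is >= the threshold. The threshold is swept
    upward through the sorted scores; the EER is taken as the mean of FAR and FRR
    at the cut where the two rates are closest.
    """
    n_target = sum(1 for lab in labels if lab == 1)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need both target and non-target trials")

    trials = sorted(zip(scores, labels))  # ascending by score
    rejected_targets = 0     # targets below the threshold (false rejections)
    rejected_nontargets = 0  # non-targets below the threshold (correct rejections)
    best_gap, eer = float("inf"), 1.0
    i = 0
    while True:
        far = (n_nontarget - rejected_nontargets) / n_nontarget
        frr = rejected_targets / n_target
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
        if i == len(trials):
            break
        # Raise the threshold past every trial that shares the current score.
        current = trials[i][0]
        while i < len(trials) and trials[i][0] == current:
            if trials[i][1] == 1:
                rejected_targets += 1
            else:
                rejected_nontargets += 1
            i += 1
    return eer


if __name__ == "__main__":
    # Toy trials: two same-speaker pairs (label 1) and two different-speaker pairs (label 0).
    print(compute_eer([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))  # → 0.5
```

The function returns a fraction; multiply by 100 to compare with percentages such as 0.72%. For the large VoxCeleb1-E and VoxCeleb1-H trial lists, a vectorized alternative built on `sklearn.metrics.roc_curve` (looking for the point where `fpr` is closest to `1 - tpr`) is a common choice.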

3. Reference

4. Citation

If you find our work useful, please cite:

@misc{heo2023nexttdnn,
      title={NeXt-TDNN: Modernizing Multi-Scale Temporal Convolution Backbone for Speaker Verification}, 
      author={Hyun-Jun Heo and Ui-Hyeop Shin and Ran Lee and YoungJu Cheon and Hyung-Min Park},
      year={2023},
      eprint={2312.08603},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}