This repository explores the application of transformer neural networks to automatic speaker recognition (ASR). The project investigates whether transformer architectures can effectively capture and learn voice dynamics (prosodic features) when trained for speaker verification. It builds on the Self-Supervised Audio Spectrogram Transformer (SSAST) code at https://github.com/YuanGongND/ssast.
Clone the repository and fetch the Git LFS files:
git clone https://github.com/fabianbosshard/SSAST_SV.git
cd SSAST_SV
git lfs fetch
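If Git LFS was not active when the repository was cloned, large files may remain as small text pointer stubs in the working tree. The shell function below is a minimal sketch (find_lfs_pointers is a hypothetical helper, not part of the repository) that lists files still containing the standard Git LFS pointer header:

```shell
# Hypothetical helper: list files under a directory that are still Git LFS
# pointer stubs, i.e. their real content has not been checked out.
# A pointer file contains a line starting with:
#   version https://git-lfs.github.com/spec/v1
find_lfs_pointers() {
  local dir="${1:-.}"
  # -r: recurse, -l: print file names only, -I: skip binary files,
  # --exclude-dir=.git: ignore Git's internal object store.
  grep -rlI --exclude-dir=.git '^version https://git-lfs.github.com/spec/v1' "$dir" | sort
}
```

Run find_lfs_pointers . from the repository root. Empty output suggests all LFS content is in place; otherwise, git lfs checkout populates the working tree from the fetched objects.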
This section gives step-by-step commands for downloading the LibriSpeech, VoxCeleb, and AudioSet datasets on the CAI DGX server (documentation).
Navigate to the utils/data_downloading directory (relative to the repository root):
cd utils/data_downloading
To download LibriSpeech, start a screen session:
screen
Start a SLURM session:
srun --job-name=xbfr_data_download_librispeech --pty --ntasks=1 --cpus-per-task=4 --mem=16G --gres=gpu:0 bash
Build the Docker image:
docker build \
--build-arg USER_ID=$(id -u) \
--build-arg GROUP_ID=$(id -g) \
-t xbfr_data_download_librispeech_img .
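The USER_ID and GROUP_ID build arguments presumably let the Dockerfile create a container user that matches your host account, so that files the download script writes into the mounted /raid/xbfr volume belong to you (an assumption; check the Dockerfile to confirm). A quick way to verify this after a download is a sketch like the following, where find_foreign_owned is a hypothetical helper:

```shell
# Hypothetical helper: list files under a directory that are NOT owned by
# the current user (assumes the container user mirrors the host UID).
find_foreign_owned() {
  local dir="$1"
  # find's -user test accepts a numeric UID; "!" negates it.
  find "$dir" ! -user "$(id -u)" -print
}
```

find_foreign_owned /raid/xbfr should print nothing if ownership is as expected.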
Run the Docker container:
nvidia-docker run --rm -it \
--shm-size=16g \
--name xbfr_data_download_librispeech \
--volume /cluster/home/xbfr/SSAST_SV:/workspace/SSAST_SV \
--volume /raid/xbfr:/raid/xbfr \
--env SLURM_JOB_ID \
xbfr_data_download_librispeech_img bash
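Passing --env SLURM_JOB_ID without a value tells Docker to copy the variable from the host environment, where srun sets it. As a sketch, a hypothetical check_slurm_job function run inside the container confirms that the variable arrived:

```shell
# Hypothetical helper: confirm that the SLURM job ID was forwarded into the
# container. "--env SLURM_JOB_ID" (no "=value") copies the host's value
# only if the variable is set on the host.
check_slurm_job() {
  if [ -n "${SLURM_JOB_ID:-}" ]; then
    echo "Running inside SLURM job $SLURM_JOB_ID"
  else
    echo "SLURM_JOB_ID is not set; was the container started from an srun session?" >&2
    return 1
  fi
}
```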
Inside the Docker container, run the download script:
./download_data.sh librispeech
Detach from the screen session by pressing Ctrl + A followed by D; the download keeps running in the background.
To download VoxCeleb, start a new screen session from the same utils/data_downloading directory:
screen
Start a SLURM session:
srun --job-name=xbfr_data_download_voxceleb --pty --ntasks=1 --cpus-per-task=4 --mem=16G --gres=gpu:0 bash
Build the Docker image:
docker build \
--build-arg USER_ID=$(id -u) \
--build-arg GROUP_ID=$(id -g) \
-t xbfr_data_download_voxceleb_img .
Run the Docker container:
nvidia-docker run --rm -it \
--shm-size=16g \
--name xbfr_data_download_voxceleb \
--volume /cluster/home/xbfr/SSAST_SV:/workspace/SSAST_SV \
--volume /raid/xbfr:/raid/xbfr \
--env SLURM_JOB_ID \
xbfr_data_download_voxceleb_img bash
Inside the Docker container, run the download script:
./download_data.sh voxceleb
Detach from the screen session by pressing Ctrl + A followed by D; the download keeps running in the background.
To download AudioSet, start a new screen session from the same utils/data_downloading directory:
screen
Start a SLURM session:
srun --job-name=xbfr_data_download_audioset --pty --ntasks=1 --cpus-per-task=4 --mem=16G --gres=gpu:0 bash
Build the Docker image:
docker build \
--build-arg USER_ID=$(id -u) \
--build-arg GROUP_ID=$(id -g) \
-t xbfr_data_download_audioset_img .
Run the Docker container:
nvidia-docker run --rm -it \
--shm-size=16g \
--name xbfr_data_download_audioset \
--volume /cluster/home/xbfr/SSAST_SV:/workspace/SSAST_SV \
--volume /raid/xbfr:/raid/xbfr \
--env SLURM_JOB_ID \
xbfr_data_download_audioset_img bash
Inside the Docker container, run the download script:
./download_data.sh audioset
Detach from the screen session by pressing Ctrl + A followed by D; the download keeps running in the background.
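The three procedures above are identical except for the dataset name. The hypothetical helper below (not part of the repository) prints the commands for one dataset as a copy-paste reference. It only prints them: srun and nvidia-docker run each open a new interactive shell, so the commands cannot simply be piped into bash.

```shell
# Hypothetical helper: print (not execute) the per-dataset download commands.
print_download_steps() {
  local dataset="$1"
  case "$dataset" in
    librispeech|voxceleb|audioset) ;;
    *) echo "unknown dataset: $dataset" >&2; return 1 ;;
  esac
  local name="xbfr_data_download_${dataset}"
  # Unquoted heredoc: ${name}/${dataset} expand, \$(id ...) stays literal.
  cat <<EOF
srun --job-name=${name} --pty --ntasks=1 --cpus-per-task=4 --mem=16G --gres=gpu:0 bash
docker build --build-arg USER_ID=\$(id -u) --build-arg GROUP_ID=\$(id -g) -t ${name}_img .
nvidia-docker run --rm -it --shm-size=16g --name ${name} --volume /cluster/home/xbfr/SSAST_SV:/workspace/SSAST_SV --volume /raid/xbfr:/raid/xbfr --env SLURM_JOB_ID ${name}_img bash
./download_data.sh ${dataset}
EOF
}
```

For example, print_download_steps voxceleb reproduces the VoxCeleb commands in order.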
After a while, the downloads will finish. List the screen sessions:
screen -ls
Then reattach to each one using its identifier:
screen -r [pid.]tty.host
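To avoid copying identifiers by hand, the output of screen -ls can be filtered. The sketch below (detached_sessions is a hypothetical helper) assumes the usual screen -ls layout: one indented pid.tty.host line per session, followed by its state, e.g. (Detached):

```shell
# Hypothetical helper: read `screen -ls` output on stdin and print the
# identifier of every detached session, one per line.
detached_sessions() {
  # $1 is the first whitespace-separated field, e.g. "12345.pts-0.host".
  awk '$1 ~ /^[0-9]+\./ && /\(Detached\)/ { print $1 }'
}
```

screen -ls | detached_sessions prints identifiers that can be passed to screen -r. Alternatively, naming a session at creation time with screen -S <name> lets you reattach with screen -r <name>.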
Follow the instructions below to exit the Docker container, SLURM session, and screen session.
Exit the Docker container:
exit
Exit the SLURM session:
exit
Exit the screen session:
exit
Repeat these steps for each screen session.
To monitor download progress:
Navigate to the utils/data_downloading directory:
cd utils/data_downloading
Start a SLURM session:
srun --job-name=xbfr_progress_monitoring --pty --ntasks=1 --cpus-per-task=2 --mem=8G --gres=gpu:0 bash
Build the Docker image:
docker build -f Dockerfile.analysis \
--build-arg USER_ID=$(id -u) \
--build-arg GROUP_ID=$(id -g) \
-t xbfr_progress_monitoring_img .
Run the Docker container:
nvidia-docker run --rm -it \
--name xbfr_progress_monitoring \
--volume /cluster/home/xbfr/SSAST_SV:/workspace/SSAST_SV \
--volume /raid/xbfr:/raid/xbfr \
--env SLURM_JOB_ID \
xbfr_progress_monitoring_img bash
Inside the Docker container, run the analysis Python script:
python3 print_info.py
Exit the container and SLURM session:
exit
exit
Repeat these steps periodically to track download progress.
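Between runs of print_info.py, a lighter-weight check is to look at how much data has landed on disk. The sketch below assumes the datasets are written to subdirectories of /raid/xbfr (inferred from the volume mount, not confirmed); dir_sizes is a hypothetical helper:

```shell
# Hypothetical helper: print the on-disk size (in KiB) of each top-level
# subdirectory of a root directory, sorted by path.
dir_sizes() {
  local root="$1"
  # -s: one total per argument; -k: KiB units, portable across du versions.
  # The trailing slash in the glob matches directories only.
  du -sk "$root"/*/ 2>/dev/null | sort -k2
}
```

For example, running dir_sizes /raid/xbfr inside a SLURM session at intervals shows whether the dataset directories are still growing.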