SW-Centered University Competition (AI Track)

Topic: Detection of fake voices produced by generative AI
Period: 2024.07.01 ~ 2024.07.19
Result: 10th place out of 219 teams
Affiliation: Department of AI Software, Gachon University

MOTA

유종문	김의진	장희진	윤세현	최상현

1. Description

AASIST with augmented audio (rawboost, DANN)

Audio augmentation process

AASIST + DANN Training / Inferencing

AASIST with denoised audio (deepfilternet)

Audio denoising process

AASIST Training / Inferencing
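
For the AASIST + DANN training step listed above, the domain classifier is usually attached through a gradient reversal layer whose strength is ramped up during training. The sketch below shows the ramp-up schedule proposed in the DANN paper [3]. Whether this repository uses the same schedule, and the default `gamma = 10`, are assumptions; `grl_lambda` is a hypothetical helper name.

```python
import math


def grl_lambda(progress: float, gamma: float = 10.0) -> float:
    """Gradient reversal coefficient for a training progress in [0, 1].

    Schedule from the DANN paper: lambda = 2 / (1 + exp(-gamma * p)) - 1.
    It starts at 0 (domain loss ignored) and approaches 1 late in training.
    """
    p = min(max(progress, 0.0), 1.0)  # clamp so out-of-range steps stay valid
    return 2.0 / (1.0 + math.exp(-gamma * p)) - 1.0


if __name__ == "__main__":
    total_steps = 5  # hypothetical number of training steps
    for step in range(total_steps + 1):
        print(step, round(grl_lambda(step / total_steps), 4))
```

In a training loop, this value would scale the reversed gradient flowing from the domain classifier back into the shared feature extractor.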

2. Getting Started

Docker setup

docker pull pytorch/pytorch:2.2.0-cuda12.1-cudnn8-devel
docker run -it --gpus all pytorch/pytorch:2.2.0-cuda12.1-cudnn8-devel

Download the dataset

sh ./code/1_prepare_data/download.sh

Create the Anaconda virtual environment

conda create -n mota python=3.10.13 -y
conda activate mota

Data preprocessing

sh ./code/1_prepare_data/run.sh

AASIST + DANN + Rawboost

sh ./code/2_aasist_rawboost/run.sh

AASIST + Denoise

sh ./code/3_aasist_denoise/run.sh

Ensemble

sh ./code/4_ensemble/run.sh
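
As an illustration of what an ensemble of the two AASIST pipelines can look like, the sketch below averages the per-file probabilities of two models. It is not taken from `4_ensemble/run.sh`: the weighted-average rule, the `(fake, real)` probability layout, the file IDs, and the name `ensemble` are all assumptions.

```python
from typing import Dict, Tuple

Probs = Tuple[float, ...]


def ensemble(pred_a: Dict[str, Probs], pred_b: Dict[str, Probs],
             weight_a: float = 0.5) -> Dict[str, Probs]:
    """Weighted average of two models' probabilities, file by file."""
    if pred_a.keys() != pred_b.keys():
        raise ValueError("both prediction sets must cover the same files")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    weight_b = 1.0 - weight_a
    return {
        file_id: tuple(weight_a * a + weight_b * b
                       for a, b in zip(pred_a[file_id], pred_b[file_id]))
        for file_id in pred_a
    }


if __name__ == "__main__":
    # Hypothetical outputs of the RawBoost/DANN model and the denoise model.
    rawboost = {"TEST_00000": (0.8, 0.2), "TEST_00001": (0.1, 0.9)}
    denoise = {"TEST_00000": (0.6, 0.4), "TEST_00001": (0.3, 0.7)}
    print(ensemble(rawboost, denoise))
```

Averaging probabilities rather than hard decisions keeps the output usable for calibration-sensitive metrics such as the Brier score and ECE.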

3. Experimental Environment

Ubuntu 22.04.3 LTS
NVIDIA RTX 4090
AMD EPYC 7402 24-cores
For the rest of the environment, see environment.yaml

deepfilternet            0.5.6
librosa                  0.10.2.post1
soundfile                0.12.1
pandas                   2.2.2
pydub                    0.25.1
torch                    2.3.1
torchaudio               2.3.1
torchcontrib             0.0.2
tensorboard              2.17.0
tqdm                     4.66.4

4. Pretrained Models

AST (MIT/ast-finetuned-audioset-10-10-0.4593) : masking
https://huggingface.co/MIT/ast-finetuned-audioset-10-10-0.4593
DeepFilterNet : Denoising
https://github.com/Rikorose/DeepFilterNet

5. Techniques Used

Data augmentation : RawBoost, audio mixing (overlapping)
Model : AASIST, DANN (Domain-Adversarial Neural Network)
Data preprocessing : DeepFilterNet
Output post-processing : AST (Audio Spectrogram Transformer)
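
The audio mixing (overlapping) augmentation listed above can be sketched as follows: two clips are summed sample by sample and their labels are merged. This is a minimal illustration, not the repository's code. The multi-hot `(fake, real)` label layout, the peak normalization, and the name `mix_overlapping` are assumptions, and plain Python lists stand in for audio arrays.

```python
from typing import List, Sequence, Tuple


def mix_overlapping(wave_a: Sequence[float], label_a: Sequence[int],
                    wave_b: Sequence[float], label_b: Sequence[int],
                    gain_b: float = 1.0) -> Tuple[List[float], Tuple[int, ...]]:
    """Overlap two mono waveforms and merge their multi-hot labels."""
    length = max(len(wave_a), len(wave_b))
    # Zero-pad the shorter clip so both cover the same duration.
    padded_a = list(wave_a) + [0.0] * (length - len(wave_a))
    padded_b = list(wave_b) + [0.0] * (length - len(wave_b))
    mixed = [a + gain_b * b for a, b in zip(padded_a, padded_b)]
    # Rescale only if the sum would clip outside [-1, 1].
    peak = max((abs(x) for x in mixed), default=0.0)
    if peak > 1.0:
        mixed = [x / peak for x in mixed]
    # A class is present in the mix if it is present in either clip.
    label = tuple(max(x, y) for x, y in zip(label_a, label_b))
    return mixed, label


if __name__ == "__main__":
    fake_clip, real_clip = [0.5, -0.5, 0.25], [0.5, 0.5]
    print(mix_overlapping(fake_clip, (1, 0), real_clip, (0, 1)))
```

Mixing a fake clip with a real one yields a sample labeled with both classes under this layout, which is one way to expose a model to overlapping speakers.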

6. Evaluation Metric

$$ score = 0.5 \times (1 - \text{mean AUC}) + 0.25 \times \text{mean Brier} + 0.25 \times \text{mean ECE} $$

$\text{AUC}$ : Area Under the Curve (description)
$\text{Brier}$ : Brier score (description)
$\text{ECE}$ : Expected Calibration Error (description)
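
To make the formula concrete, the sketch below computes the score with the standard library only. Lower is better. It is not the official scorer. It assumes that each "mean" is an average over label columns (for example one column per class) and that ECE uses 10 equal-width bins over the predicted probability. All function names are hypothetical.

```python
from typing import Dict, Sequence


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)  # ties: half
    return wins / (len(pos) * len(neg))


def brier(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between probability and 0/1 label."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


def ece(y_true: Sequence[int], y_prob: Sequence[float], n_bins: int = 10) -> float:
    """Weighted gap between mean probability and positive rate per bin."""
    bins = [[] for _ in range(n_bins)]
    for t, p in zip(y_true, y_prob):
        bins[min(int(p * n_bins), n_bins - 1)].append((t, p))
    error = 0.0
    for members in bins:
        if members:
            positive_rate = sum(t for t, _ in members) / len(members)
            mean_prob = sum(p for _, p in members) / len(members)
            error += len(members) / len(y_true) * abs(positive_rate - mean_prob)
    return error


def competition_score(labels: Dict[str, Sequence[int]],
                      probs: Dict[str, Sequence[float]], n_bins: int = 10) -> float:
    """0.5 * (1 - mean AUC) + 0.25 * mean Brier + 0.25 * mean ECE."""
    cols = list(labels)
    mean_auc = sum(auc(labels[c], probs[c]) for c in cols) / len(cols)
    mean_brier = sum(brier(labels[c], probs[c]) for c in cols) / len(cols)
    mean_ece = sum(ece(labels[c], probs[c], n_bins) for c in cols) / len(cols)
    return 0.5 * (1.0 - mean_auc) + 0.25 * mean_brier + 0.25 * mean_ece


if __name__ == "__main__":
    labels = {"fake": [1, 0, 1, 0], "real": [0, 1, 0, 1]}
    probs = {"fake": [0.9, 0.2, 0.7, 0.4], "real": [0.1, 0.8, 0.3, 0.6]}
    print(competition_score(labels, probs))
```

Under these definitions, a constant prediction of 0.5 on balanced labels scores 0.3125: the AUC term contributes 0.25, the Brier term 0.0625, and the ECE term 0. Half of the weight rewards ranking quality and the other half rewards well-calibrated probabilities.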

7. References

[1] SW-Centered University Digital Competition, "SW Meets Generative AI": AI Track (link)
[2] AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks (paper, implementation)
[3] Domain-Adversarial Training of Neural Networks (paper, implementation)
[4] Audio Spectrogram Transformer (link)
[5] DeepFilterNet (implementation)
[6] RawBoost: A Raw Data Boosting and Augmentation Method applied to Automatic Speaker Verification Anti-Spoofing (paper, implementation)

Orca0917/fake-audio-detection