Topic: Detecting fake voices produced by generative AI
Period: 2024.07.01 ~ 2024.07.19
Result: 10th place out of 219 teams
Affiliation: Gachon University, School of AI·Software
Audio augmentation process
AASIST + DANN Training / Inferencing
Audio denoising process
AASIST Training / Inferencing
- Docker setup
docker pull pytorch/pytorch:2.2.0-cuda12.1-cudnn8-devel
docker run -it --gpus all pytorch/pytorch:2.2.0-cuda12.1-cudnn8-devel
- Dataset download
sh ./code/1_prepare_data/download.sh
- Create the Anaconda virtual environment
conda create -n mota python=3.10.13 -y
conda activate mota
- Data preprocessing
sh ./code/1_prepare_data/run.sh
- AASIST + DANN + RawBoost
sh ./code/2_aasist_rawboost/run.sh
- AASIST + Denoise
sh ./code/3_aasist_denoise/run.sh
- Ensemble
sh ./code/4_ensemble/run.sh
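The four `run.sh` stages above depend on each other's outputs, so it can help to run them from one driver that stops at the first failure. The following is a minimal sketch, not part of the repository: `PIPELINE_STEPS` and `run_pipeline` are hypothetical names, and it assumes the dataset is already downloaded, the `mota` environment is active, and the working directory is the repository root.

```python
import subprocess
from typing import Callable, Sequence

# Script paths copied from the steps above (hypothetical constant name).
PIPELINE_STEPS = [
    "./code/1_prepare_data/run.sh",
    "./code/2_aasist_rawboost/run.sh",
    "./code/3_aasist_denoise/run.sh",
    "./code/4_ensemble/run.sh",
]


def run_pipeline(steps: Sequence[str] = PIPELINE_STEPS,
                 runner: Callable[..., object] = subprocess.run) -> list:
    """Run each script with `sh`, in order, and return the scripts that completed."""
    completed = []
    for script in steps:
        # With subprocess.run, check=True raises CalledProcessError on a
        # non-zero exit code, so later stages never run after a failed one.
        runner(["sh", script], check=True)
        completed.append(script)
    return completed
```

Calling `run_pipeline()` runs preprocessing, both training/inference stages, and the ensemble in sequence. The `runner` parameter exists only so the order can be checked without executing the scripts.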
- Ubuntu 22.04.3 LTS
- NVIDIA RTX 4090
- AMD EPYC 7402 24-cores
- For the rest of the environment, see environment.yaml
deepfilternet 0.5.6
librosa 0.10.2.post1
soundfile 0.12.1
pandas 2.2.2
pydub 0.25.1
torch 2.3.1
torchaudio 2.3.1
torchcontrib 0.0.2
tensorboard 2.17.0
tqdm 4.66.4
- AST (MIT/ast-finetuned-audioset-10-10-0.4593) : masking
https://huggingface.co/MIT/ast-finetuned-audioset-10-10-0.4593
- DeepFilterNet : Denoising
https://github.com/Rikorose/DeepFilterNet
- Data augmentation : RawBoost, audio mixing (overlapping)
- Model : AASIST, DANN (Domain Adversarial Neural Network)
- Data preprocessing : DeepFilterNet
- Result post-processing : AST (Audio Spectrogram Transformer)
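The "audio mixing (overlapping)" augmentation listed above can be illustrated with a short sketch. This is not the repository's implementation: `mix_overlapping`, `gain_b`, and `peak` are hypothetical names, and how clips were paired and labeled is not stated here. The sketch only shows the basic idea of overlaying two mono waveforms that share a sample rate.

```python
import numpy as np


def mix_overlapping(wave_a: np.ndarray, wave_b: np.ndarray,
                    gain_b: float = 1.0, peak: float = 0.99) -> np.ndarray:
    """Overlay two mono waveforms sample by sample (hypothetical helper)."""
    length = max(len(wave_a), len(wave_b))
    # Zero-pad the shorter clip so both cover the same time span.
    a = np.pad(wave_a.astype(np.float32), (0, length - len(wave_a)))
    b = np.pad(wave_b.astype(np.float32), (0, length - len(wave_b)))
    mixed = a + gain_b * b
    # Rescale only if the sum would exceed the allowed peak amplitude.
    max_abs = float(np.max(np.abs(mixed))) if length else 0.0
    if max_abs > peak:
        mixed = mixed * (peak / max_abs)
    return mixed
```

In practice the two inputs would typically be loaded with `librosa.load` at a common sample rate before mixing. Rescaling only on overflow keeps quiet mixtures at their original level.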
- $\text{AUC}$ : Area Under the Curve (description)
- $\text{Brier}$ : Brier score (description)
- $\text{ECE}$ : Expected Calibration Error (description)
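For readers unfamiliar with the three metrics above, here is a minimal pure-Python sketch of one common way to compute each for binary labels and predicted probabilities. The function names are hypothetical, the ECE shown is the equal-width-bin variant with 10 bins by default, and the competition's exact scoring formula is not stated here, so treat this as illustrative rather than the official scorer.

```python
def roc_auc(labels, probs):
    """Probability that a random positive is scored above a random negative."""
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    # Ties between a positive and a negative count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(labels, probs):
    """Mean squared difference between predicted probability and label."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


def expected_calibration_error(labels, probs, n_bins=10):
    """Weighted mean gap between mean probability and positive rate per bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        # Equal-width bins over [0, 1]; p == 1.0 goes in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((y, p))
    total = len(labels)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        frac_pos = sum(y for y, _ in members) / len(members)
        mean_prob = sum(p for _, p in members) / len(members)
        ece += len(members) / total * abs(frac_pos - mean_prob)
    return ece
```

Higher AUC is better, while lower Brier score and lower ECE indicate better-calibrated probabilities. The pairwise AUC loop is quadratic and meant for clarity, not for large evaluation sets.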
- [1] SW-Centered University Digital Competition, "SW Meets Generative AI" : AI track (SW중심대학 디지털 경진대회_SW와 생성AI의 만남 : AI 부문) (link)
- [2] AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks (paper, implementation)
- [4] Audio Spectrogram Transformer (link)
- [5] DeepFilterNet (implementation)
- [6] RawBoost: A Raw Data Boosting and Augmentation Method applied to Automatic Speaker Verification Anti-Spoofing (paper, implementation)