Lin Zhang will defend her thesis on January 30th, 2024, from 16:00 to 17:00 (GMT+9). In her defense, she will summarize her work on PartialSpoof up to this point. Please feel free to register and join if you are interested!
This repository is an implementation of the papers related to Partial Spoof. It is adapted from project-NN-Pytorch-scripts. Below are some links that you might be interested in:
- PartialSpoof Database
- Sample
- GitHub for model (You are here!)
- GitHub for data construction (TBA)
- Papers: Please refer to the links in Folder and Its Paper
Please feel free to give suggestions and feedback.
Lin Zhang; Xin Wang; Erica Cooper; Nicholas Evans; Junichi Yamagishi
- Updates
- Folder and Its Paper
- Folder Structure
- Citation
- Acknowledgments and License
- 2023-12: add metrics: EER for spoof detection; SegmentEER and RangeEER for spoof localization.
- 2023-12: add folders for multiple random seeds, and update readme.
- 2022-12: release multi-resolution and single-resolution countermeasures (CMs).
Folder | Paper |
---|---|
00data-prepare | Processing to generate PartialSpoof database and automatic annotation. (To be released) |
01singletask | CM trained on a single task (either utterance-level or segment-level detection) in the paper An Initial Investigation for Detecting Partially Spoofed Audio (To be released) |
02multitask | CM trained on multiple tasks (both utterance-level and segment-level detection) in the paper Multi-task Learning in Utterance-level and Segmental-level Spoof Detection (To be released) |
03multireso | Multi-resolution CM in the paper The PartialSpoof Database and Countermeasures for the Detection of Short Fake Speech Segments Embedded in an Utterance |
metric | Metrics for utterance-level spoof detection and segment-level spoof localization, including those from the paper Range-Based Equal Error Rate for Spoof Localization |
Please go to [Folder]/README.md to read the usage details.
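For readers unfamiliar with the equal error rate (EER) listed among the metrics, the following is a minimal sketch of how an utterance-level EER can be computed from detection scores. It is not the code in metric/UtteranceEER.py. The function name `compute_eer` is hypothetical, and the sketch assumes that a higher score means "more likely bona fide".

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Return (eer, threshold) for two lists of detection scores.

    Assumption: higher score = more likely bona fide.
    """
    # Candidate thresholds: every distinct score that was observed.
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best = None
    for t in thresholds:
        # False rejection rate: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoof trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


if __name__ == "__main__":
    eer, threshold = compute_eer([0.9, 0.4], [0.6, 0.1])
    print(f"EER = {eer:.2f} at threshold {threshold}")
```

The same idea carries over to segment-level evaluation by pooling one score per segment instead of one score per utterance. The repository's own scripts should be preferred for reporting results.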
PartialSpoof
├── 01_download_database.sh : Script used to download PartialSpoof from zenodo.
├── 03multireso
│   ├── 01_download_pretrained_models.sh : Script used to download pretrained models.
│   ├── main.py
│   ├── model.py : Model structure and loss are in here! same for multi/single-reso.
│   ├── multi-reso : folder for multi-reso model
│   ├── README.md
│   └── single-reso : folder for single-reso model
│       └── {2, 4, 8, 16, 32, 64, utt}
├── config_ps : Config files for experiments
│   ├── config_test_on_dev.py
│   └── config_test_on_eval.py
├── env.sh
├── Figures
│   ├── EERs.pdf
│   └── PartialSpoof_logo.png
├── LICENSE
├── metric
│   ├── cal_EER.sh
│   ├── RangeEER.py
│   ├── README.md
│   ├── rttm_tool.py
│   ├── SegmentEER.py
│   └── UtteranceEER.py
├── database : PartialSpoof Databases
│   ├── train
│   ├── dev : Folder for dev set
│   │   ├── con_data : related data file. (following kaldi format)
│   │   ├── con_wav : waveform
│   │   └── dev.lst : waveform list
│   ├── eval
│   ├── label2num : convert string labels to numerical labels.
│   │   └── label2num_2cls_0sil : bonafide/spoof (More to be released)
│   ├── protocols
│   ├── segment_labels
│   └── vad
│       ├── dev
│       ├── eval
│       └── train
├── modules
│   ├── gmlp.py
│   ├── LICENSE
│   ├── multi_scale
│   │   └── post.py
│   ├── s3prl : s3prl repo
│   └── ssl_pretrain : Folder to save downloaded pretrained ssl model
├── project-NN-Pytorch-scripts.202102 : Modified project-NN-Pytorch-scripts repo
└── README.md
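The label2num entry above converts string labels (bonafide/spoof) to numerical labels. The sketch below illustrates that kind of conversion, and how per-segment labels might be merged into a coarser temporal resolution. Everything here is an assumption made for illustration: the numeric values in `LABEL2NUM`, the function names, and the merging rule (a merged segment counts as spoof if any part of it is spoof) are hypothetical and may differ from the files and code in this repository.

```python
# Hypothetical mapping; the repository's label2num files may use other values.
LABEL2NUM = {"spoof": 0, "bonafide": 1}


def labels_to_numbers(labels, mapping=LABEL2NUM):
    """Convert a sequence of string labels to numerical labels."""
    return [mapping[label] for label in labels]


def coarsen(numeric_labels, factor):
    """Merge every `factor` consecutive segment labels into one.

    Assumed rule: with spoof = 0 and bonafide = 1, taking the minimum marks
    a merged segment as spoof if any of its parts is spoof.
    """
    return [
        min(numeric_labels[i:i + factor])
        for i in range(0, len(numeric_labels), factor)
    ]


if __name__ == "__main__":
    fine = labels_to_numbers(["bonafide", "spoof", "bonafide", "bonafide"])
    print(fine)              # fine-grained numerical labels
    print(coarsen(fine, 2))  # labels after merging pairs of segments
```

Applying `coarsen` with a factor equal to the number of segments yields a single utterance-level label under the same assumed rule.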
We would appreciate it if you cite the corresponding paper when the ideas, code, or pretrained models are helpful to your research.
@inproceedings{zhang21ca_interspeech,
author={Lin Zhang and Xin Wang and Erica Cooper and Junichi Yamagishi and Jose Patino and Nicholas Evans},
title={{An Initial Investigation for Detecting Partially Spoofed Audio}},
year=2021,
booktitle={Proc. Interspeech 2021},
pages={4264--4268},
doi={10.21437/Interspeech.2021-738}
}
@inproceedings{zhang21_asvspoof,
author={Lin Zhang and Xin Wang and Erica Cooper and Junichi Yamagishi},
title={{Multi-task Learning in Utterance-level and Segmental-level Spoof Detection}},
year=2021,
booktitle={Proc. 2021 Edition of the Automatic Speaker Verification and Spoofing Countermeasures Challenge},
pages={9--15},
doi={10.21437/ASVSPOOF.2021-2}
}
@article{10003971,
author={Zhang, Lin and Wang, Xin and Cooper, Erica and Evans, Nicholas and Yamagishi, Junichi},
journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
title={The PartialSpoof Database and Countermeasures for the Detection of Short Fake Speech Segments Embedded in an Utterance},
year={2023},
volume={31},
number={},
pages={813--825},
doi={10.1109/TASLP.2022.3233236}
}
This study is partially supported by the Japanese-French joint national VoicePersonae project supported by JST CREST (JPMJCR18A6, JPMJCR20D3), JPMJFS2136 and the ANR (ANR-18-JSTS-0001), MEXT KAKENHI Grants (21K17775, 21H04906, 21K11951, 18H04112), Japan, and Google AI for Japan program.
This project is mainly licensed under the BSD 3-Clause License (./LICENSE).
Each folder within the project may contain its own LICENSE, depending on the external libraries used. Please refer to the README.md file in each folder for more details.
Additionally, specific licenses for some of the external libraries used are mentioned below:
- modules/s3prl is licensed under the MIT License (modules/s3prl/LICENSE.txt), but please note that the latest version of s3prl is now under the Apache License version 2.0.
- project-NN-Pytorch-scripts.202102 is licensed under the BSD 3-Clause License (project-NN-Pytorch-scripts.202102/LICENSE).
- modules/gmlp.py is licensed under the MIT License (modules/LICENSE).