nii-yamagishilab/PartialSpoof

Absolute time label of fake span

ductuantruong opened this issue · 3 comments

Hi,

Thank you for sharing your amazing work. I am trying to use it in my research. I have taken a look at your data, and the fake labels appear to be at the segment level. May I ask whether you also have absolute time labels for the fake spans (e.g. from 1.2 seconds to 2.8 seconds) in each utterance?

Thank you for your support!

zlin0 commented

@ductuantruong
Sorry for the late reply. I've been in an extremely busy period recently ><
Are you referring to the database_vad.tar.gz?
After uncompressing, you will see:

===./database/vad/{train,dev,eval} ===
This vad folder contains the timestamp annotation for each set. 
For each <uttid>.vad file, the format of each line is: 
<start_time> <end_time> <label>

<start_time> and <end_time> are in seconds. 
<label> includes: '0' for spoof, '1' for bona fide, and '2' for non-speech
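
For anyone who wants the absolute fake spans directly, here is a minimal Python sketch of reading this format. It is not part of the PartialSpoof repository: the function names (parse_vad_lines, spoof_spans, read_vad_file) and the example lines are made up for illustration, and it assumes the three fields on each line are separated by whitespace.

```python
from pathlib import Path

# Label codes as described above: '0' spoof, '1' bona fide, '2' non-speech.
LABELS = {"0": "spoof", "1": "bona fide", "2": "nonspeech"}


def parse_vad_lines(lines):
    """Parse '<start_time> <end_time> <label>' lines into (start, end, name) tuples.

    Assumption: fields are whitespace-separated and times are in seconds.
    """
    spans = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        start, end, code = float(parts[0]), float(parts[1]), parts[2]
        spans.append((start, end, LABELS[code]))
    return spans


def spoof_spans(spans, tol=1e-6):
    """Return (start, end) intervals labelled spoof, merging back-to-back spans."""
    merged = []
    for start, end, name in spans:
        if name != "spoof":
            continue
        if merged and abs(merged[-1][1] - start) < tol:
            # This span starts where the previous spoof span ended: extend it.
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


def read_vad_file(path):
    """Hypothetical helper: load one <uttid>.vad file from disk."""
    return parse_vad_lines(Path(path).read_text().splitlines())


# Made-up example lines, not taken from the dataset.
example = ["0.00 0.35 2", "0.35 1.20 1", "1.20 2.00 0", "2.00 2.80 0", "2.80 3.10 2"]
print(spoof_spans(parse_vad_lines(example)))  # [(1.2, 2.8)]
```

With a real file, read_vad_file("database/vad/train/<uttid>.vad") would replace the example list, with <uttid> filled in for the utterance of interest.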

Please let me know if this answers your question or if there is any specific annotation you need. Thanks!

ductuantruong commented

Thank you for your reply and for sharing this amazing work! No worries about the late response. I have found the timestamp labels with your guidance.

zlin0 commented

@ductuantruong Hi, I just realized that your question was about timestamp annotations. To avoid any confusion, I want to clarify that I have provided two types of timestamp annotations, which differ in how nonspeech regions are categorized. There are three types of nonspeech to consider:
(i) nonspeech from bona fide sources,
(ii) nonspeech from spoofed sources, and
(iii) the concatenated part (the nonspeech used for overlap-add).

Based on this, the annotations are as follows:

  1. spoof, bona fide:
    In this type (used in all my currently published papers), I treat (i) as bona fide, and (ii) and (iii) as spoof.
    database_segment_labels_v1.2.tar.gz
    PS_data.tar.gz

  2. spoof, bona fide, nonspeech (nonspeech includes i, ii, iii)
    database_vad.tar.gz
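
To make the difference between the two annotation types concrete, the mapping above can be written out as a small Python sketch. The scheme names ("two_class", "three_class"), the nonspeech keys, and the function name are hypothetical and chosen only for illustration; they do not appear in the released files.

```python
# Nonspeech types (i), (ii), (iii) from the list above, under hypothetical key names.
# Type 1 (segment labels, PS_data): (i) -> bona fide, (ii) and (iii) -> spoof.
# Type 2 (database_vad): all three are labelled nonspeech.
NONSPEECH_LABEL = {
    "two_class": {
        "bona_fide_source": "bona fide",
        "spoofed_source": "spoof",
        "concatenation": "spoof",
    },
    "three_class": {
        "bona_fide_source": "nonspeech",
        "spoofed_source": "nonspeech",
        "concatenation": "nonspeech",
    },
}


def nonspeech_label(kind, scheme):
    """Return the label a nonspeech region of the given kind gets under a scheme."""
    return NONSPEECH_LABEL[scheme][kind]


print(nonspeech_label("concatenation", "two_class"))    # spoof
print(nonspeech_label("concatenation", "three_class"))  # nonspeech
```

One practical consequence is that spoof spans read from the three-class files exclude all nonspeech, so they need not line up exactly with the two-class segment labels.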