manojpamk/pytorch_xvectors

Problems evaluating on the AMI dev and test sets

Closed this issue · 2 comments

Hi manoj,

Sorry to bother you again. I followed your reply and continued the evaluation on the AMI corpus.

  1. First, I downloaded the dataset and split it into dev and test sets as in the paper: https://arxiv.org/pdf/1902.03190.pdf. Second, I used the Kaldi "ami" recipe for data preparation and got the "segments" and "utt2spk" files. Third, I used those two files to produce the "rttm" files.

  2. I used those files and the trained x-vector models to evaluate DER. Surprisingly, when I used the PLDA model trained on a subset of the vox dataset, the DER on both AMI dev and test was 0. So I checked the generated "rttm" files and found the problem: every recording has only one speaker, so the result makes no sense. The details of the files are as follows.
    (1)segments
    AMI_ES2011a_H00_FEE041_0003427_0003714 AMI_ES2011a_H00 34.27 37.14
    AMI_ES2011a_H00_FEE041_0003714_0003915 AMI_ES2011a_H00 37.14 39.15
    AMI_ES2011a_H00_FEE041_0003915_0004332 AMI_ES2011a_H00 39.15 43.32
    AMI_ES2011a_H00_FEE041_0004332_0004439 AMI_ES2011a_H00 43.32 44.39
    AMI_ES2011a_H00_FEE041_0004643_0004763 AMI_ES2011a_H00 46.43 47.63
    AMI_ES2011a_H00_FEE041_0004763_0005020 AMI_ES2011a_H00 47.63 50.2
    AMI_ES2011a_H00_FEE041_0005020_0005133 AMI_ES2011a_H00 50.2 51.33
    AMI_ES2011a_H00_FEE041_0005133_0005553 AMI_ES2011a_H00 51.33 55.53
    AMI_ES2011a_H00_FEE041_0005553_0005685 AMI_ES2011a_H00 55.53 56.85
    AMI_ES2011a_H00_FEE041_0005856_0006217 AMI_ES2011a_H00 58.56 62.17
    AMI_ES2011a_H00_FEE041_0006217_0006428 AMI_ES2011a_H00 62.17 64.28
    AMI_ES2011a_H00_FEE041_0007704_0007898 AMI_ES2011a_H00 77.04 78.98
    AMI_ES2011a_H00_FEE041_0007898_0008079 AMI_ES2011a_H00 78.98 80.79
    AMI_ES2011a_H00_FEE041_0008298_0008364 AMI_ES2011a_H00 82.98 83.64
    AMI_ES2011a_H00_FEE041_0008364_0008924 AMI_ES2011a_H00 83.64 89.24
    AMI_ES2011a_H00_FEE041_0009602_0009635 AMI_ES2011a_H00 96.02 96.35
    AMI_ES2011a_H00_FEE041_0009826_0010223 AMI_ES2011a_H00 98.26 102.23
    .......
    .......
    (2)utt2spk
    AMI_ES2011a_H00_FEE041_0003427_0003714 AMI_ES2011a_H00_FEE041
    AMI_ES2011a_H00_FEE041_0003714_0003915 AMI_ES2011a_H00_FEE041
    AMI_ES2011a_H00_FEE041_0003915_0004332 AMI_ES2011a_H00_FEE041
    AMI_ES2011a_H00_FEE041_0004332_0004439 AMI_ES2011a_H00_FEE041
    AMI_ES2011a_H00_FEE041_0004643_0004763 AMI_ES2011a_H00_FEE041
    AMI_ES2011a_H00_FEE041_0004763_0005020 AMI_ES2011a_H00_FEE041
    AMI_ES2011a_H00_FEE041_0005020_0005133 AMI_ES2011a_H00_FEE041
    AMI_ES2011a_H00_FEE041_0005133_0005553 AMI_ES2011a_H00_FEE041
    AMI_ES2011a_H00_FEE041_0005553_0005685 AMI_ES2011a_H00_FEE041
    AMI_ES2011a_H00_FEE041_0005856_0006217 AMI_ES2011a_H00_FEE041
    AMI_ES2011a_H00_FEE041_0006217_0006428 AMI_ES2011a_H00_FEE041
    AMI_ES2011a_H00_FEE041_0007704_0007898 AMI_ES2011a_H00_FEE041
    AMI_ES2011a_H00_FEE041_0007898_0008079 AMI_ES2011a_H00_FEE041
    AMI_ES2011a_H00_FEE041_0008298_0008364 AMI_ES2011a_H00_FEE041
    AMI_ES2011a_H00_FEE041_0008364_0008924 AMI_ES2011a_H00_FEE041
    AMI_ES2011a_H00_FEE041_0009602_0009635 AMI_ES2011a_H00_FEE041
    AMI_ES2011a_H00_FEE041_0009826_0010223 AMI_ES2011a_H00_FEE041
    ........
    ........
    (3) rttm
    SPEAKER AMI_ES2011a_H00 1 34.27 2.87 AMI_ES2011a_H00_FEE041
    SPEAKER AMI_ES2011a_H00 1 37.14 2.01 AMI_ES2011a_H00_FEE041
    SPEAKER AMI_ES2011a_H00 1 39.15 4.17 AMI_ES2011a_H00_FEE041
    SPEAKER AMI_ES2011a_H00 1 43.32 1.07 AMI_ES2011a_H00_FEE041
    SPEAKER AMI_ES2011a_H00 1 46.43 1.20 AMI_ES2011a_H00_FEE041
    SPEAKER AMI_ES2011a_H00 1 47.63 2.57 AMI_ES2011a_H00_FEE041
    SPEAKER AMI_ES2011a_H00 1 50.20 1.13 AMI_ES2011a_H00_FEE041
    SPEAKER AMI_ES2011a_H00 1 51.33 4.20 AMI_ES2011a_H00_FEE041
    SPEAKER AMI_ES2011a_H00 1 55.53 1.32 AMI_ES2011a_H00_FEE041
    SPEAKER AMI_ES2011a_H00 1 58.56 3.61 AMI_ES2011a_H00_FEE041
    SPEAKER AMI_ES2011a_H00 1 62.17 2.11 AMI_ES2011a_H00_FEE041
    SPEAKER AMI_ES2011a_H00 1 77.04 1.94 AMI_ES2011a_H00_FEE041
    SPEAKER AMI_ES2011a_H00 1 78.98 1.81 AMI_ES2011a_H00_FEE041
    SPEAKER AMI_ES2011a_H00 1 82.98 0.66 AMI_ES2011a_H00_FEE041
    SPEAKER AMI_ES2011a_H00 1 83.64 5.60 AMI_ES2011a_H00_FEE041
    SPEAKER AMI_ES2011a_H00 1 96.02 0.33 AMI_ES2011a_H00_FEE041
    SPEAKER AMI_ES2011a_H00 1 98.26 3.97 AMI_ES2011a_H00_FEE041
    SPEAKER AMI_ES2011a_H00 1 102.23 5.54 AMI_ES2011a_H00_FEE041
    ........
    ........
    (4) rttm (using the best threshold, -0.5)
    SPEAKER AMI_ES2011a_H00 1 34.270 10.120 1
    SPEAKER AMI_ES2011a_H00 1 46.430 10.420 1
    SPEAKER AMI_ES2011a_H00 1 58.560 5.720 1
    SPEAKER AMI_ES2011a_H00 1 77.040 3.750 1
    SPEAKER AMI_ES2011a_H00 1 82.980 6.260 1
    SPEAKER AMI_ES2011a_H00 1 96.020 0.330 1
    SPEAKER AMI_ES2011a_H00 1 98.260 9.510 1
    SPEAKER AMI_ES2011a_H00 1 108.920 45.440 1
    .........
    .........
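
For reference, the conversion from "segments" and "utt2spk" to "rttm" described in step 1 can be sketched roughly as below. This is a minimal illustration, not the script actually used here: the function names are hypothetical, and it writes the full 10-field RTTM layout with `<NA>` placeholders, which is an assumption (the listings above show only six fields). It also counts the speakers per recording, which shows why per-headset recording IDs give exactly one speaker each.

```python
from collections import defaultdict


def read_table(lines):
    """Split whitespace-separated Kaldi-style lines into field lists."""
    return [ln.split() for ln in lines if ln.strip()]


def segments_to_rttm(segments_lines, utt2spk_lines):
    """Build RTTM lines from the contents of 'segments' and 'utt2spk'.

    Hypothetical helper; assumes 'segments' rows are
    <utt-id> <recording-id> <start> <end>.
    """
    utt2spk = {utt: spk for utt, spk in read_table(utt2spk_lines)}
    rttm = []
    for utt, rec, start, end in read_table(segments_lines):
        start, end = float(start), float(end)
        rttm.append(
            f"SPEAKER {rec} 1 {start:.2f} {end - start:.2f} "
            f"<NA> <NA> {utt2spk[utt]} <NA> <NA>"
        )
    return rttm


def speakers_per_recording(rttm_lines):
    """Count distinct speaker names (8th RTTM field) per recording."""
    speakers = defaultdict(set)
    for ln in rttm_lines:
        fields = ln.split()
        speakers[fields[1]].add(fields[7])
    return {rec: len(names) for rec, names in speakers.items()}


# Two rows taken from the listings above.
segments = [
    "AMI_ES2011a_H00_FEE041_0003427_0003714 AMI_ES2011a_H00 34.27 37.14",
    "AMI_ES2011a_H00_FEE041_0003714_0003915 AMI_ES2011a_H00 37.14 39.15",
]
utt2spk = [
    "AMI_ES2011a_H00_FEE041_0003427_0003714 AMI_ES2011a_H00_FEE041",
    "AMI_ES2011a_H00_FEE041_0003714_0003915 AMI_ES2011a_H00_FEE041",
]

rttm = segments_to_rttm(segments, utt2spk)
counts = speakers_per_recording(rttm)
print("\n".join(rttm))
print(counts)
```

Because the recording ID `AMI_ES2011a_H00` names a single headset channel, the count for it is 1, so any system that assigns one label to the whole recording matches the reference.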

  3. To fix the problem, I revised the "segments", "utt2spk" and "rttm" files. To make sure that every recording has two or more speakers, I used the corresponding "*.Mix-Headset.wav" files. But I got a terrible DER of 52.17%. The details of the files are as follows.
    (1)segments
    AMI_ES2011a_0003427_0003714 AMI_ES2011a 34.27 37.14
    AMI_ES2011a_0003714_0003915 AMI_ES2011a 37.14 39.15
    AMI_ES2011a_0003915_0004332 AMI_ES2011a 39.15 43.32
    AMI_ES2011a_0004332_0004439 AMI_ES2011a 43.32 44.39
    AMI_ES2011a_0004643_0004763 AMI_ES2011a 46.43 47.63
    AMI_ES2011a_0004763_0005020 AMI_ES2011a 47.63 50.2
    AMI_ES2011a_0005020_0005133 AMI_ES2011a 50.2 51.33
    AMI_ES2011a_0005133_0005553 AMI_ES2011a 51.33 55.53
    AMI_ES2011a_0005553_0005685 AMI_ES2011a 55.53 56.85
    AMI_ES2011a_0005856_0006217 AMI_ES2011a 58.56 62.17
    AMI_ES2011a_0006217_0006428 AMI_ES2011a 62.17 64.28
    AMI_ES2011a_0006500_0007004 AMI_ES2011a 65.0 70.04
    AMI_ES2011a_0007004_0007300 AMI_ES2011a 70.04 73.0
    ........
    ........
    (2)utt2spk
    AMI_ES2011a_0003427_0003714 AMI_ES2011a
    AMI_ES2011a_0003714_0003915 AMI_ES2011a
    AMI_ES2011a_0003915_0004332 AMI_ES2011a
    AMI_ES2011a_0004332_0004439 AMI_ES2011a
    AMI_ES2011a_0004643_0004763 AMI_ES2011a
    AMI_ES2011a_0004763_0005020 AMI_ES2011a
    AMI_ES2011a_0005020_0005133 AMI_ES2011a
    AMI_ES2011a_0005133_0005553 AMI_ES2011a
    AMI_ES2011a_0005553_0005685 AMI_ES2011a
    AMI_ES2011a_0005856_0006217 AMI_ES2011a
    AMI_ES2011a_0006217_0006428 AMI_ES2011a
    AMI_ES2011a_0006500_0007004 AMI_ES2011a
    AMI_ES2011a_0007004_0007300 AMI_ES2011a
    ........
    ........
    (3)rttm
    SPEAKER AMI_ES2011a 1 34.27 2.87 AMI_ES2011a_H00_FEE041
    SPEAKER AMI_ES2011a 1 37.14 2.01 AMI_ES2011a_H00_FEE041
    SPEAKER AMI_ES2011a 1 39.15 4.17 AMI_ES2011a_H00_FEE041
    SPEAKER AMI_ES2011a 1 43.32 1.07 AMI_ES2011a_H00_FEE041
    SPEAKER AMI_ES2011a 1 46.43 1.20 AMI_ES2011a_H00_FEE041
    SPEAKER AMI_ES2011a 1 47.63 2.57 AMI_ES2011a_H00_FEE041
    SPEAKER AMI_ES2011a 1 50.20 1.13 AMI_ES2011a_H00_FEE041
    SPEAKER AMI_ES2011a 1 51.33 4.20 AMI_ES2011a_H00_FEE041
    SPEAKER AMI_ES2011a 1 55.53 1.32 AMI_ES2011a_H00_FEE041
    SPEAKER AMI_ES2011a 1 58.56 3.61 AMI_ES2011a_H00_FEE041
    SPEAKER AMI_ES2011a 1 62.17 2.11 AMI_ES2011a_H00_FEE041
    SPEAKER AMI_ES2011a 1 65.00 5.04 AMI_ES2011a_H03_FEE044
    SPEAKER AMI_ES2011a 1 70.04 2.96 AMI_ES2011a_H03_FEE044
    ........
    ........
    (4) rttm (using the best threshold, 0.2)
    SPEAKER AMI_ES2011a 1 34.270 2.870 3
    SPEAKER AMI_ES2011a 1 37.140 1.125 4
    SPEAKER AMI_ES2011a 1 38.265 0.885 3
    SPEAKER AMI_ES2011a 1 39.150 1.125 4
    SPEAKER AMI_ES2011a 1 40.275 0.750 3
    SPEAKER AMI_ES2011a 1 41.025 0.750 4
    SPEAKER AMI_ES2011a 1 41.775 0.750 2
    SPEAKER AMI_ES2011a 1 42.525 0.795 3
    SPEAKER AMI_ES2011a 1 43.320 1.070 4
    SPEAKER AMI_ES2011a 1 46.430 1.200 3
    SPEAKER AMI_ES2011a 1 47.630 1.125 4
    SPEAKER AMI_ES2011a 1 48.755 2.575 3
    SPEAKER AMI_ES2011a 1 51.330 1.125 4
    SPEAKER AMI_ES2011a 1 52.455 3.075 3
    SPEAKER AMI_ES2011a 1 55.530 1.320 4
    SPEAKER AMI_ES2011a 1 58.560 1.875 4
    SPEAKER AMI_ES2011a 1 60.435 0.750 3
    SPEAKER AMI_ES2011a 1 61.185 3.095 4
    SPEAKER AMI_ES2011a 1 65.000 4.125 4
    SPEAKER AMI_ES2011a 1 69.125 0.915 3
    SPEAKER AMI_ES2011a 1 70.040 7.740 4
    ........
    ........
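
One way to get a meeting-level reference like (3) from per-headset entries is to drop the headset suffix from the recording ID, so that all channels of a meeting share one ID. The sketch below is only an illustration under stated assumptions: the suffix is assumed to always look like `_H00`, `_H03`, and so on; the speaker label is taken from the last field of the short RTTM lines shown above (or the eighth field of full 10-field RTTM); the function names are hypothetical; and the `AMI_ES2011a_H03` input line is made up for the example.

```python
import re
from collections import defaultdict

# Assumed pattern for the headset-channel suffix, e.g. "_H00" or "_H03".
HEADSET_SUFFIX = re.compile(r"_H\d+$")


def to_meeting_id(recording_id):
    """Strip the headset suffix: 'AMI_ES2011a_H00' -> 'AMI_ES2011a'."""
    return HEADSET_SUFFIX.sub("", recording_id)


def merge_headset_rttm(rttm_lines):
    """Rewrite the recording field to meeting level and sort by onset."""
    rows = []
    for line in rttm_lines:
        fields = line.split()
        if not fields:
            continue
        fields[1] = to_meeting_id(fields[1])
        rows.append(fields)
    rows.sort(key=lambda f: (f[1], float(f[3])))
    return [" ".join(f) for f in rows]


def count_speakers(rttm_lines):
    """Count distinct speaker labels per recording."""
    speakers = defaultdict(set)
    for line in rttm_lines:
        f = line.split()
        name = f[7] if len(f) >= 8 else f[-1]
        speakers[f[1]].add(name)
    return {rec: len(names) for rec, names in speakers.items()}


# Illustrative per-headset reference lines (the H03 line is hypothetical).
headset_rttm = [
    "SPEAKER AMI_ES2011a_H03 1 65.00 5.04 AMI_ES2011a_H03_FEE044",
    "SPEAKER AMI_ES2011a_H00 1 34.27 2.87 AMI_ES2011a_H00_FEE041",
]

meeting_rttm = merge_headset_rttm(headset_rttm)
speaker_counts = count_speakers(meeting_rttm)
print("\n".join(meeting_rttm))
print(speaker_counts)
```

Running the same speaker count on the system output in (4) is a quick check on whether the clustering threshold yields a plausible number of speakers for the meeting.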

  4. I can't figure out this problem. Could you give me some advice?
    I would also appreciate it if you could provide your "wav.scp", "segments", "utt2spk" and "rttm" files.

Yuan
@manojpamk

Hi Yuan,

Did you solve the issue?
If not, this discussion might be helpful: idiap/IBDiarization#10

Hi manoj,

I didn't solve it. I will follow that discussion.
Thanks for the reply.