idiap/IBDiarization

Comparing RTTM file

Closed this issue · 7 comments

Hi,

I have successfully diarized a meeting from the AMI corpus and want to measure the diarization error of my output. How do I do that? I looked through the AMI corpus but did not find a reference RTTM file to compare against the RTTM file generated by the diarization. Can anybody help?

Hello,

You will have to create your own RTTM files from the groundtruth
available in the AMI dataset. A sample RTTM file is already available
in the package.

Srikanth

Hi @mrsrikanth, can you please elaborate? I did not follow.

The AMI dataset has manual annotations that have
segment-level information (in XML format) for each speaker for each recording.
You can easily convert this to RTTM format to use as groundtruth for
diarization.
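
As a rough illustration of that conversion, here is a minimal Python sketch. It assumes each segment element carries `transcriber_start` and `transcriber_end` attributes in seconds, and it writes the commonly used 10-field `SPEAKER` line with channel 1; both are assumptions to check against your copy of the annotations and against your scoring tool. The function name `segments_xml_to_rttm` and the sample XML are made up for the example.

```python
import xml.etree.ElementTree as ET


def segments_xml_to_rttm(xml_text, meeting_id, speaker):
    """Turn one per-speaker segments XML document into RTTM SPEAKER lines.

    Assumption: segment elements expose 'transcriber_start' and
    'transcriber_end' attributes holding times in seconds.
    """
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():
        # Drop any '{namespace}' prefix so only the local tag name is compared.
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag != "segment":
            continue
        start = elem.get("transcriber_start")
        end = elem.get("transcriber_end")
        if start is None or end is None:
            continue  # skip segments without usable timing
        start, end = float(start), float(end)
        if end <= start:
            continue  # skip empty or inverted segments
        lines.append(
            f"SPEAKER {meeting_id} 1 {start:.3f} {end - start:.3f} "
            f"<NA> <NA> {speaker} <NA> <NA>"
        )
    return lines


# Hypothetical sample input, shaped like a per-speaker segments file.
SAMPLE_XML = """<nite:root xmlns:nite="http://nite.sourceforge.net/">
  <segment nite:id="s1" transcriber_start="1.0" transcriber_end="5.0"/>
  <segment nite:id="s2" transcriber_start="7.25" transcriber_end="9.0"/>
</nite:root>"""

for line in segments_xml_to_rttm(SAMPLE_XML, "IS1000a", "A"):
    print(line)
```

Running it over each speaker's file for a meeting and concatenating the lines gives one reference RTTM per recording.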

Srikanth

The segment folder contains information about each recording only; I did not see speaker information in that folder. Here is an image of one of the XML files in the segment folder. Can you point out where the speaker information is?
screenshot from 2017-03-30 15-35-38

The file name that you have attached is IS1000a.A.segments.xml.
The 'A' in the suffix is the speaker name for that file. You will also find
segment information for speakers B, C and D. To translate A, B, C and D
to global-level speaker identities, you can check .corpusResources/meetings.xml.
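
To make the naming convention concrete, here is a small Python sketch that pulls the meeting ID and the speaker letter out of a file name such as IS1000a.A.segments.xml. The helper name `parse_segments_filename` is hypothetical, and the pattern assumes a single uppercase speaker letter. It does not attempt the lookup in meetings.xml for global speaker identities.

```python
import os
import re

# Assumed pattern: <meeting>.<speaker letter>.segments.xml
SEGMENTS_NAME = re.compile(
    r"^(?P<meeting>[^.]+)\.(?P<speaker>[A-Z])\.segments\.xml$"
)


def parse_segments_filename(path):
    """Return (meeting_id, speaker_letter) for a per-speaker segments file."""
    name = os.path.basename(path)
    match = SEGMENTS_NAME.match(name)
    if match is None:
        raise ValueError(f"not a per-speaker segments file: {name}")
    return match.group("meeting"), match.group("speaker")


print(parse_segments_filename("segments/IS1000a.A.segments.xml"))
```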

Thanks @mrsrikanth for your help. One last thing: is there any overlap between the speakers? For example, speaker s1 speaks from, say, t=1 to t=5 and speaker s2 speaks from t=3 to t=7. Are there any cases like this?

@aishwaryjoshi31 You may find a script I wrote useful for converting the AMI Corpus's XML files to RTTM format. Typical invocation:

nite_xml_to_rttm.py ~/Downloads/ami_public_manual_1.6.2/words/ES2008a.*.words.xml |
    sort -n -k 4 > /tmp/ES2008a.rttm
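
On the overlap question raised earlier in the thread, one way to find out for a given meeting is to scan the reference RTTM for turns from different speakers that intersect in time. The Python sketch below is only an illustration: the function names are made up, and it assumes `SPEAKER` lines with the start time in field 4, the duration in field 5, and the speaker name in field 8.

```python
def read_rttm_turns(lines):
    """Parse RTTM SPEAKER lines into (start, end, speaker) tuples."""
    turns = []
    for line in lines:
        fields = line.split()
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue  # ignore blank lines and other record types
        start = float(fields[3])
        duration = float(fields[4])
        turns.append((start, start + duration, fields[7]))
    return turns


def find_overlaps(turns):
    """Return (start, end, speaker_a, speaker_b) for overlapping turns."""
    ordered = sorted(turns)  # sort by start time
    overlaps = []
    for i, (start_a, end_a, spk_a) in enumerate(ordered):
        for start_b, end_b, spk_b in ordered[i + 1:]:
            if start_b >= end_a:
                break  # later turns start even later, so no more overlap
            if spk_a != spk_b:
                overlaps.append((start_b, min(end_a, end_b), spk_a, spk_b))
    return overlaps


# The example from the question: s1 speaks 1-5 s, s2 speaks 3-7 s.
EXAMPLE = [
    "SPEAKER meeting 1 1.000 4.000 <NA> <NA> s1 <NA> <NA>",
    "SPEAKER meeting 1 3.000 4.000 <NA> <NA> s2 <NA> <NA>",
]
print(find_overlaps(read_rttm_turns(EXAMPLE)))
```

For a real meeting, pass the lines of a file such as /tmp/ES2008a.rttm to `read_rttm_turns` instead of `EXAMPLE`.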