How to understand multiple entity tags?
huangruizhe opened this issue · 3 comments
In this example:
token|speaker|ts|endTs|punctuation|case|tags|wer_tags
Good|0||||UC|[]|[]
morning|0||||LC|['5:TIME']|['5']
and|0||||LC|[]|[]
welcome|0||||LC|[]|[]
to|0||||LC|[]|[]
the|0||||LC|['6:DATE']|['6']
first|0||||LC|['6:DATE']|['6']
quarter|0||||LC|['6:DATE']|['6']
2020|0||||CA|['0:YEAR']|['0', '1', '6']
NexGEn|0||||MC|['7:ORG']|['7']
How to understand that the word "2020" has three entity tags ['0', '1', '6']?
Thanks.
Hi there!
Thanks for reaching out with your question. The way to understand the three tags is to look them up in the corresponding wer_tag.json file.
In the examples we provided, the corresponding wer_tag.json is just below the example you shared.
{
"0":{
"entity_type" : "YEAR"
},
"1":{
"entity_type" : "CARDINAL"
},
"5":{
"entity_type" : "TIME"
},
"6":{
"entity_type" : "DATE"
},
"7":{
"entity_type" : "ORG"
}
}
So for 2020, the tag 0 corresponds to a YEAR entity, 1 corresponds to a CARDINAL entity, and 6 corresponds to a DATE entity. Our reasoning is that the token 2020 on its own is a year and a cardinal number, but in the context of "the first quarter 2020" it's a date! As a result, we apply all three entity tags to it and include the IDs of those entities in the wer_tags column.
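To make the lookup concrete, here is a minimal sketch in Python that resolves the wer_tags IDs of one row against the JSON above. The function name `entity_types_for_row` and the inlined JSON string are hypothetical and only for illustration; the sketch assumes the pipe-separated column order shown in the example header and is not the tool's own parsing code.

```python
import ast
import json

# Inlined copy of the wer_tag.json content from the example (illustration only).
WER_TAG_JSON = """
{
  "0": {"entity_type": "YEAR"},
  "1": {"entity_type": "CARDINAL"},
  "5": {"entity_type": "TIME"},
  "6": {"entity_type": "DATE"},
  "7": {"entity_type": "ORG"}
}
"""


def entity_types_for_row(row, tag_map):
    """Return (token, entity types) for one pipe-separated row.

    Assumes the column order from the example header:
    token|speaker|ts|endTs|punctuation|case|tags|wer_tags
    """
    fields = row.split("|")
    token = fields[0]
    # The wer_tags column is written as a Python-style list literal.
    wer_tag_ids = ast.literal_eval(fields[7])
    return token, [tag_map[tag_id]["entity_type"] for tag_id in wer_tag_ids]


tag_map = json.loads(WER_TAG_JSON)
token, types = entity_types_for_row("2020|0||||CA|['0:YEAR']|['0', '1', '6']", tag_map)
print(token, types)  # 2020 ['YEAR', 'CARDINAL', 'DATE']
```

A row with an empty wer_tags list, such as the one for "Good", simply yields no entity types.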
Hope this helps but let me know if there's anything else still confusing you!
Thanks,
Miguel
So, in this case, 2020 is tagged as three types of entities. Thanks for the nice explanation!
And this word will also be counted in all three entity-type-specific WER computations, right?
Sorry for the late reply -- yes! It will count the token in all three WER categories!
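As a rough illustration of what "counted in all three categories" means, the sketch below groups the example tokens by entity type using the wer_tags column. The names `ENTITY_TYPES`, `ROWS`, and `tokens_by_entity_type` are hypothetical; the sketch only shows which per-entity buckets each token falls into and does not reproduce how the tool actually computes WER.

```python
import ast
from collections import defaultdict

# Entity IDs and types copied from the wer_tag.json example (illustration only).
ENTITY_TYPES = {"0": "YEAR", "1": "CARDINAL", "5": "TIME", "6": "DATE", "7": "ORG"}

# Rows copied from the example; columns are
# token|speaker|ts|endTs|punctuation|case|tags|wer_tags
ROWS = [
    "Good|0||||UC|[]|[]",
    "morning|0||||LC|['5:TIME']|['5']",
    "and|0||||LC|[]|[]",
    "welcome|0||||LC|[]|[]",
    "to|0||||LC|[]|[]",
    "the|0||||LC|['6:DATE']|['6']",
    "first|0||||LC|['6:DATE']|['6']",
    "quarter|0||||LC|['6:DATE']|['6']",
    "2020|0||||CA|['0:YEAR']|['0', '1', '6']",
    "NexGEn|0||||MC|['7:ORG']|['7']",
]


def tokens_by_entity_type(rows):
    """Group tokens by every entity type listed in their wer_tags column."""
    buckets = defaultdict(list)
    for row in rows:
        fields = row.split("|")
        for tag_id in ast.literal_eval(fields[7]):
            buckets[ENTITY_TYPES[tag_id]].append(fields[0])
    return dict(buckets)


for entity_type, tokens in tokens_by_entity_type(ROWS).items():
    print(entity_type, tokens)
```

Here "2020" appears under YEAR, CARDINAL, and DATE, so an error on that token would affect all three per-entity figures.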