How to understand multiple entity tags?

Question

huangruizhe opened this issue 2 years ago · 3 comments

token|speaker|ts|endTs|punctuation|case|tags|wer_tags
Good|0||||UC|[]|[]
morning|0||||LC|['5:TIME']|['5']
and|0||||LC|[]|[]
welcome|0||||LC|[]|[]
to|0||||LC|[]|[]
the|0||||LC|['6:DATE']|['6']
first|0||||LC|['6:DATE']|['6']
quarter|0||||LC|['6:DATE']|['6']
2020|0||||CA|['0:YEAR']|['0', '1', '6']
NexGEn|0||||MC|['7:ORG']|['7']
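For readers who want to inspect these rows programmatically, here is a minimal sketch that parses them. It assumes three things: the first line is always the column header, no field contains a literal "|", and the tags and wer_tags columns hold Python-style list literals. The names TokenRow, parse_nlp_rows and SAMPLE are hypothetical and not part of any released tooling:

```python
import ast
from dataclasses import dataclass


@dataclass
class TokenRow:
    token: str
    speaker: str
    case: str
    tags: list      # e.g. ['0:YEAR'] ("<id>:<entity type>" strings)
    wer_tags: list  # e.g. ['0', '1', '6'] (entity IDs only)


def parse_nlp_rows(text: str) -> list:
    """Parse pipe-delimited rows whose first line is the column header."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split("|")
    rows = []
    for line in lines[1:]:
        fields = dict(zip(header, line.split("|")))
        rows.append(TokenRow(
            token=fields["token"],
            speaker=fields["speaker"],
            case=fields["case"],
            # Assumption: both tag columns hold Python-style list literals
            tags=ast.literal_eval(fields["tags"]),
            wer_tags=ast.literal_eval(fields["wer_tags"]),
        ))
    return rows


SAMPLE = """token|speaker|ts|endTs|punctuation|case|tags|wer_tags
Good|0||||UC|[]|[]
2020|0||||CA|['0:YEAR']|['0', '1', '6']"""

for row in parse_nlp_rows(SAMPLE):
    print(row.token, row.tags, row.wer_tags)
# Good [] []
# 2020 ['0:YEAR'] ['0', '1', '6']
```

ast.literal_eval is used instead of eval so that only literals are accepted from the file.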

How should I understand the fact that the word "2020" has three entity tags, ['0', '1', '6'], in the wer_tags column?
Thanks.

Answer 1 · 2022-11-08T14:25:16.000Z

Hi there!

Thanks for reaching out with your question! The way to understand the three tags is to look them up in the corresponding wer_tag.json file.

In the examples we provide, the corresponding wer_tag.json appears just below the example you shared:

{
  "0":{
    "entity_type" : "YEAR"
  },
  "1":{
    "entity_type" : "CARDINAL"
  },
  "5":{
    "entity_type" : "TIME"
  },
  "6":{
    "entity_type" : "DATE"
  },
  "7":{
    "entity_type" : "ORG"
  }
}

So for 2020, tag 0 corresponds to a YEAR entity, tag 1 to a CARDINAL entity, and tag 6 to a DATE entity. Our reasoning is that the token 2020 on its own is both a year and a cardinal number, but in the context of "the first quarter 2020" it's a date! As a result, we apply all three entity tags to it and include the IDs of those entities in the wer_tags column.
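The lookup described above can be sketched in a few lines. This is a minimal illustration, not the project's own code: the wer_tag.json contents are inlined from the answer so the snippet runs as-is (in practice you would json.load() the file shipped with the example), and the function name resolve_entity_types is hypothetical:

```python
import json

# wer_tag.json contents from the answer above, inlined so the sketch runs as-is
WER_TAG_JSON = """{
  "0": {"entity_type": "YEAR"},
  "1": {"entity_type": "CARDINAL"},
  "5": {"entity_type": "TIME"},
  "6": {"entity_type": "DATE"},
  "7": {"entity_type": "ORG"}
}"""


def resolve_entity_types(wer_tags, tag_map):
    """Return the entity_type for each entity ID listed in a token's wer_tags."""
    return [tag_map[tag_id]["entity_type"] for tag_id in wer_tags]


tag_map = json.loads(WER_TAG_JSON)
print(resolve_entity_types(["0", "1", "6"], tag_map))
# ['YEAR', 'CARDINAL', 'DATE']
```

Note that the IDs in wer_tags are strings, matching the string keys of the JSON object, so no conversion is needed before the lookup.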

Hope this helps, but let me know if anything is still unclear!

Thanks,
Miguel

Answer 2 · 2022-11-08T17:42:07.000Z

So in this case, 2020 is tagged as three types of entities. Thanks for the nice explanation!
And this word will also be counted in all three entity-type-specific WER computations, right?

Answer 3 · 2022-11-09T18:48:25.000Z

Sorry for the late reply. Yes! The token is counted in all three WER categories!
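To make the bookkeeping concrete, here is a minimal sketch that counts reference tokens per entity type for the example rows. It assumes that each entity-type-specific WER is normalized by the number of reference tokens in that category (the usual WER convention). The tool's actual scoring code is not shown in this thread, and the names TAG_MAP, ROWS and reference_tokens_per_type are hypothetical:

```python
from collections import Counter

# Entity ID -> type mapping, copied from wer_tag.json in Answer 1
TAG_MAP = {
    "0": {"entity_type": "YEAR"},
    "1": {"entity_type": "CARDINAL"},
    "5": {"entity_type": "TIME"},
    "6": {"entity_type": "DATE"},
    "7": {"entity_type": "ORG"},
}

# (token, wer_tags) pairs taken from the example rows in the question
ROWS = [
    ("Good", []), ("morning", ["5"]), ("and", []), ("welcome", []),
    ("to", []), ("the", ["6"]), ("first", ["6"]), ("quarter", ["6"]),
    ("2020", ["0", "1", "6"]), ("NexGEn", ["7"]),
]


def reference_tokens_per_type(rows, tag_map):
    """Count reference tokens per entity type. A token carrying several
    entity IDs contributes once to each of the corresponding categories."""
    counts = Counter()
    for _token, wer_tags in rows:
        for tag_id in wer_tags:
            counts[tag_map[tag_id]["entity_type"]] += 1
    return counts


counts = reference_tokens_per_type(ROWS, TAG_MAP)
print(counts["DATE"], counts["YEAR"], counts["CARDINAL"])  # 4 1 1
```

Under this assumption, "2020" appears in the YEAR, CARDINAL and DATE totals at once. This is why the per-category counts add up to 8 even though only 6 of the 10 tokens carry any entity tag.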