kuhumcst/DanNet

Supersenses extra

Closed this issue · 2 comments

Expanding on #138 now that I had a meeting with Bolette and Sussi about it.

  • The English WordNet uses adj rather than adjective as the name of the adjective grouping, so we should switch to match it.
  • Certain food-related words are tagged as verb.creation, whereas they should be verb.consumption as in the OEWN. Special rules may be needed for food, and perhaps for other cases too.
  • Once the supersenses are cleaned up and officially in DanNet, I can begin to attach supersenses to the Semdax corpus (CoNLL-U format). About 7700 senses in this corpus come from DanNet, while the rest are mostly from DDO (not DanNet) and can't be annotated using the supersenses from DanNet. Those will have to be annotated manually or using data from DDO.
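The first two points amount to a relabelling pass over the supersense tags. The following is only a minimal sketch of what that could look like, not the actual DanNet code: the function names, the string form of the labels (for example `adjective.all`), and the idea of detecting food-related synsets through a set of hypernym ids are all assumptions made for illustration.

```python
def normalize_supersense(label: str) -> str:
    """Rewrite an 'adjective.*' label to 'adj.*'; leave other labels unchanged."""
    prefix, sep, rest = label.partition(".")
    if prefix == "adjective" and sep:
        return f"adj.{rest}"
    return label


def apply_food_rule(label: str, hypernym_ids, food_ids: set) -> str:
    """Hypothetical special rule: a verb.creation synset with at least one
    food-related hypernym is moved to verb.consumption."""
    if label == "verb.creation" and food_ids & set(hypernym_ids):
        return "verb.consumption"
    return label


# Made-up ids, purely for illustration.
print(normalize_supersense("adjective.all"))
print(apply_food_rule("verb.creation", ["synset-1", "synset-2"], {"synset-2"}))
```

Whether hypernyms are the right signal for the food rule is exactly what would need to be checked by hand; the sketch only shows where such a rule would plug in.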

Current status: I have binned the verb.creation synsets into 14 separate groups based on their hypernyms. These and the remaining 9 synsets need to be checked by Sussi/Bolette.
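For reference, binning by hypernym can be sketched roughly as below. This is an illustrative reconstruction, not the procedure that was actually used: it assumes each synset has a single relevant hypernym, that ids are plain strings, and that synsets whose hypernym is shared by fewer than `min_size` synsets end up in a leftover list (the `min_size` threshold is a hypothetical parameter).

```python
from collections import defaultdict


def bin_by_hypernym(synset_to_hypernym: dict, min_size: int = 2):
    """Group synset ids by hypernym id.

    Returns (groups, leftovers): groups maps a hypernym id to its synsets,
    leftovers lists synsets whose hypernym bin is smaller than min_size.
    """
    bins = defaultdict(list)
    for synset, hypernym in sorted(synset_to_hypernym.items()):
        bins[hypernym].append(synset)

    groups = {h: members for h, members in bins.items() if len(members) >= min_size}
    leftovers = sorted(
        synset
        for members in bins.values()
        if len(members) < min_size
        for synset in members
    )
    return groups, leftovers


# Made-up ids, purely for illustration.
groups, leftovers = bin_by_hypernym({"a": "h1", "b": "h1", "c": "h2"})
print(groups, leftovers)
```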

I have remapped all of the verb.creation synsets with Bolette. They will need to be imported into the graph and added to the dataset in the next release.

The next step is to annotate the CoNLL-U file for Danish found at: https://www.clarin.si/repository/xmlui/handle/11356/1842

Most of the senses in it are mapped to DanNet senses. Perhaps the supersense can be added at the end of each line? I also need to figure out a smart way to tag many of the remaining ~2500 senses, since they are not directly linked to DanNet. Some of them might be taggable via the lemma.
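One possible way to add the supersense at the end of a line is to use the MISC field, which is the tenth and last tab-separated column in CoNLL-U and holds `|`-separated `Key=Value` pairs (or `_` when empty). The sketch below rests on several assumptions: that the sense id sits in MISC under a key called `DanNetSense` (a hypothetical name; the real file may encode it differently), that `Supersense` is an acceptable key, and that the lemma fallback is only applied when a lemma maps to exactly one supersense.

```python
def add_supersense(line: str, sense_to_supersense: dict,
                   lemma_to_supersense: dict,
                   sense_key: str = "DanNetSense") -> str:
    """Append Supersense=... to the MISC column of one CoNLL-U token line."""
    # Comment lines and blank sentence separators pass through unchanged.
    if not line.strip() or line.startswith("#"):
        return line
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        return line  # not a well-formed token line; leave it alone

    misc = [] if cols[9] == "_" else cols[9].split("|")
    attrs = dict(item.split("=", 1) for item in misc if "=" in item)
    if "Supersense" in attrs:
        return line  # already annotated

    # First choice: the sense id linked to DanNet.
    supersense = sense_to_supersense.get(attrs.get(sense_key))
    # Fallback: the lemma (column 3), only if it is unambiguous.
    if supersense is None:
        candidates = lemma_to_supersense.get(cols[2], set())
        if len(candidates) == 1:
            supersense = next(iter(candidates))
    if supersense is None:
        return line

    misc.append(f"Supersense={supersense}")
    cols[9] = "|".join(misc)
    return "\t".join(cols) + ("\n" if line.endswith("\n") else "")


# Made-up token line and mappings, purely for illustration.
token = "1\tspiser\tspise\tVERB\t_\t_\t0\troot\t_\tDanNetSense=s1"
print(add_supersense(token, {"s1": "verb.consumption"}, {}))
```

A lemma can belong to several parts of speech, so a real fallback would probably need to key on lemma plus UPOS rather than the lemma alone; ambiguous lemmas are simply skipped here and left for manual annotation.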