GateNLP/python-gatenlp

Issues with set ownership and gathering annotations based on an annspec

johann-petrak opened this issue · 2 comments

  • Removing an annotation from a set using annset.remove(ann) is currently only possible if the actual annotation object ann is contained in the set AND if the owning set is annset or None.
  • However, if we gather annotations from several attached sets using an annspec with doc.anns(annspec), the resulting annotations are shallow copies and do not have an owning set.
  • Pampac gets the matching annotation from such a detached set and hence cannot remove it directly.
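
To make the failure mode concrete, here is a minimal sketch using toy classes. Ann, ToySet and gather are hypothetical stand-ins written for this illustration, not the gatenlp API; they only mimic the rule described above (removal needs the very same object, owned by that set or by no set).

```python
import copy


class Ann:
    """Toy annotation (hypothetical, not gatenlp.Annotation)."""

    def __init__(self, annid, anntype):
        self.id = annid
        self.type = anntype
        self.owner = None  # set when the annotation is added to a ToySet


class ToySet:
    """Toy annotation set that removes by object identity."""

    def __init__(self, name):
        self.name = name
        self._anns = {}

    def add(self, ann):
        ann.owner = self
        self._anns[ann.id] = ann

    def remove(self, ann):
        # The owner must be this set or None ...
        if ann.owner is not None and ann.owner is not self:
            raise ValueError("annotation is owned by a different set")
        # ... and the very same object must be stored in this set.
        if self._anns.get(ann.id) is not ann:
            raise ValueError("annotation object is not contained in this set")
        del self._anns[ann.id]


def gather(*sets):
    """Collect shallow copies without an owner, as a detached selection would."""
    out = []
    for s in sets:
        for ann in s._anns.values():
            c = copy.copy(ann)
            c.owner = None
            out.append(c)
    return out


set_a = ToySet("A")
original = Ann(0, "Token")
set_a.add(original)

detached = gather(set_a)[0]
try:
    set_a.remove(detached)  # a copy, not the stored object
    removed_copy = True
except ValueError:
    removed_copy = False

set_a.remove(original)  # the stored object itself: this works
```

In this toy model the copy passes the ownership check (its owner is None) but fails the containment check, which is the situation Pampac ends up in.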

This is again an issue related to the identity of an annotation and the consequences of whichever interpretation of identity is chosen:

If we want to use the annspec approach for selecting annotations:

  • we cannot use annotation id in general as that may change in the detached set if we get duplicate ids
  • we cannot use the hash of the object, since we use shallow copies, and we have to use shallow copies in order to be able to change the id if needed
  • so there is no general solution that will always work
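
The id problem can be sketched the same way. The code below is an illustration under assumptions, not gatenlp code: annotation sets are modelled as plain dicts mapping id to a hypothetical Ann object, and gather_detached is an invented helper that renumbers a copy whenever its id is already taken.

```python
import copy


class Ann:
    """Toy annotation (hypothetical, not gatenlp.Annotation)."""

    def __init__(self, annid, anntype):
        self.id = annid
        self.type = anntype


def gather_detached(sets):
    """Merge shallow copies from several {id: Ann} dicts, renumbering clashes."""
    merged = {}
    for anns in sets:
        for ann in anns.values():
            c = copy.copy(ann)  # a copy, so changing c.id leaves the original alone
            if c.id in merged:
                c.id = max(merged) + 1
            merged[c.id] = c
    return merged


set_a = {0: Ann(0, "Token")}
set_b = {0: Ann(0, "Person")}  # same id as the Token in set_a

merged = gather_detached([set_a, set_b])
person_copy = next(a for a in merged.values() if a.type == "Person")

# Neither the id nor the object identity leads back to the original.
id_found_in_source = person_copy.id in set_b
same_object = person_copy is set_b[0]
```

Here the Person copy ends up with a new id in the merged selection while the original keeps its old one, so neither a lookup by id nor a comparison by identity finds the original in its source set.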

The only way to deal with this is to no longer allow the use of annspec and instead go back to using a single annotation set, which limits the flexibility of the Pampac annotator.

We need to think hard about what the best compromise is here.

Going back to just allowing single annotation sets instead of the result of an annspec selection is probably the safest approach. Since that set will need to be specified by name, it will always be an attached set, so removal by instance should be no problem.

This has now been changed to allow single-set annotation specifications only. The methods used to create the final iterable of annotations used by Pampac have been changed to guarantee that the original annotation is included (and not a shallow copy), so the remove-annotation action now works as expected.
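
A rough sketch of why the single-set restriction resolves this, again with toy stand-ins rather than the real implementation: ToyDoc, candidates and remove_by_instance are hypothetical names, and sets are plain dicts keyed by annotation id.

```python
class Ann:
    """Toy annotation (hypothetical, not gatenlp.Annotation)."""

    def __init__(self, annid, anntype):
        self.id = annid
        self.type = anntype


class ToyDoc:
    """Toy document holding named annotation sets as {id: Ann} dicts."""

    def __init__(self):
        self._sets = {}

    def annset(self, name=""):
        return self._sets.setdefault(name, {})


def candidates(doc, setname, anntype=None):
    """Yield the original objects (no copies) from one named set, in id order."""
    anns = doc.annset(setname)
    for annid in sorted(anns):
        ann = anns[annid]
        if anntype is None or ann.type == anntype:
            yield ann


def remove_by_instance(doc, setname, ann):
    """Remove ann only if this exact object is stored in the named set."""
    anns = doc.annset(setname)
    if anns.get(ann.id) is not ann:
        raise ValueError("annotation object is not contained in this set")
    del anns[ann.id]


doc = ToyDoc()
doc.annset("Tokens")[0] = Ann(0, "Token")
doc.annset("Tokens")[1] = Ann(1, "Space")

# Materialize the matches first so the set is not modified while iterating.
matched = list(candidates(doc, "Tokens", "Token"))
for ann in matched:
    remove_by_instance(doc, "Tokens", ann)

remaining = [a.type for a in doc.annset("Tokens").values()]
```

Because the matcher is handed the stored objects from one attached set, the identity check in the removal step succeeds without any id or hash bookkeeping.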