CondeNast/atjson

Expressing equivalency classes for annotations


We often have to determine whether two documents are equivalent, which is exposed as document.equals(). This is implemented by comparing the canonical versions of the documents and checking that their content and annotations are equal. For annotations, equivalency is implemented by checking that their start and end positions match and then doing a deep comparison of their attributes.
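As a rough illustration of the annotation check described above, here is a minimal sketch. It uses a hypothetical, simplified `SimpleAnnotation` shape rather than atjson's actual Annotation class. It also compares the annotation `type`, which is an assumption beyond the description.

```typescript
// Hypothetical, simplified annotation shape for illustration only;
// this is not atjson's actual Annotation class.
interface SimpleAnnotation {
  type: string;
  start: number;
  end: number;
  attributes: Record<string, unknown>;
}

// Structural deep equality over plain JSON-like values (objects, arrays, primitives).
function deepEqual(a: unknown, b: unknown): boolean {
  if (a === b) return true;
  if (typeof a !== "object" || typeof b !== "object" || a === null || b === null) {
    return false;
  }
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  const aObj = a as Record<string, unknown>;
  const bObj = b as Record<string, unknown>;
  const aKeys = Object.keys(aObj);
  const bKeys = Object.keys(bObj);
  if (aKeys.length !== bKeys.length) return false;
  return aKeys.every(
    (key) =>
      Object.prototype.hasOwnProperty.call(bObj, key) && deepEqual(aObj[key], bObj[key]),
  );
}

// Equivalent if type (assumed) and start/end positions match
// and the attributes are deeply equal.
function annotationsEqual(a: SimpleAnnotation, b: SimpleAnnotation): boolean {
  return (
    a.type === b.type &&
    a.start === b.start &&
    a.end === b.end &&
    deepEqual(a.attributes, b.attributes)
  );
}
```

Under this check, any extra attribute makes two otherwise identical annotations unequal. That includes a signpost copied over during a conversion.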

However, an annotation might have some properties, particularly in its attributes, that do not represent a meaningful difference. For example, if an annotation was created during a conversion, it is sometimes useful to carry over some properties from the original annotation into the converted version as signposts for verification. These properties should be ignored when determining whether two annotations are equivalent.

It's currently possible to override equals on the annotation, but we could provide nicer hooks. One possibility is to add a declarative API to annotations where one could list these 'non-data' attributes.
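One way the declarative idea could look is sketched below. It is a hypothetical sketch, not an existing atjson API. `SketchAnnotation`, the static `nonDataAttributes` list, `Link`, and the `originalType` signpost key are all invented for illustration. The base class strips the declared keys from both sides before the deep comparison, so subclasses only declare which attributes to ignore instead of overriding equals.

```typescript
// Structural deep equality over plain JSON-like values.
function deepEqual(a: unknown, b: unknown): boolean {
  if (a === b) return true;
  if (typeof a !== "object" || typeof b !== "object" || a === null || b === null) {
    return false;
  }
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  const aObj = a as Record<string, unknown>;
  const bObj = b as Record<string, unknown>;
  const aKeys = Object.keys(aObj);
  const bKeys = Object.keys(bObj);
  if (aKeys.length !== bKeys.length) return false;
  return aKeys.every(
    (key) =>
      Object.prototype.hasOwnProperty.call(bObj, key) && deepEqual(aObj[key], bObj[key]),
  );
}

// Returns a copy of `attributes` without the keys listed in `ignored`.
function withoutKeys(
  attributes: Record<string, unknown>,
  ignored: readonly string[],
): Record<string, unknown> {
  const result: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(attributes)) {
    if (!ignored.includes(key)) result[key] = value;
  }
  return result;
}

// Hypothetical base class (not atjson's Annotation).
class SketchAnnotation {
  // Declarative hook: attribute keys that are signposts rather than data.
  static readonly nonDataAttributes: readonly string[] = [];

  start: number;
  end: number;
  attributes: Record<string, unknown>;

  constructor(start: number, end: number, attributes: Record<string, unknown>) {
    this.start = start;
    this.end = end;
    this.attributes = attributes;
  }

  equals(other: SketchAnnotation): boolean {
    // Different annotation classes are never equivalent.
    if (this.constructor !== other.constructor) return false;
    // Read the ignore list declared on the concrete subclass.
    const ignored = (this.constructor as typeof SketchAnnotation).nonDataAttributes;
    return (
      this.start === other.start &&
      this.end === other.end &&
      deepEqual(withoutKeys(this.attributes, ignored), withoutKeys(other.attributes, ignored))
    );
  }
}

// Hypothetical subclass: `originalType` records what a converted link came from.
class Link extends SketchAnnotation {
  static readonly nonDataAttributes: readonly string[] = ["originalType"];
}
```

A converted `Link` carrying `originalType` would then compare equal to one without it, while a different `url` would still make them unequal. Overriding equals remains available for cases a flat key list can't express, such as nested signposts.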

What are the results of this discussion?

I think this conversation dovetails quite nicely with the conversation we're having around inheritance / reuse. For example, do we have partial equivalence?