View doesn't know about identical constituents

Question

View doesn't know about identical constituents

mayhewsw opened this issue 6 years ago · 7 comments

If a view has identical constituents (label, span, score) on token i, then getConstituentsCoveringToken(i) always returns a single constituent even though multiple constituents are there.

This is because View.java:453 uses a Set to aggregate constituents.

Answer 1 · 2018-07-03T19:51:22.000Z

I think the initial thinking was that there won't be any overlapping constituents (with exactly the same boundaries) in each view.
What is your use case?

Answer 2 · 2018-07-03T20:14:59.000Z

There is no restriction on adding identical (or overlapping) constituents. I guess if that's allowed (which it should be) then it should report the correct number that cover a token.

Answer 3 · 2018-07-03T21:47:09.000Z

Just to make sure I understand your point I summarize what you just said,

It should be allowed to ad identical constituents (maybe with a warning?)
when using the getConstituentsCoveringToken we should be able to retrieve all the identical constituents (which according to Stephen, it is not the case)

Answer 4 · 2018-07-04T13:34:45.000Z

Yes, although I'm rethinking the idea of allowing identical constituents now... maybe that shouldn't be allowed. Alternatively, it could just keep constituents in a set, so identical constituents are folded together.

Either way, getConstituentsCoveringToken should reflect the number, whether or not we allow identical.

Answer 5 · 2018-07-04T13:38:48.000Z

Makes sense.

@mssammon do you have any thoughts on this?

Answer 6 · 2018-07-04T15:20:01.000Z

Looking at the Constituent hashcode() and equals methods, the observed behavior only happens when the relevant constituents are identical. SpanLabelView, in theory, offers control over whether duplicate constituents can be added (I think actually this is not sufficiently strictly enforced) but that's not true for most other View types.
Personally, I don't see the point of allowing identical constituents, though maybe there's a use case for this somewhere. We could fix this by making the Constituents field in View a QueryableSet instead of a QueryableList. If at some point in the future a compelling case for allowing duplicates is made, we can create a new type of View for it.

Answer 7 · 2018-08-22T13:45:33.000Z

For what it's worth, here's a snippet that demonstrates the issue:

String[] sentence1 = {"John",  "ate", "breakfast", "gladly", "."};

List<String[]> tokenizedSentences = Collections.singletonList(sentence1);
TextAnnotation ta = BasicTextAnnotationBuilder.createTextAnnotationFromTokens("", "", tokenizedSentences);

SpanLabelView ner = new SpanLabelView("NER", ta);

// label token 0 as PER (not actually true)
Constituent c = new Constituent("PER", "NER", ta, 0,1);
ner.addConstituent(c);

// new object, but identical to c
Constituent c2 = new Constituent("PER", "NER", ta, 0,1);
ner.addConstituent(c2);

ta.addView("NER", ner);

// This returns just one constituent.
System.out.println(ner.getConstituentsCoveringToken(0));

// This returns two constituents.
System.out.println(ner.getConstituents());

// these constituents have equality, according to the measure implemented in Constituent.java
System.out.println(c.equals(c2));

// this fails (as Mark pointed out)
ner.addSpanLabel(0,1,"PER",1.);