knowitall/openie-demo

Suppress less informative extractions

Opened this issue · 3 comments

Our original intention with Ollie was to find as many correct extractions as possible. Since each application has its own requirements, its developers could then write simple logic to keep what they are interested in.

However, some extractions are often strictly less informative than others, and these are not useful for many applications. For example, you might have the following:

Superintendent Janet Robinson said Adam Lanza attended Sandy Hook elementary, although she could not remember the year.

(he, was a student at, Sandy Hook)
(he, was a student at, some point)
(he, was, a student)

The last extraction is strictly less informative than the first. In some applications (e.g. search) we may want all three so that we have results for more queries, but in others (e.g. document summarization) we don't want the third because it is redundant. Ollie should have an option for suppressing strictly less informative extractions.
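As a rough illustration of what such an option could do, here is a minimal sketch of a suppression filter. The names `Suppress` and `suppressSubsumed` are hypothetical and not part of Ollie, and the `subsumes` predicate is left to the caller because the issue does not fix a definition of "strictly less informative".

```scala
object Suppress {
  /** Drops every extraction that some other extraction from the same
    * sentence subsumes. `subsumes(a, b)` is assumed to mean "b is strictly
    * less informative than a"; how to decide that is up to the caller.
    */
  def suppressSubsumed[E](extractions: Seq[E])(subsumes: (E, E) => Boolean): Seq[E] =
    extractions.filterNot { e =>
      extractions.exists(other => other != e && subsumes(other, e))
    }
}
```

With a toy predicate on plain strings, such as "the longer string starts with the shorter one", the three extractions above would be reduced to the first two.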

Yes, I think it would be useful to provide a means for suppressing strictly less informative extractions.

In addition to having an option for filtering, we could provide access to the logic that determines whether an extraction is strictly less informative than another extraction from the same sentence.

def subsumes(other: OllieExtraction): Boolean
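One possible reading of such a method, sketched under heavy assumptions: `SimpleExtraction` is a hypothetical string-only stand-in for `OllieExtraction`, and the prefix test below is just one heuristic that happens to fit the Sandy Hook example, not Ollie's logic.

```scala
// Hypothetical stand-in for an Ollie extraction; the real class has richer fields.
final case class SimpleExtraction(arg1: String, rel: String, arg2: String) {
  // Lower-cased tokens of the relation followed by the second argument.
  private def relAndArg2Tokens: Vector[String] =
    (rel + " " + arg2).toLowerCase.split("\\s+").filter(_.nonEmpty).toVector

  /** True if `other` has the same arg1 and its relation + arg2 tokens are a
    * proper prefix of this extraction's relation + arg2 tokens.
    * This is an assumed heuristic for "strictly less informative".
    */
  def subsumes(other: SimpleExtraction): Boolean =
    arg1.equalsIgnoreCase(other.arg1) &&
      relAndArg2Tokens.length > other.relAndArg2Tokens.length &&
      relAndArg2Tokens.startsWith(other.relAndArg2Tokens)
}
```

Under this heuristic, (he, was a student at, Sandy Hook) subsumes (he, was, a student), while the first two extractions do not subsume each other. A real implementation would probably compare token spans or dependency nodes rather than strings.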

Going on a bit of a tangent and expanding a little on this idea, the following extraction comparison methods might be useful:

def overlapsWith(other: OllieExtraction): Boolean

def sharesArg1(other: OllieExtraction): Boolean

def sharesArg2(other: OllieExtraction): Boolean

def sharesArg(other: OllieExtraction): Boolean

Perhaps these capabilities do not belong in the core Ollie library but in a separate ollie-utils library, which would provide mechanisms to manipulate and transform extractions (argument and relation string normalizations, and equality under these transformations).

Applications can (and in many cases need to) write their own logic to do these things but having a default implementation that comes with the Ollie library sounds useful to me.

Niranjan, just compare the intervals if you want this functionality, e.g. extr1.arg1.span overlaps extr2.arg1.span or extr1.span overlaps extr2.span. Or you could compare whether the nodes intersect: extr1.arg1.nodes intersect extr2.arg1.nodes == Set.empty or extr1.nodes intersect extr2.nodes == Set.empty tells you that they do not. If you want to check whether two extractions have the same arg1, you just need extr1.arg1 == extr2.arg1. Adding methods for simple operations that can already be expressed succinctly only adds a layer of confusion (both "what does this method do?" and "which method should I use?").
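To make these comparisons concrete, here is a small self-contained sketch. `Span`, `Part` and `PartComparison` are hypothetical types invented for illustration, with spans assumed to be half-open token intervals; Ollie's own interval and node classes are not reproduced here.

```scala
// Hypothetical types for illustration; Ollie's own classes differ.
// A half-open token interval [start, end).
final case class Span(start: Int, end: Int) {
  require(start < end, "span must be non-empty")

  // Two half-open intervals overlap when each starts before the other ends.
  def overlaps(that: Span): Boolean = start < that.end && that.start < end
}

// One part of an extraction (an argument or the relation): its text,
// its token span, and the ids of its dependency nodes.
final case class Part(text: String, span: Span, nodes: Set[Int])

object PartComparison {
  def spansOverlap(a: Part, b: Part): Boolean = a.span overlaps b.span

  // The parts share at least one node when the intersection is non-empty.
  def nodesIntersect(a: Part, b: Part): Boolean = (a.nodes intersect b.nodes).nonEmpty
}
```

Because `Part` is a case class, `a == b` compares text, span and nodes together, which is the kind of plain equality check suggested above for testing whether two extractions share an argument.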

But I'd love to have a normalization routine for relations or arguments. If you have some, let's talk about it sometime.