thunlp/RE-Context-or-Names

infer.py: a suggested fix for the 'annotate_sent' function

y0nde opened this issue · 0 comments


The original function has a problem: if E1 is "the apple" and E2 is "the banana", E2 cannot be annotated, because of the "continue" statements in the E1 annotation section.
For example, take "I prefer the banana to the apple.", where E1 is "the apple" and E2 is "the banana".
The "the" that starts E2 ("the banana") is matched as the first token of E1 ("the apple") instead, and the "continue" then skips the E2 check for that token, so the start of E2 is never recognized.
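The failure can be reproduced with a simplified, spaCy-free model. This is a sketch under assumptions, not the repository's code: tokens and entities are plain strings, `annotate_with_continue` is a hypothetical name, and only each entity's start index is tracked.

```python
def annotate_with_continue(tokens, e1, e2):
    """Simplified model of a matcher whose E1 branch ends with `continue`.

    Assumption: tokens/entities are plain strings, and only each entity's
    start index is tracked, which is enough to show the problem.
    """
    e1_start = e2_start = None
    for i, tok in enumerate(tokens):
        if e1_start is None and tok == e1[0]:
            e1_start = i
            continue  # the E2 check below never sees this token
        if e2_start is None and tok == e2[0]:
            e2_start = i
    return e1_start, e2_start


tokens = "I prefer the banana to the apple .".split()
print(annotate_with_continue(tokens, ["the", "apple"], ["the", "banana"]))
# (2, 5): E1 grabs the "the" of "the banana" (index 2), and E2 falls
# through to the "the" of "the apple" (index 5). The correct answer is (5, 2).
```

In this model, deleting the `continue` alone would still leave E1 starting at index 2; that is why the suggested function also discards a partial E1 match as soon as a token outside the entity appears.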
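That's why I changed the function to avoid the problem: it first records where each entity starts and ends (with no "continue" in the E1 section, and dropping a partial E1 match such as the "the" of "the banana" as soon as a non-entity token follows), then inserts the tags in a second pass. The new function is below.
It may not be the cleanest code, and it takes slightly longer because it loops over the sentence twice, but it improves E2 detection.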

import re  # needed at the top of infer.py

def annotate_sent(self, sent_nlp, e1, e2):
    # Token indices where each entity starts/ends (None = not found).
    e1spos = e1epos = e2spos = e2epos = None
    e1start, e1end, e2start, e2end = 0, 0, 0, 0

    # Pass 1: locate both entities. The E1 branch has no "continue",
    # so every token is also checked against E2.
    for i, token in enumerate(sent_nlp):
        if not isinstance(e1, list):
            if (token.text == e1.text) and (e1start == 0) and (e1end == 0):
                e1spos, e1epos = i, i
                e1start, e1end = 1, 1
        elif len(e1) == 1:
            if (token.text == e1[0].text) and (e1start == 0) and (e1end == 0):
                e1spos, e1epos = i, i
                e1start, e1end = 1, 1
        else:
            if (token.text == e1[0].text) and (e1start == 0):
                e1spos = i
                e1start = 1
            elif (token.text not in [t.text for t in e1]) and (e1start == 1) and (e1end == 0):
                # False start (e.g. the "the" of "the banana"): discard it.
                e1spos = None
                e1start = 0
            elif (token.text == e1[-1].text) and (e1end == 0) and (e1start == 1):
                e1epos = i
                e1end = 1

        if not isinstance(e2, list):
            if (token.text == e2.text) and (e2start == 0) and (e2end == 0):
                e2spos, e2epos = i, i
                e2start, e2end = 1, 1
        elif len(e2) == 1:
            if (token.text == e2[0].text) and (e2start == 0) and (e2end == 0):
                e2spos, e2epos = i, i
                e2start, e2end = 1, 1
        else:
            if (token.text == e2[0].text) and (e2start == 0):
                e2spos = i
                e2start = 1
            elif (token.text not in [t.text for t in e2]) and (e2start == 1) and (e2end == 0):
                # False start: discard it.
                e2spos = None
                e2start = 0
            elif (token.text == e2[-1].text) and (e2end == 0) and (e2start == 1):
                e2epos = i
                e2end = 1

    # Drop a match that was opened but never closed, so the tags stay balanced
    # (this also avoids an UnboundLocalError when an entity is not found).
    if e1epos is None:
        e1spos = None
    if e2epos is None:
        e2spos = None

    # Pass 2: rebuild the sentence with the entity tags inserted.
    annotated = ''
    for i, token in enumerate(sent_nlp):
        if i == e1spos:
            annotated += ' [E1]'
        if i == e2spos:
            annotated += ' [E2]'
        annotated += ' ' + token.text + ' '
        if i == e1epos:
            annotated += '[/E1] '
        if i == e2epos:
            annotated += '[/E2] '

    # Collapse the extra spaces introduced above.
    annotated = annotated.strip()
    annotated = re.sub(' +', ' ', annotated)
    return annotated
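To check the logic without loading spaCy, here is a standalone sketch that ports the same two-pass idea to plain token strings. It is an illustration under assumptions, not the repository's code: `find_entity` and `annotate_tokens` are hypothetical names, and entities are lists of strings rather than spaCy tokens.

```python
import re


def find_entity(tokens, entity):
    """Return (start, end) indices of the first match of `entity`, else (None, None).

    Mirrors the state machine above: open a candidate on entity[0], discard
    it if a token outside the entity appears, close it on entity[-1].
    """
    if len(entity) == 1:
        for i, tok in enumerate(tokens):
            if tok == entity[0]:
                return i, i
        return None, None
    start = None
    for i, tok in enumerate(tokens):
        if start is None and tok == entity[0]:
            start = i
        elif start is not None and tok not in entity:
            start = None  # false start, e.g. the "the" of another phrase
        elif start is not None and tok == entity[-1]:
            return start, i
    return None, None  # never closed: annotate nothing


def annotate_tokens(tokens, e1, e2):
    """Pass 1 finds both spans independently; pass 2 inserts the tags."""
    e1s, e1e = find_entity(tokens, e1)
    e2s, e2e = find_entity(tokens, e2)
    out = ''
    for i, tok in enumerate(tokens):
        if i == e1s:
            out += ' [E1]'
        if i == e2s:
            out += ' [E2]'
        out += ' ' + tok + ' '
        if i == e1e:
            out += '[/E1] '
        if i == e2e:
            out += '[/E2] '
    return re.sub(' +', ' ', out.strip())


tokens = "I prefer the banana to the apple .".split()
print(annotate_tokens(tokens, ["the", "apple"], ["the", "banana"]))
# I prefer [E2] the banana [/E2] to [E1] the apple [/E1] .
```

Because each entity is located in its own search, a token claimed by E1 can still be seen by E2, which is the property the suggested change is after.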