infer.py: suggested fix for the 'annotate_sent' function
y0nde opened this issue · 0 comments
The original function has a problem. If "E1" is "the apple" and "E2" is "the banana", "E2" cannot be annotated because of the "continue" in the "E1" annotation section.
For example, consider "I prefer the banana to the apple."
Here E1 is "the apple" and E2 is "the banana", so both entities start with the token "the".
The first "the" (which belongs to E2 "the banana") is matched as the start of E1 "the apple", and the "continue" skips the E2 check, so the start of E2 is never recognized.
That's why I changed the function to avoid this problem. The new function is below.
My function may not be good code and may increase execution time, but it improves E2 detection.
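import re  # required by re.sub at the end of the function

def annotate_sent(self, sent_nlp, e1, e2):
    annotated = ''
    # Flags: 1 once the start / end of an entity has been found.
    e1start, e1end, e2start, e2end = 0, 0, 0, 0
    # Token positions; -1 means "not found", so the second loop never
    # reads an unassigned variable.
    e1spos, e1epos, e2spos, e2epos = -1, -1, -1, -1

    # First pass: locate E1 and E2 independently. There is no "continue"
    # in the E1 section, so every token is also checked against E2.
    for i, token in enumerate(sent_nlp):
        # E1: single token
        if not isinstance(e1, list):
            if (token.text == e1.text) and (e1start == 0) and (e1end == 0):
                e1spos = i
                e1epos = i
                e1start, e1end = 1, 1
        # E1: list with one token
        elif len(e1) == 1:
            if (token.text == e1[0].text) and (e1start == 0) and (e1end == 0):
                e1spos = i
                e1epos = i
                e1start, e1end = 1, 1
        # E1: multiple tokens
        else:
            if (token.text == e1[0].text) and (e1start == 0):
                e1spos = i
                e1start += 1
            elif (token.text not in [t.text for t in e1]) and (e1start == 1) and (e1end == 0):
                # A foreign token interrupted the match: reset and keep looking.
                e1start = 0
            elif (token.text == e1[-1].text) and (e1end == 0) and (e1start == 1):
                e1epos = i
                e1end += 1

        # E2: same logic. The "continue" statements here are harmless
        # because this is the last block of the loop body.
        if not isinstance(e2, list):
            if (token.text == e2.text) and (e2start == 0) and (e2end == 0):
                e2spos = i
                e2epos = i
                e2start, e2end = 1, 1
                continue
        elif len(e2) == 1:
            if (token.text == e2[0].text) and (e2start == 0) and (e2end == 0):
                e2spos = i
                e2epos = i
                e2start, e2end = 1, 1
                continue
        else:
            if (token.text == e2[0].text) and (e2start == 0):
                e2spos = i
                e2start += 1
                continue
            elif (token.text not in [t.text for t in e2]) and (e2start == 1) and (e2end == 0):
                e2start = 0
            elif (token.text == e2[-1].text) and (e2end == 0) and (e2start == 1):
                e2epos = i
                e2end += 1
                continue

    # Second pass: build the annotated sentence from the recorded positions.
    for i, token in enumerate(sent_nlp):
        if i == e1spos:
            annotated += ' [E1]'
        if i == e2spos:
            annotated += ' [E2]'
        annotated += ' ' + token.text + ' '
        if i == e1epos:
            annotated += '[/E1] '
        if i == e2epos:
            annotated += '[/E2] '

    annotated = annotated.strip()
    annotated = re.sub(' +', ' ', annotated)  # collapse repeated spaces
    return annotated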
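
For comparison, here is a more compact sketch of the same idea: find each entity's span on its own, then insert the markers in a second pass. It is not the code from infer.py. It swaps the flag-based matching for a plain contiguous-subsequence search, and `Token`, `find_span`, `annotate` and `toks` are hypothetical names that exist only in this example. `Token` is a stand-in for a spaCy token (only `.text` is used), so the snippet runs without spaCy.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical stand-in for a spaCy token; only `.text` is used."""
    text: str


def find_span(tokens, entity):
    """Return (start, end) of the first contiguous match of entity, or None."""
    # Accept either a single token or a list of tokens, like annotate_sent.
    words = [t.text for t in (entity if isinstance(entity, list) else [entity])]
    n = len(words)
    for start in range(len(tokens) - n + 1):
        if [t.text for t in tokens[start:start + n]] == words:
            return start, start + n - 1
    return None


def annotate(tokens, e1, e2):
    """Wrap the E1 and E2 spans in markers; each span is searched independently."""
    opens, closes = {}, {}
    for tag, entity in (("E1", e1), ("E2", e2)):
        span = find_span(tokens, entity)
        if span is not None:  # an entity that is not found is simply left unmarked
            opens.setdefault(span[0], []).append(f"[{tag}]")
            closes.setdefault(span[1], []).append(f"[/{tag}]")
    parts = []
    for i, token in enumerate(tokens):
        parts.extend(opens.get(i, []))
        parts.append(token.text)
        parts.extend(closes.get(i, []))
    return " ".join(parts)


def toks(sentence):
    """Whitespace tokenizer used only for this example."""
    return [Token(word) for word in sentence.split()]


sent = toks("I prefer the banana to the apple .")
print(annotate(sent, toks("the apple"), toks("the banana")))
# I prefer [E2] the banana [/E2] to [E1] the apple [/E1] .
```

Because the whole entity is compared at once, a shared first token such as "the" cannot be claimed by the wrong entity. The trade-off is that only exact, contiguous matches are found.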