Index error when running TokenGazetteer
johann-petrak opened this issue · 6 comments
See #92
Copied traceback info:
IndexError Traceback (most recent call last)
in
8 doc2 = Annie(doc1)
9 properdoc = ProperDoc(doc1)
---> 10 gazdoc = GazDet(properdoc)
11 for ann in gazdoc.annset("Resume"):
12 doc2.annset("Resume").add_ann(ann)
in GazDet(doc)
5 for typ in details:
6 tgaz = TokenGazetteer("data/" + typ + ".def", fmt="gate-def", annset="", outset="Resume", outtype=typ)
----> 7 gazdoc = tgaz(doc)
8 return gazdoc
~\miniconda3\lib\site-packages\gatenlp\processing\gazetteer.py in __call__(self, doc, annset, tokentype, septype, splittype, withintype, all, skip)
697 for segment_start, segment_end in segment_offs:
698 tokens = list(anns.within(segment_start, segment_end))
--> 699 for matches in self.find_all(tokens, doc=doc):
700 for match in matches:
701 starttoken = tokens[match.start]
~\miniconda3\lib\site-packages\gatenlp\processing\gazetteer.py in find_all(self, tokens, doc, all, skip, fromidx, toidx, endidx, matchfunc)
617 idx = fromidx
618 while idx <= toidx:
--> 619 matches, maxlen, idx = self.find(
620 tokens,
621 doc=doc,
~\miniconda3\lib\site-packages\gatenlp\processing\gazetteer.py in find(self, tokens, doc, all, fromidx, toidx, endidx, matchfunc)
550 endidx = len(tokens)
551 while idx <= toidx:
--> 552 matches, long = self.match(
553 tokens, idx=idx, doc=doc, all=all, endidx=endidx, matchfunc=matchfunc
554 )
~\miniconda3\lib\site-packages\gatenlp\processing\gazetteer.py in match(self, tokens, doc, all, idx, endidx, matchfunc)
454 while j <= endidx:
455 if node.nodes:
--> 456 token = tokens[j]
457 if token.type == self.splittype:
458 break
IndexError: list index out of range
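Reading only the frames above: `find` sets `endidx = len(tokens)` (line 550) and `match` loops with `while j <= endidx` before reading `tokens[j]` (lines 454-456), so `j` can reach `len(tokens)`. The sketch below reproduces that pattern in isolation. It is an illustration of the suspected off-by-one, not the gatenlp code or the actual fix; `scan` and the sample token list are hypothetical.

```python
def scan(tokens, idx, endidx, inclusive):
    """Walk tokens from idx towards endidx and return the visited items.

    Hypothetical helper, not part of gatenlp.
    """
    seen = []
    j = idx
    # inclusive=True mirrors the `while j <= endidx` loop seen in the traceback
    while (j <= endidx) if inclusive else (j < endidx):
        seen.append(tokens[j])  # raises IndexError once j == len(tokens)
        j += 1
    return seen


tokens = ["John", "Smith", "London"]
endidx = len(tokens)  # same default as line 550 in the traceback

# An exclusive bound stays inside the list
print(scan(tokens, 0, endidx, inclusive=False))

# An inclusive bound steps one position past the last token
try:
    scan(tokens, 0, endidx, inclusive=True)
except IndexError as err:
    print("IndexError:", err)
```

If this reading is right, any segment where matching continues up to the last token would trigger the error, which would explain why it depends on the input document.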
@mdorkhah would you be able to (privately) share a minimal test case?
Sure, I just sent you an email.
Thanks - I was not able to get that running yet, but I think I have actually found the bug already! :)
To test this would you be able to install gatenlp from the very latest version of the github main branch?
One way to do this would be:
- maybe create a separate environment for this and change into it
- install gatenlp from latest github main branch:
pip install -U git+https://github.com/GateNLP/python-gatenlp.git[EXTRAS]
where EXTRAS is the list of extras you also need
- NOTE: this gatenlp version requires the new version 3.0.4 of the GATE Python plugin for the GateWorker, which should get used automatically.
Works! Thank you again...
Thanks for testing!
Closing