ValueError with edseg
MStaniek opened this issue · 2 comments
Hello,
thank you for your amazing tool. I am using the version served by PyPI (installed with pip). Using bparseg works without problems, but using edseg with German mate extraction leads to the following error:
Desktop/dsegpipeline/venv2/local/lib/python2.7/site-packages/dsegmenter/edseg/finitestateparsing.py:637: UserWarning: Exception in constraint: PC
warnings.warn('Exception in constraint: {0}'.format(lhs, exc))
Traceback (most recent call last):
File " Desktop/dsegpipeline/venv2/bin/discourse_segmenter", line 475, in <module>
main(sys.argv[1:])
File " Desktop/dsegpipeline/venv2/bin/discourse_segmenter", line 441, in main
edseg_segment(ifiles, args.output_trees)
File " Desktop/dsegpipeline/venv2/bin/discourse_segmenter", line 186, in edseg_segment
_output_segment_forrest(forrest, segmenter, a_output_trees, a_encoding)
File " Desktop/dsegpipeline/venv2/bin/discourse_segmenter", line 165, in _output_segment_forrest
sds_list = [a_segmenter.segment(sent) for sent in a_forrest]
File " Desktop/dsegpipeline/venv2/local/lib/python2.7/site-packages/dsegmenter/edseg/edssegmenter.py", line 90, in segment
clauses = self._clause_segmenter.segment(sent)
File " Desktop/dsegpipeline/venv2/local/lib/python2.7/site-packages/dsegmenter/edseg/clause_segmentation.py", line 83, in segment
chunk_tree = self._chunker.chunk(sent)
File " Desktop/dsegpipeline/venv2/local/lib/python2.7/site-packages/dsegmenter/edseg/chunking.py", line 189, in chunk
return self._parser.parse(isent, catgetter=catgetter)
File " Desktop/dsegpipeline/venv2/local/lib/python2.7/site-packages/dsegmenter/edseg/finitestateparsing.py", line 614, in parse
nodes = self._parse_level(rules, nodes, catgetter)
File " Desktop/dsegpipeline/venv2/local/lib/python2.7/site-packages/dsegmenter/edseg/finitestateparsing.py", line 634, in _parse_level
flag = constraint(proxy)
File " Desktop/dsegpipeline/venv2/local/lib/python2.7/site-packages/dsegmenter/edseg/finitestateparsing.py", line 51, in decorate
return func(match)
File " Desktop/dsegpipeline/venv2/local/lib/python2.7/site-packages/dsegmenter/edseg/chunking.py", line 299, in pc_genitive_adjunct_constraint
if (not 'feats' in art) or (not hasattr(art['feats'], 'unifies')):
File " Desktop/dsegpipeline/venv2/local/lib/python2.7/site-packages/dsegmenter/edseg/conll.py", line 402, in __getitem__
return self.__getattr__(name)
File " Desktop/dsegpipeline/venv2/local/lib/python2.7/site-packages/dsegmenter/edseg/conll.py", line 389, in __getattr__
raise AttributeError("cannot find symbol {:s}".format(name))
ValueError: Unknown format code 's' for object of type 'int'
As you can see, I am using Python 2.7.3, and I run discourse_segmenter in a virtual environment.
I am calling the edseg segmenter like this:
discourse_segmenter edseg input.txt > output.txt
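A possible reading of the last frames of the traceback, offered as an assumption rather than a confirmed description of conll.py: the check `'feats' in art` goes straight into `__getitem__`, which is what Python does when a class defines neither `__contains__` nor `__iter__`. In that case the membership test probes the object with the integers 0, 1, 2, ... until an IndexError is raised, so `name` arrives as an int and the `{:s}` format spec fails before the intended exception can be built. The sketch below reproduces that mechanism with hypothetical classes (`Row`, `RowWithContains`) and a hypothetical helper (`membership_outcome`); none of these names come from the package.

```python
class Row(object):
    """Hypothetical stand-in for a CoNLL word object (not the real class)."""

    def __init__(self, **fields):
        self._fields = fields

    def __getitem__(self, name):
        if name in self._fields:
            return self._fields[name]
        # Same pattern as in the traceback: the 's' spec only accepts strings.
        raise IndexError("cannot find index {:s}".format(name))


class RowWithContains(Row):
    """Variant that answers membership tests directly."""

    def __contains__(self, name):
        return name in self._fields


def membership_outcome(row, key):
    """Return the result of `key in row`, or the name of the exception raised."""
    try:
        return key in row
    except Exception as exc:
        return type(exc).__name__


print(membership_outcome(Row(form="Haus"), "feats"))              # ValueError
print(membership_outcome(RowWithContains(form="Haus"), "feats"))  # False
```

With `__contains__` defined, the membership test never calls `__getitem__` with an integer, so the formatting problem is not reached at all.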
Since I am only using the pip version, I don't know whether the error still exists in the newest version, but here are the steps to reproduce the problem (I hope) and how I fixed it:
http://pastebin.com/TCgadSWx is a pastebin with a parsed snippet randomly chosen from the internet.
With that, the error should be reproducible. I had to make the following hacky changes:
https://github.com/WladimirSidorenko/DiscourseSegmenter/blob/master/dsegmenter/edseg/conll.py#L389
raise AttributeError("cannot find symbol {:s}".format(name))
to
raise AttributeError("cannot find symbol {:s}".format(str(name)))
https://github.com/WladimirSidorenko/DiscourseSegmenter/blob/master/dsegmenter/edseg/conll.py#L404
raise IndexError("cannot find index {:s}".format(name))
to
raise IndexError("cannot find index {:s}".format(str(name)))
https://github.com/WladimirSidorenko/DiscourseSegmenter/blob/master/dsegmenter/edseg/conll.py#L416
raise IndexError("cannot find index {:s}".format(name))
to
raise IndexError("cannot find index {:s}".format(str(name)))
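The underlying behaviour can be shown in isolation: the `s` format spec is only defined for strings, so `"{:s}".format(0)` raises ValueError, while the `!s` conversion (or the explicit `str(name)` used above) accepts any object. The following is a minimal sketch; the function names `describe_missing` and `describe_missing_strict` are made up for illustration and do not exist in the package.

```python
def describe_missing_strict(name):
    # The 's' format spec rejects non-string arguments such as ints.
    return "cannot find symbol {:s}".format(name)


def describe_missing(name):
    # '!s' applies str() to the argument first, so ints and strings both work.
    return "cannot find symbol {!s}".format(name)


print(describe_missing("feats"))  # cannot find symbol feats
print(describe_missing(0))        # cannot find symbol 0

try:
    describe_missing_strict(0)
except ValueError as exc:
    print("strict variant failed: {}".format(exc))
```

Both `{!s}` and `str(name)` give the same message; `{!s}` just keeps the conversion inside the format string.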
If it's already fixed in the newest version, then my apologies.
Greetings.
Thank you for the bug report. I will have a look at it.
Fixed
Commit: master fb8ef92
Files touched:
modified: MANIFEST.in
modified: dsegmenter/edseg/chunking.py
modified: dsegmenter/edseg/conll.py
modified: dsegmenter/edseg/finitestateparsing.py
new file: pytest.ini
modified: requirements.txt
modified: setup.cfg
modified: setup.py
new file: test-requirements.txt
new file: tests/edseg/test_conll.py
Test case added to:
tests/edseg/test_conll.py
Current output:
(venv)
sidorenko@sidorenko>
DiscourseSegmenter>discourse_segmenter edseg TCgadSWx.txt > output
(venv)
sidorenko@sidorenko>
DiscourseSegmenter>echo $?
0
The fix will be available on PyPI with the next release.