WladimirSidorenko/DiscourseSegmenter

ValueError with edseg

MStaniek opened this issue · 2 comments

Hello,

Thank you for your amazing tool. I am using the version served by PyPI (installed with pip). bparseg works without problems, but running edseg on German Mate parser output leads to the following error:

Desktop/dsegpipeline/venv2/local/lib/python2.7/site-packages/dsegmenter/edseg/finitestateparsing.py:637: UserWarning: Exception in constraint: PC
  warnings.warn('Exception in constraint: {0}'.format(lhs, exc))
Traceback (most recent call last):
  File " Desktop/dsegpipeline/venv2/bin/discourse_segmenter", line 475, in <module>
    main(sys.argv[1:])
  File " Desktop/dsegpipeline/venv2/bin/discourse_segmenter", line 441, in main
    edseg_segment(ifiles, args.output_trees)
  File " Desktop/dsegpipeline/venv2/bin/discourse_segmenter", line 186, in edseg_segment
    _output_segment_forrest(forrest, segmenter, a_output_trees, a_encoding)
  File " Desktop/dsegpipeline/venv2/bin/discourse_segmenter", line 165, in _output_segment_forrest
    sds_list = [a_segmenter.segment(sent) for sent in a_forrest]
  File " Desktop/dsegpipeline/venv2/local/lib/python2.7/site-packages/dsegmenter/edseg/edssegmenter.py", line 90, in segment
    clauses = self._clause_segmenter.segment(sent)
  File " Desktop/dsegpipeline/venv2/local/lib/python2.7/site-packages/dsegmenter/edseg/clause_segmentation.py", line 83, in segment
    chunk_tree = self._chunker.chunk(sent)
  File " Desktop/dsegpipeline/venv2/local/lib/python2.7/site-packages/dsegmenter/edseg/chunking.py", line 189, in chunk
    return self._parser.parse(isent, catgetter=catgetter)
  File " Desktop/dsegpipeline/venv2/local/lib/python2.7/site-packages/dsegmenter/edseg/finitestateparsing.py", line 614, in parse
    nodes = self._parse_level(rules, nodes, catgetter)
  File " Desktop/dsegpipeline/venv2/local/lib/python2.7/site-packages/dsegmenter/edseg/finitestateparsing.py", line 634, in _parse_level
    flag = constraint(proxy)
  File " Desktop/dsegpipeline/venv2/local/lib/python2.7/site-packages/dsegmenter/edseg/finitestateparsing.py", line 51, in decorate
    return func(match)
  File " Desktop/dsegpipeline/venv2/local/lib/python2.7/site-packages/dsegmenter/edseg/chunking.py", line 299, in pc_genitive_adjunct_constraint
    if (not 'feats' in art) or (not hasattr(art['feats'], 'unifies')):
  File " Desktop/dsegpipeline/venv2/local/lib/python2.7/site-packages/dsegmenter/edseg/conll.py", line 402, in __getitem__
    return self.__getattr__(name)
  File " Desktop/dsegpipeline/venv2/local/lib/python2.7/site-packages/dsegmenter/edseg/conll.py", line 389, in __getattr__
    raise AttributeError("cannot find symbol {:s}".format(name))
ValueError: Unknown format code 's' for object of type 'int'
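Reading the traceback bottom-up: `pc_genitive_adjunct_constraint` evaluates `'feats' in art`, yet the very next frame is `__getitem__` in `conll.py`, and the failing `format` call receives an `int`. A plausible explanation, which the maintainer has not confirmed, is that the word class defines `__getitem__` but neither `__contains__` nor `__iter__`. In that case Python's `in` operator falls back to the legacy sequence protocol and calls `art[0]`, `art[1]`, and so on, which here reaches `__getattr__(0)` and the `'{:s}'` format. The sketch below reproduces that chain with a hypothetical `Word` class (an assumption, not the real `conll.py` code) and a hypothetical `FixedWord` that adds `__contains__`:

```python
class Word(object):
    """Hypothetical stand-in for a CoNLL token; NOT the real conll.py class."""

    def __init__(self, **fields):
        self._fields = fields

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if isinstance(name, str) and name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._fields[name]
        except KeyError:
            # Same message pattern as conll.py: '{:s}' rejects non-strings.
            raise AttributeError("cannot find symbol {:s}".format(name))

    def __getitem__(self, name):
        # Dictionary-style access delegates to attribute access.
        return self.__getattr__(name)


class FixedWord(Word):
    """Hypothetical variant that answers membership tests directly."""

    def __contains__(self, name):
        # With __contains__ defined, `in` no longer probes word[0], word[1], ...
        return name in self._fields


if __name__ == "__main__":
    art = Word(form="des", feats="case=gen")
    try:
        "feats" in art  # falls back to art[0] -> __getattr__(0)
    except ValueError as exc:
        print("ValueError: " + str(exc))
    print("feats" in FixedWord(form="des", feats="case=gen"))
```

Under this reading, the `str()` patch turns the `ValueError` into the intended `AttributeError`, while a `__contains__` method would avoid the integer lookups altogether.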

As you can see, I am using Python 2.7 (2.7.3), and I run discourse_segmenter in a virtual environment.

I am calling the edseg segmenter like this:

discourse_segmenter edseg input.txt > output.txt

Since I am only using the pip version, I don't know whether the error still exists in the newest version, but here are the steps to reproduce the problem (I hope) and how I fixed it:

Here is a pastebin with a parsed snippet of text chosen at random from the internet: http://pastebin.com/TCgadSWx

With it, the error should be reproducible. I had to make the following hacky changes:

https://github.com/WladimirSidorenko/DiscourseSegmenter/blob/master/dsegmenter/edseg/conll.py#L389
raise AttributeError("cannot find symbol {:s}".format(name))
to
raise AttributeError("cannot find symbol {:s}".format(str(name)))

https://github.com/WladimirSidorenko/DiscourseSegmenter/blob/master/dsegmenter/edseg/conll.py#L404
raise IndexError("cannot find index {:s}".format(name))
to
raise IndexError("cannot find index {:s}".format(str(name)))

https://github.com/WladimirSidorenko/DiscourseSegmenter/blob/master/dsegmenter/edseg/conll.py#L416
raise IndexError("cannot find index {:s}".format(name))
to
raise IndexError("cannot find index {:s}".format(str(name)))
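All three patches address the same cause: the `s` format spec is only valid for strings, so `"{:s}".format(0)` raises `ValueError` instead of producing the intended error message. The standalone snippet below compares the original pattern, the `str()` workaround above, and the standard `!s` conversion field of `str.format`, which applies `str()` itself. The function names are illustrative only and do not exist in `conll.py`.

```python
def original_message(name):
    # Pattern used in conll.py before the patch: '{:s}' only accepts strings.
    return "cannot find symbol {:s}".format(name)


def patched_message(name):
    # The workaround from this issue: convert to str before formatting.
    return "cannot find symbol {:s}".format(str(name))


def conversion_message(name):
    # Alternative (not from the issue): the '!s' conversion calls str() itself.
    return "cannot find symbol {!s}".format(name)


if __name__ == "__main__":
    for key in ("feats", 0):
        try:
            print(original_message(key))
        except ValueError as exc:
            print("ValueError: " + str(exc))
        print(patched_message(key))
        print(conversion_message(key))
```

Either form would cover the `AttributeError` message at line 389 and the two `IndexError` messages at lines 404 and 416; which approach the maintainer adopted in the eventual fix is not shown in this thread.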

And if it is already fixed in the newest version, my apologies.

Greetings.

Thank you for the bug report. I will have a look at it.

Fixed

Commit: master fb8ef92

Files touched:
modified: MANIFEST.in
modified: dsegmenter/edseg/chunking.py
modified: dsegmenter/edseg/conll.py
modified: dsegmenter/edseg/finitestateparsing.py
new file: pytest.ini
modified: requirements.txt
modified: setup.cfg
modified: setup.py
new file: test-requirements.txt
new file: tests/edseg/test_conll.py

Test case added to:
tests/edseg/test_conll.py

Current output:

(venv)
sidorenko@sidorenko>
DiscourseSegmenter>discourse_segmenter edseg TCgadSWx.txt > output

(venv)
sidorenko@sidorenko>
DiscourseSegmenter>echo $?
0

The fix will be available on PyPI with the next release.