UniversalDependencies/tools

TypeError: "validate.py" compatibility with Python 3.6

logan-siyao-peng opened this issue · 6 comments

When I use "validate.py" from the UD tools to validate the CoNLL-U-formatted GUM corpus, I get the following error on my computer:

"""
C:\Users\logan\Dropbox\GUM\UDtools>python validate.py --lang en ..\amir_gumdev_build\target\dep\ud\GUM_interview_ants.conllu
Traceback (most recent call last):
File "validate.py", line 735, in <module>
validate(inp,out,args,tagsets,known_sent_ids)
File "validate.py", line 613, in validate
for comments,tree in trees(inp,tag_sets,args):
File "validate.py", line 74, in trees
for line_counter, line in enumerate(inp):
File "C:\Program Files\Python36\lib\codecs.py", line 644, in __next__
line = self.readline()
File "C:\Program Files\Python36\lib\codecs.py", line 557, in readline
data = self.read(readsize, firstline=True)
File "C:\Program Files\Python36\lib\codecs.py", line 499, in read
data = self.bytebuffer + newdata
TypeError: can't concat str to bytes

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "validate.py", line 737, in <module>
warn(u"Exception caught!",u"Format")
File "validate.py", line 49, in warn
print >> sys.stderr, (u"[%sLine %d]: %s"%(fn,curr_line,msg)).encode(args.err_enc)
TypeError: unsupported operand type(s) for >>: 'builtin_function_or_method' and '_io.TextIOWrapper'. Did you mean "print(<message>, file=<output_stream>)"?
"""

I was using Python 3.6.4 on Windows 10 Home. The ".conllu" file appears to be correctly formatted. Interestingly, my adviser @amir-zeldes did not encounter this problem when running the same command on his computer.

I then tried Python 2.7.14 on my own machine, and it validated the document successfully. I am not sure why the Python 3 run fails with a bytes/str error.

Thank you for your support.

Best,
Logan

validate.py is a Python 2 program. The error you get is because Python 3 interprets print >> sys.stderr as a bit-shift operator applied to a function and a file, and (admittedly rightfully) complains. I will look into porting the code to Python 3.
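For illustration, here is a minimal sketch of how a warning could be printed so that the same line runs under both Python 2.7 and Python 3. It is not the actual validate.py code: the `warn` function below is a simplified, hypothetical stand-in with its own signature, and the `stream` parameter is an assumption added only to make the example testable.

```python
from __future__ import print_function  # makes print() a function on Python 2 as well

import sys


def warn(msg, line_no, stream=sys.stderr):
    """Hypothetical, simplified stand-in for validate.py's warn().

    Uses the print() function with file=..., which is valid on both
    Python 2.7 (with the __future__ import) and Python 3, instead of
    the Python 2-only statement form `print >> sys.stderr, ...`.
    """
    print(u"[Line %d]: %s" % (line_no, msg), file=stream)


if __name__ == "__main__":
    warn(u"Exception caught!", 0)
```

Whether the message should additionally be encoded by hand, as the original line does with args.err_enc, depends on how the port handles output encoding; this sketch leaves that to the stream.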

I have also encountered this problem. After converting the script automatically with 2to3, I get the following at run time:

b'[Line 0]: Exception caught!'
Traceback (most recent call last):
File "validate.py", line 787, in <module>
validate(inp,out,args,tagsets,known_sent_ids)
File "validate.py", line 650, in validate
for comments,tree in trees(inp,tag_sets,args):
File "validate.py", line 77, in trees
for line_counter, line in enumerate(inp):
File "/usr/lib/python3.5/codecs.py", line 642, in __next__
line = self.readline()
File "/usr/lib/python3.5/codecs.py", line 555, in readline
data = self.read(readsize, firstline=True)
File "/usr/lib/python3.5/codecs.py", line 497, in read
data = self.bytebuffer + newdata
TypeError: can't concat bytes to str

The Python 2.7 version runs fine after installing the regex package.
Is there an official Python 3 version available now?

No, Python 2.7 is the one that I use.

@Stormur: you should be able to install the regex package with pip2 install --user regex.
That said, it would be nice if validate.py supported both Python 2 and 3.

@fginter: porting to Python 3 could be a separate issue, right?

@martinpopel: Yes, I did it, my fault.

Still, I think that a Python 3 version is needed, not least because Python 3 handles Unicode better.
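Both tracebacks above end inside codecs.py with a bytes/str concatenation error. That is what happens on Python 3 when a codecs stream reader is wrapped around a stream that already returns decoded text, although I am only assuming that this is how validate.py sets up its input. A minimal sketch of an input opener that yields Unicode lines on both Python 2.7 and Python 3 might look like this; `open_input` is a hypothetical helper name, not something that exists in validate.py.

```python
import codecs
import io
import sys


def open_input(path=None, encoding="utf-8"):
    """Hypothetical helper: return a stream that yields unicode lines.

    With a path, io.open() decodes the file itself on both Python 2.7
    and Python 3. Without a path, standard input is read as bytes and
    decoded exactly once by a codecs stream reader.
    """
    if path is not None:
        return io.open(path, "r", encoding=encoding)
    # Python 3 exposes the underlying byte stream as sys.stdin.buffer;
    # on Python 2, sys.stdin is already a byte stream.
    raw = getattr(sys.stdin, "buffer", sys.stdin)
    return codecs.getreader(encoding)(raw)


if __name__ == "__main__":
    # Usage: python this_script.py [file.conllu]
    source = open_input(sys.argv[1] if len(sys.argv) > 1 else None)
    for line_counter, line in enumerate(source):
        pass  # each `line` is a unicode string here
```

The point of the design is that decoding happens in exactly one place, so the rest of the program only ever sees text.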