the-lambda-church/coquille

Python error when using UTF8 characters

dixego opened this issue · 2 comments

It seems that coquille can't parse UTF8 characters (or maybe non-ascii characters?); when trying to use a file with spanish accented characters I get the following error:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/dixego/.vim/bundle/coquille/autoload/coquille.py", line 129, in coq_next
    send_until_fail()
  File "/home/dixego/.vim/bundle/coquille/autoload/coquille.py", line 321, in send_until_fail
    response = CT.advance(message, encoding)
  File "/home/dixego/.vim/bundle/coquille/autoload/coqtop.py", line 247, in advance
    r = call('Add', ((cmd, -1), (cur_state(), True)), encoding)
  File "/home/dixego/.vim/bundle/coquille/autoload/coqtop.py", line 195, in call
    msg = ET.tostring(xml, encoding)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1126, in tostring
    ElementTree(element).write(file, encoding, method=method)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 820, in write
    serialize(write, self._root, encoding, qnames, namespaces)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 939, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 939, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 939, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 937, in _serialize_xml
    write(_escape_cdata(text, encoding))
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1073, in _escape_cdata
    return text.encode(encoding, "xmlcharrefreplace")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 7: ordinal not in range(128)

however if I remove all accented characters there's no problem anymore. I should note that all accented characters were within comments, not as part of the code.

Here is a minimal working example in Coq:

Definition n := 0.
Notation "∀" := n.

It seems like Python's xml.etree.ElementTree.tostring does not handle Unicode correctly. Here is a minimal working example triggering the error in Python:

# This Python file uses the following encoding: utf-8
import xml.etree.ElementTree
element = xml.etree.ElementTree.Element("")
element.text = "∀"
print(xml.etree.ElementTree.tostring(element, "utf-8"))
xunam commented

I ran into the same problem. Apparently Python's xml.etree.ElementTree does handle Unicdoe correctly but Coquille fails to pur valid Unicode strings in the XML trees it builds. I made a simple pull request #57 that seems to fix the issue.