pinard/Pymacs

Build wrong python expression out of lisp expression with utf-8 encoded content in emacs buffer.

Closed this issue · 8 comments

In pymacs.el.in, Pymacs defines the function pymacs-print-for-eval to build a Python expression out of a Lisp expression. However, when using ropemacs, it tries to get the Emacs buffer content using (buffer-string). When the buffer content is encoded in UTF-8, the following code in pymacs.el.in:

(when multibyte
(princ ".encode('ISO-8859-1').decode('UTF-8')")))

leads to an error on the Python side of Pymacs, because we try to encode a string that is already UTF-8 encoded.

It seems we had better handle this carefully by detecting the buffer charset?

Hi, Zhang (hoping I'm naming you correctly).

I would need more information (a precise recipe, a traceback, something) to study this problem. I have the impression you are giving me your interpretation of a problem, but not the problem itself. Or maybe I'm not understanding you fully?

Encoding a UTF-8 encoded string with ISO-8859-1 will never fail, and it produces an exact equivalent of the original string, so that decoding it as UTF-8 should then work.
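A minimal sketch of that round trip, written for Python 3 (the thread itself concerns Python 2). It assumes the Emacs side delivers a text string in which every character stands for one byte of the UTF-8 data; the sample text and variable names are illustrative only, not taken from Pymacs.

```python
# Sample text; the names below are illustrative, not taken from Pymacs.
original = "#中国\n"

# The bytes as they would sit in a UTF-8 encoded file.
utf8_bytes = original.encode("utf-8")

# A text string with one character per byte (every code point is below 256).
one_char_per_byte = utf8_bytes.decode("iso-8859-1")

# Encoding with ISO-8859-1 recovers the raw bytes, and decoding those
# bytes as UTF-8 recovers the text.
round_tripped = one_char_per_byte.encode("iso-8859-1").decode("utf-8")

print(round_tripped == original)
```

The round trip holds only when the starting value is a text (unicode) string whose code points are all below 256.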

About "carefully detecting buffer charset", this is no trivial task in most cases, and success is never guaranteed. Some charsets can be detected with a relatively high degree of success, but never perfectly. We had better know the charset beforehand if we want anything solid. In the case above, I guess (I did not thoroughly check before replying) that Pymacs knows this is UTF-8.
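As a small illustration of why detection cannot be fully reliable, the Python 3 sketch below decodes the same four bytes under two charsets. The byte values are an illustrative sample, not data from this report.

```python
# Four bytes with no charset label attached (illustrative sample).
data = b"\xd6\xd0\xb9\xfa"

as_gb2312 = data.decode("gb2312")      # two Chinese characters
as_latin1 = data.decode("iso-8859-1")  # four Latin-1 characters

# Both decodes succeed, so the bytes alone cannot say which was intended.
print(as_gb2312 != as_latin1)
```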

François

Hi again, Zhang. I found a few more bits about the problem you report, from the Rope mailing lists. I roughly copied below what I have here, for posterity to read :-). Despite many comments and details, I would still need more context, as too many things escape me in this conversation. Ideally, I would like to get a self-contained example I could use on my side to see the problem, explore it, and then see what could be done about it. Thanks!

On 2011-09-05, to Ali Gholami Rudi

Hello Ali,

Thank you for your quick response, but after adding more debugging information, I can confirm that my Emacs buffer has not been narrowed. I also observed in the Pymacs buffer that 'lisp.buffer_string' (which is mapped to the Emacs Lisp buffer-string) got the correct content (the whole buffer string). This must be a Pymacs issue: it did not translate the string returned by Emacs Lisp into a Python string properly. I will continue to analyse the code and hope I can find the answer.

On Mon, 05 Sep 2011 00:48:09 +0430 Ali Gholami Rudi aligrudi@gmail.com wrote:

"fortitude.zhang" fortitude.zhang@gmail.com wrote:

  1. Open the unicode Python source file (please see the following Footnote 1), and add a newline.

  2. After typing the code Session. and executing M-/ for code completion, I get the error.

I found that the get_text function of ropemacs's LispUtil gets the wrong source code string when it calls lisp.buffer_string(). So I want to know whether this is a bug in ropemacs, and how I can get a fix.

 def get_text(self):
     end = lisp.buffer_size() + 1
     old_min = lisp.point_min()
     old_max = lisp.point_max()
     narrowed = (old_min != 1 or old_max != end)
     if narrowed:
         lisp.narrow_to_region(1, lisp.buffer_size() + 1)
     try:
         lisp.message('called my to get_text,buffer_string len is {0},while buffer size is{1}'.format(len(lisp.buffer_string()), end))

I added this line to debug ropemacs and found that buffer_size is 2407 but len(lisp.buffer_string()) is 275, which is far less than 2407.

I guess the large difference is due to narrowing (end is calculated too early); this may give something more meaningful:

 # ...
 try:
    lisp.message('len1=%d len2=%d' %
                 (len(lisp.buffer_string()), lisp.buffer_size() + 1))

I found that in a previous post somebody gave a fix that takes the minimum of len(source) and offset, but I wonder whether it's the best solution.

It seems lisp.buffer_size() adds one to the size of the actual buffer (maybe eof newline?). If that's the case, I think that fix seems reasonable. Can you verify that abs(len1 - len2) <= 1? Does that patch still work?

Thanks, Ali

On 2011-09-14, to Ali Gholami Rudi

Hi Ali,

I have finally found the reason. In ropemode's pymacs.el, there is a function 'pymacs-print-for-eval' which prints a Python expression out of a Lisp expression. When it processes a Lisp multibyte string, it tries to encode the buffer-string to ISO-8859-1 and then decode it from UTF-8, while my file is already encoded in UTF-8. So I omitted the ISO-8859-1 encoding step, and it now works well. Coming back to the original problem, the small length was actually that of Python's exception string for the error raised by the ISO-8859-1 encoding step; that's why it was so much smaller than the file buffer. Thank you very much for your help.

On Thu, 08 Sep 2011 20:56:47 +0430 Ali Gholami Rudi aligrudi@gmail.com wrote:

fortitude.zhang@gmail.com wrote:

Thank you for your quick response, but after adding more debugging information, I can confirm that my Emacs buffer has not been narrowed. I also observed in the Pymacs buffer that 'lisp.buffer_string' (which is mapped to the Emacs Lisp buffer-string) got the correct content (the whole buffer string).

Seems serious. It cannot be a byte vs. character offset problem (the difference is a factor of ten). Does "(buffer-size)" give a different value on the lisp side? If so, is there any other function that returns the actual size of current buffer?

Ali

Hi pinard,
Sorry for my late response. I tried to reproduce this problem and it appeared again. In the Pymacs buffer I got this debug message:
>131 return "#\344\270\255\345\233\275\n\nimport os\n\ndef main():\n \"\"\"\"\"\"\n os.\n\n".encode('ISO-8859-1').decode('UTF-8')

This means the Emacs side sends this statement to the Python side, where it is evaluated by Python.

However, as you can see, the expression "#\344\270\255\345\233\275\n\nimport os\n\ndef main():\n \"\"\"\"\"\"\n os.\n\n".encode('ISO-8859-1').decode('UTF-8') raises UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 1: ordinal not in range(128).

But if I change it to "#\344\270\255\345\233\275\n\nimport os\n\ndef main():\n \"\"\"\"\"\"\n os.\n\n".decode('UTF-8'), Python gets the correct unicode string and the error goes away.
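A sketch of this failure, emulated in Python 3. It rests on the assumption that, under Python 2, the quoted literal is a byte string, and that calling .encode() on a byte string first decodes it implicitly with the default 'ascii' codec. The helper name py2_str_encode is hypothetical, and the literal is shortened to its first line.

```python
# Bytes of the literal from the debug message (shortened to its first line).
literal = b"#\344\270\255\345\233\275\n"


def py2_str_encode(byte_string, encoding):
    """Emulate Python 2 str.encode: implicit ASCII decode, then encode."""
    return byte_string.decode("ascii").encode(encoding)


try:
    py2_str_encode(literal, "iso-8859-1")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# Skipping the encode step and decoding the bytes directly works.
fixed = literal.decode("utf-8")

print(error)
print(fixed)
```

If Pymacs emitted a unicode literal instead of a byte string, the original expression would not hit the implicit ASCII decode; which of the two it emits is not established in this thread.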

So I wonder whether, when handling a multibyte Emacs buffer, the function pymacs-print-for-eval in pymacs.el should change (princ ".encode('ISO-8859-1').decode('UTF-8')"))) to (princ ".decode('UTF-8')"))) so that the Python side decodes the UTF-8 encoded string correctly.

This problem happened to me too. I commented out the lines below

(when multibyte
(princ ".encode('ISO-8859-1').decode('UTF-8')"))

and it works now.

The same issue has been troubling me for the whole week. I assume this problem arises under the following circumstances:

  1. Emacs runs in a Win32 environment;
  2. The path to the Python code contains Chinese characters.

I changed pymacs.el per fortitudezhang's fix (removed encode('ISO-8859-1')). It solved part of my problem, but my Emacs keeps throwing UnicodeEncodeError if the path contains Chinese characters:
pymacs-report-error: Python: Traceback (most recent call last):
  File "C:\Python27\Pymacs\Pymacs\pymacs.py", line 250, in loop
    value = eval(text)
  File "", line 1, in 
  File "c:\Python27\lib\site-packages\ropemode\decorators.py", line 53, in newfunc
    return func(*args, **kwds)
  File "c:\Python27\lib\site-packages\ropemode\interface.py", line 142, in goto_definition
    definition = self._base_definition_location()
  File "c:\Python27\lib\site-packages\ropemode\interface.py", line 157, in _base_definition_location
    self._check_project()
  File "c:\Python27\lib\site-packages\ropemode\interface.py", line 448, in _check_project
    self.open_project()
  File "c:\Python27\lib\site-packages\ropemode\decorators.py", line 53, in newfunc
    return func(*args, **kwds)
  File "c:\Python27\lib\site-packages\ropemode\interface.py", line 88, in open_project
    self.project = rope.base.project.Project(root)
  File "c:\Python27\lib\site-packages\rope\base\project.py", line 146, in __init__
    self._init_prefs(prefs)
  File "c:\Python27\lib\site-packages\rope\base\project.py", line 176, in _init_prefs
    execfile(config.real_path, run_globals)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 12-13: ordinal not in range(128)

I then put my code under a path without any Chinese characters, and ropemacs works.
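The traceback does not say where the encoding happens, but the message is what one would see if a unicode path were implicitly encoded with the ASCII codec. A Python 3 sketch of that assumption follows; the path is hypothetical and was chosen only so that its two Chinese characters fall at indexes 12 and 13.

```python
# Hypothetical path: two Chinese characters at indexes 12 and 13.
config_path = "C:\\projects\\中国\\.ropeproject\\config.py"

try:
    config_path.encode("ascii")
    error = None
except UnicodeEncodeError as exc:
    error = exc

print(error)
```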

kunimi
Chinese characters encoded in GBK or GB2312 (not UTF-8) cannot be decoded from UTF-8 to unicode. You need to convert your file to UTF-8 encoding or switch to a different file path.

Let GB* go to hell.

You may try the following if you insist on GB* encoding

(when multibyte
(princ ".decode('GB2312')"))
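A Python 3 sketch of the point being made here: bytes produced by GB2312 are rejected by the UTF-8 decoder but decode cleanly when the matching codec is named. The sample text is illustrative only.

```python
# Illustrative sample text, encoded as GB2312 rather than UTF-8.
text = "中国"
gb_bytes = text.encode("gb2312")

try:
    gb_bytes.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc

# Naming the codec that produced the bytes recovers the text.
decoded = gb_bytes.decode("gb2312")

print(utf8_error is not None, decoded == text)
```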

I encountered this problem on my Gentoo system, so nothing GBK-related is involved...

Just wanted to thank you all for the discussion, and patience!

François