Build wrong python expression out of lisp expression with utf-8 encoded content in emacs buffer.

Question

Closed this issue 12 years ago · 8 comments

In pymacs.el.in, Pymacs defines the function pymacs-print-for-eval to build a Python expression out of a Lisp expression. However, ropemacs fetches the Emacs buffer content with (buffer-string), and when that content is encoded in UTF-8, the following code in pymacs.el.in:

(when multibyte
(princ ".encode('ISO-8859-1').decode('UTF-8')")))

leads to an error on the Python side of Pymacs, because we try to encode a string that is already UTF-8 encoded.

It seems we had better handle this carefully by detecting the buffer charset?

Answer 1 · 2011-10-20T15:25:41.000Z

Hi, Zhang (hoping I'm naming you correctly).

I would need more information (a precise recipe, a traceback, something) to study this problem. I have the impression you are giving me your interpretation of a problem, but not the problem itself. Or maybe I'm not understanding you fully?

Trying to encode a UTF-8 encoded string with ISO-8859-1 will never fail, and produces an exact equivalent of the original string in such a way that decoding it as UTF-8 should then work.
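A minimal sketch of that round trip, written for Python 3, where a plain string literal is text. It assumes every character of the carried string has a code point below 256, which is what octal escapes such as \344 produce. The sample bytes come from the debug output quoted later in this thread; the variable names are illustrative only.

```python
# Assumption: Python 3, where a str literal is text rather than bytes.
# Each octal escape stands for one character in the range U+0000..U+00FF,
# i.e. one raw UTF-8 byte carried as a Latin-1 character.
carried = "#\344\270\255\345\233\275"

# ISO-8859-1 maps code points 0..255 to the identical byte values,
# so this step cannot fail for such a string and recovers the raw bytes.
raw_bytes = carried.encode("ISO-8859-1")

# The recovered bytes are valid UTF-8 and decode to the real text.
text = raw_bytes.decode("UTF-8")

print(repr(raw_bytes))
print(ascii(text))
```

The guarantee holds only when the string is a text string limited to code points below 256. A string that is already raw bytes, or that contains characters above U+00FF, is a different case.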

About "carefully detecting buffer charset", this is no trivial task in most cases, with no guaranteed success ever. Some charsets can be detected with a relatively high degree of success, but never perfectly. We had better know the charset beforehand if we want anything solid. In the case above, I guess (I did not thoroughly check before replying) that Pymacs knows this is UTF-8.

François

Answer 2 · 2011-10-21T01:06:46.000Z

Hi again, Zhang. I found a few more bits about the problem you report, from the Rope mailing lists. I roughly copied below what I have here, for posterity to read :-). Despite many comments and details, I would still need more context, as too many things escape me in this conversation. Ideally, I would like to get a self-contained example I could use on my side to see the problem, explore it, and then see what could be done about it. Thanks!

On 2011-09-05, to Ali Gholami Rudi

Hello Ali,

Thank you for your quick response, but after adding more debugging information, I can confirm that my Emacs buffer has not been narrowed. I also observed in the Pymacs buffer that 'lisp.buffer_string' (which is mapped to the Emacs Lisp buffer-string) got the correct content (the whole buffer string). This must be a Pymacs issue: it did not translate the string returned by Emacs Lisp to a Python string properly. I will continue to analyse the code and hope I can find the answer.

On Mon, 05 Sep 2011 00:48:09 +0430 Ali Gholami Rudi aligrudi@gmail.com wrote:

"fortitude.zhang" fortitude.zhang@gmail.com wrote:

Open the Unicode Python source file (please see Footnote 1 below) and add a newline.

After typing the code Session. and executing M-/ for code completion, I get the error.

I found that the get_text function of ropemacs's LispUtil gets the wrong source code string when it calls lisp.buffer_string(). So I want to know whether this is a bug in ropemacs, and how I can get a fix.

 def get_text(self):
     end = lisp.buffer_size() + 1
     old_min = lisp.point_min()
     old_max = lisp.point_max()
     narrowed = (old_min != 1 or old_max != end)
     if narrowed:
         lisp.narrow_to_region(1, lisp.buffer_size() + 1)
     try:
         lisp.message('called my to get_text,buffer_string len is {0},while buffer size is{1}'.format(len(lisp.buffer_string()), end))

I added this line to debug ropemacs and found that buffer_size is 2407 but len(lisp.buffer_string()) is 275, which is far less than 2407.

I guess the large difference is due to narrowing (end is calculated too early); this may give something more meaningful:

 # ...
 try:
    lisp.message('len1=%d len2=%d' %
                 (len(lisp.buffer_string()), lisp.buffer_size() + 1))

I found that in a previous post somebody gave a fix that takes the minimum of len(source) and offset, but I wonder whether it is the best solution.

It seems lisp.buffer_size() adds one to the size of the actual buffer (maybe eof newline?). If that's the case, I think that fix seems reasonable. Can you verify that abs(len1 - len2) <= 1? Does that patch still work?

Thanks, Ali

On 2011-09-14, to Ali Gholami Rudi

Hi Ali,

I have finally found the reason. In ropemode's pymacs.el there is a function, 'pymacs-print-for-eval', which prints a Python expression out of a Lisp expression. When it processes a Lisp multibyte string, it tries to encode the buffer string to ISO-8859-1 and then decode it from UTF-8, while my file is already encoded in UTF-8. So I omitted the ISO-8859-1 encoding step, and it now works well. Coming back to the original problem: the measured size was actually the length of Python's exception string for the error raised by the ISO-8859-1 encoding, which is why it was so much smaller than the file buffer. Thank you very much for your help.

On Thu, 08 Sep 2011 20:56:47 +0430 Ali Gholami Rudi aligrudi@gmail.com wrote:

fortitude.zhang@gmail.com wrote:

Thank you for your quick response, but after adding more debugging information, I can confirm that my Emacs buffer has not been narrowed. I also observed in the Pymacs buffer that 'lisp.buffer_string' (which is mapped to the Emacs Lisp buffer-string) got the correct content (the whole buffer string).

Seems serious. It cannot be a byte vs. character offset problem (the difference is a factor of ten). Does "(buffer-size)" give a different value on the lisp side? If so, is there any other function that returns the actual size of current buffer?

Ali

Answer 3 · 2011-10-31T14:43:03.000Z

Hi pinard,
Sorry for my late response. I tried to reproduce this problem and it appeared again. In the Pymacs buffer I got this debug message:
>131 return "#\344\270\255\345\233\275\n\nimport os\n\ndef main():\n \"\"\"\"\"\"\n os.\n\n".encode('ISO-8859-1').decode('UTF-8')

This means the Emacs side sends this statement to the Python side, where it is evaluated by Python.

However, as you can see, the expression "#\344\270\255\345\233\275\n\nimport os\n\ndef main():\n \"\"\"\"\"\"\n os.\n\n".encode('ISO-8859-1').decode('UTF-8') raises UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 1: ordinal not in range(128).

But if I modify it to "#\344\270\255\345\233\275\n\nimport os\n\ndef main():\n \"\"\"\"\"\"\n os.\n\n".decode('UTF-8'), Python gets the correct unicode string and the error goes away.

So I wonder whether, when handling an Emacs multibyte buffer, the function pymacs-print-for-eval in pymacs.el should change the code (princ ".encode('ISO-8859-1').decode('UTF-8')"))) to (princ ".decode('UTF-8')"))) to let the Python side correctly decode the UTF-8 encoded string.
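The reported error names the 'ascii' codec even though the expression asks for ISO-8859-1. That is consistent with Python 2 byte strings, where calling .encode() on a byte string first decodes it implicitly with the default ASCII codec. The sketch below makes that implicit step explicit in Python 3, purely as an illustration, since Python 3 bytes have no .encode method. The helper name py2_style_encode is hypothetical and not part of Pymacs.

```python
def py2_style_encode(raw, encoding):
    """Emulate Python 2 str.encode: implicit ASCII decode, then encode."""
    return raw.decode("ascii").encode(encoding)


# Shortened version of the byte string from the debug message above.
raw = b"#\344\270\255\345\233\275\n\nimport os\n"

try:
    py2_style_encode(raw, "ISO-8859-1")
    error_message = None
except UnicodeDecodeError as exc:
    # The implicit ASCII step fails on the first non-ASCII byte.
    error_message = str(exc)

# Decoding the bytes directly as UTF-8 is the proposed fix.
fixed = raw.decode("UTF-8")

print(error_message)
print(ascii(fixed))
```

Under this reading, dropping the .encode('ISO-8859-1') step works because the literal arriving on the Python side is already a byte string holding UTF-8 data.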

Answer 4 · 2011-12-01T14:05:31.000Z

I ran into this problem too, and I commented out the lines below

(when multibyte
(princ ".encode('ISO-8859-1').decode('UTF-8')"))

and it works now.

Answer 5 · 2011-12-02T06:01:50.000Z

The same issue has been troubling me for the whole week. I assume this problem arises under the following circumstances:

Emacs runs in a Win32 environment;
The path of the Python code contains Chinese characters.

I changed pymacs.el per fortitudezhang's fix (removed encode('ISO-8859-1')). It solved part of my problem, but my Emacs keeps throwing UnicodeEncodeError if the path contains Chinese characters:

pymacs-report-error: Python: Traceback (most recent call last):
  File "C:\Python27\Pymacs\Pymacs\pymacs.py", line 250, in loop
    value = eval(text)
  File "<string>", line 1, in <module>
  File "c:\Python27\lib\site-packages\ropemode\decorators.py", line 53, in newfunc
    return func(*args, **kwds)
  File "c:\Python27\lib\site-packages\ropemode\interface.py", line 142, in goto_definition
    definition = self._base_definition_location()
  File "c:\Python27\lib\site-packages\ropemode\interface.py", line 157, in _base_definition_location
    self._check_project()
  File "c:\Python27\lib\site-packages\ropemode\interface.py", line 448, in _check_project
    self.open_project()
  File "c:\Python27\lib\site-packages\ropemode\decorators.py", line 53, in newfunc
    return func(*args, **kwds)
  File "c:\Python27\lib\site-packages\ropemode\interface.py", line 88, in open_project
    self.project = rope.base.project.Project(root)
  File "c:\Python27\lib\site-packages\rope\base\project.py", line 146, in __init__
    self._init_prefs(prefs)
  File "c:\Python27\lib\site-packages\rope\base\project.py", line 176, in _init_prefs
    execfile(config.real_path, run_globals)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 12-13: ordinal not in range(128)

I then put my code under a path without any Chinese characters and ropemacs works.
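The traceback ends in an 'ascii' codec failure, which is what happens whenever non-ASCII text is encoded with ASCII, whether explicitly or implicitly. The Python 3 sketch below shows the explicit form. The path is hypothetical (the real one is not shown in the report), so the character positions differ from those in the traceback. That the failure happens when execfile converts the path is presumed here, not confirmed by the report.

```python
# Hypothetical Windows path containing two Chinese characters.
config_path = "c:\\\u9879\u76ee\\.ropeproject\\config.py"

try:
    # Encoding non-ASCII text with the ASCII codec always fails.
    config_path.encode("ascii")
    message = None
except UnicodeEncodeError as exc:
    message = str(exc)

print(message)
```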

Answer 6 · 2011-12-02T06:19:13.000Z

kunimi
Chinese characters encoded in GBK or GB2312 (rather than UTF-8) cannot be decoded as UTF-8 into unicode. You need to convert your file to UTF-8 encoding or move it to a different file path.

Let GB* go to hell.

You may try the code below if you insist on GB* encoding:

(when multibyte
(princ ".decode('GB2312')"))
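To illustrate why the codec name has to match the bytes actually sent, here is a small Python 3 sketch. The sample text and variable names are assumptions chosen for illustration; only the codec names come from the discussion.

```python
# Hypothetical sample text: two Chinese characters.
sample = "\u4e2d\u56fd"

gb_bytes = sample.encode("GB2312")
utf8_bytes = sample.encode("UTF-8")

# These GB2312 bytes are not valid UTF-8, so the wrong codec raises an error.
try:
    gb_bytes.decode("UTF-8")
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True

# Each byte sequence decodes cleanly with its own codec.
assert gb_bytes.decode("GB2312") == sample
assert utf8_bytes.decode("UTF-8") == sample

print(wrong_codec_failed, len(gb_bytes), len(utf8_bytes))
```

A hard-coded .decode('GB2312') therefore only helps when the bytes really are GB2312. It would break a buffer whose content is sent as UTF-8.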

Answer 7 · 2011-12-02T10:41:29.000Z

I encountered this problem on my Gentoo system, so there is no GBK-related stuff involved...

Answer 8 · 2012-03-26T03:53:07.000Z

Just wanted to thank you all for the discussion, and patience!

François