unicode handling

Question

Closed this issue 12 years ago · 2 comments

I ran into Unicode problems this morning. I need to pass a file containing Unicode characters (Chinese in my case), and Pymacs cannot handle it.

I applied this patch and it now works.

--- pymacs.el2  2012-02-23 04:28:52.000000000 +0800
+++ pymacs.el   2012-02-26 12:10:55.000000000 +0800
@@ -506,7 +506,7 @@
                                  (split-string (prin1-to-string text) "\n")
                                  "\\n"))
                (when multibyte
-                 (princ ".encode('ISO-8859-1').decode('UTF-8')")))
+                 (princ ".decode('UTF-8')")))
              (setq done t)))
           ((symbolp expression)
            (let ((name (symbol-name expression)))

That is, it removes the .encode('ISO-8859-1') call.

My question is: why encode the text with ISO-8859-1 when the multibyte text has already been encoded as UTF-8 in Emacs, like this:

(encode-coding-string expression 'utf-8)

Isn't this contradictory?

Answer 1 · 2012-03-26T03:37:39.000Z

Hi again, Leo.

I merged this one as well, thanks!

The truth is that I did not scrutinize the matter enough to be sure, so I decided to merely trust you on this one :-).

Encoding into ISO 8859-1 is a mere kludge for transforming a sequence of bytes held in a Unicode string into the same bytes held in a str string because, as you surely know, Unicode and ISO 8859-1 coincide in the first 256 positions. Are there situations in which that transformation is needed? With your patch applied, I'll postpone studying this more closely until you or other users report concerns or problems in this area.
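
For readers following along, here is a minimal sketch of the ISO 8859-1 trick described above. It is written for Python 3, whereas Pymacs at the time targeted Python 2 (unicode and str rather than str and bytes), so it only illustrates the idea and is not what Pymacs itself executes. The helper names bytes_as_code_points and code_points_as_bytes are hypothetical. The last part shows one way the extra encode step could fail when the string already holds real Chinese characters; whether that is exactly what happened in the reported case is an assumption.

```python
def bytes_as_code_points(raw: bytes) -> str:
    """Build a text string whose code points equal the given byte values."""
    return raw.decode("iso-8859-1")


def code_points_as_bytes(text: str) -> bytes:
    """Inverse of the above; only works if every code point is below 256."""
    return text.encode("iso-8859-1")


original = "中文"
utf8_bytes = original.encode("utf-8")  # six bytes, three per character

# The same six byte values, but stored as characters in a text string.
disguised = bytes_as_code_points(utf8_bytes)

# The kludge: ISO 8859-1 turns those characters back into the same bytes,
# which can then be decoded as UTF-8.
restored = code_points_as_bytes(disguised).decode("utf-8")
assert restored == original

# If the string already contains real Chinese characters, their code points
# are above 255 and ISO 8859-1 cannot represent them.
try:
    code_points_as_bytes(original)
except UnicodeEncodeError:
    fails_on_real_text = True
else:
    fails_on_real_text = False
```

So the round trip is only safe when the string is known to carry byte values disguised as characters; applied to text that is already properly decoded, it raises an error instead.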

François

Answer 2 · 2012-03-26T03:49:36.000Z

Saving a cross-reference to issue #7, where this problem has already been duly reported and discussed. I presume that the repetition of the problem and the similarity of the suggestions were a good incentive to accept the correction, even though I do not understand all the implications. :-)