Chinese character encoding issues

Question

chenjiashuo123 opened this issue a year ago · 2 comments

import jep.Interpreter;
import jep.JepConfig;
import jep.SubInterpreter;
import jep.MainInterpreter;
public class Main {
    public static void main(String[] args) throws Exception{
        String path = "D:\\data\\code\\log_project\\v1.3\\py_func_lib\\src\\main\\resources";
        MainInterpreter.setJepLibraryPath(path + "/" + "jep.dll");
        JepConfig config = new JepConfig();
        try (Interpreter interp = new SubInterpreter(config)) {
            interp.exec("#-*-coding:utf-8 -*-");
            interp.exec("print('中文')");

        }
    }
}

This program does not print 中文 correctly; the output is garbled.

How can I get the Chinese characters to display correctly?

Answer 1 · 2023-11-05T17:20:41.000Z

Unfortunately, your example correctly prints 中文 on my computer, so I think the problem may be specific to your environment. For reference, I am running Ubuntu 22.04, OpenJDK 11, and Python 3.10.12 with Jep 4.2 (pre-release). I also tried it in a Docker container based on python:3.12 with OpenJDK 17, and it worked correctly there as well. I suspect this problem is specific to Windows, but I am not familiar with locale settings on Windows, so I cannot be sure.

The only Jep-specific code here is Interpreter.exec(), which converts the Java String containing print('中文') into a UTF-8 encoded char* and passes it to Python. It seems to me the problem may be one of the following:

1. Python is not properly interpreting the char*.
2. print is encoding it wrong.
3. Your console is having trouble displaying it correctly.
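
The last two possibilities in the list come down to the same mechanism: the bytes written to standard output are encoded with one charset and the console decodes them with another. The following is a minimal pure-Java sketch of that mechanism, not a reproduction of what Python or Jep do internally. The class name ConsoleMismatchDemo and the helper displayed() are hypothetical names chosen for illustration, and ISO-8859-1 stands in for whatever non-UTF-8 code page a console might be using.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UncheckedIOException;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Jep.
final class ConsoleMismatchDemo {

    // Writes text through a PrintStream that encodes with writerCharset, then
    // decodes the raw bytes with consoleCharset, the way a terminal would.
    static String displayed(String text, Charset writerCharset, Charset consoleCharset) {
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        try (PrintStream out = new PrintStream(raw, true, writerCharset.name())) {
            out.print(text);
        } catch (UnsupportedEncodingException e) {
            throw new UncheckedIOException(e);
        }
        return new String(raw.toByteArray(), consoleCharset);
    }

    public static void main(String[] args) {
        // Unicode escapes for the two characters from the question, so the
        // result does not depend on the encoding of this source file.
        String text = "\u4e2d\u6587";

        // Writer and console agree: the text survives the round trip.
        boolean matching = text.equals(
                displayed(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8));
        // Writer uses UTF-8 but the console assumes ISO-8859-1: garbled text.
        boolean mismatched = text.equals(
                displayed(text, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));

        System.out.println("matching charsets preserve text: " + matching);     // true
        System.out.println("mismatched charsets preserve text: " + mismatched); // false
    }
}
```

If the second case matches what you see, the string itself is intact and only the output encoding or the console settings need to change.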

You might try print(len('中文')). On my system that prints 2, which indicates that Python correctly understands the string is only 2 characters even though it takes more than 2 bytes. If you also get 2, that rules out item 1: if Python were not reading the code correctly, I would expect a larger number, such as 6, since 中文 takes 6 bytes in UTF-8.
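
The reasoning behind the length check can be sketched in plain Java: the same UTF-8 bytes yield 2 characters when decoded as UTF-8 and one character per byte when decoded with a single-byte charset. The class name EncodingLengthCheck and the helper decodedLength() are hypothetical, and ISO-8859-1 is only an assumed example of a wrong decoding; the number a misconfigured Python would report depends on the charset it actually applies.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Jep.
final class EncodingLengthCheck {

    // Encodes text as UTF-8, decodes the bytes with the assumed charset, and
    // returns the number of code points in the decoded string.
    static int decodedLength(String text, Charset assumed) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        String decoded = new String(utf8, assumed);
        return decoded.codePointCount(0, decoded.length());
    }

    public static void main(String[] args) {
        // Unicode escapes for the two characters from the question.
        String text = "\u4e2d\u6587";

        // Each of these characters takes 3 bytes in UTF-8.
        System.out.println(text.getBytes(StandardCharsets.UTF_8).length);         // 6
        // Correct decoding recovers the 2 original characters.
        System.out.println(decodedLength(text, StandardCharsets.UTF_8));          // 2
        // A single-byte charset turns every byte into its own character.
        System.out.println(decodedLength(text, StandardCharsets.ISO_8859_1));     // 6
    }
}
```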

Answer 2 · 2023-11-06T01:18:01.000Z

I tried my code in a Docker container based on python:3.7 with OpenJDK 1.8 and it worked correctly. It might be a problem with my Windows environment.