Chinese character encoding issues
chenjiashuo123 opened this issue · 2 comments
import jep.Interpreter;
import jep.JepConfig;
import jep.SubInterpreter;
import jep.MainInterpreter;

public class Main {
    public static void main(String[] args) throws Exception {
        String path = "D:\\data\\code\\log_project\\v1.3\\py_func_lib\\src\\main\\resources";
        MainInterpreter.setJepLibraryPath(path + "/" + "jep.dll");
        JepConfig config = new JepConfig();
        try (Interpreter interp = new SubInterpreter(config)) {
            interp.exec("#-*-coding:utf-8 -*-");
            interp.exec("print('中文')");
        }
    }
}
How can I get Chinese characters to display correctly?
Unfortunately, your example correctly prints 中文 on my computer, so I think the problem may be specific to your environment. For reference, I am running Ubuntu 22.04, OpenJDK 11, and Python 3.10.12 with Jep 4.2 (pre-release). I also tried it in a Docker container using python:3.12 with OpenJDK 17, and it worked correctly there as well. I suspect this problem is specific to Windows, but I am not familiar with locale settings on Windows, so I cannot be sure.
The only Jep-specific code here is Interpreter.exec(), which converts the Java String containing print('中文') into a UTF-8 encoded char* and passes it to Python. It seems to me the problem may be one of the following:
- Python is not properly interpreting the char*.
- print is encoding the output incorrectly.
- Your console is having trouble displaying it correctly.
You might try print(len('中文')). On my system that results in 2, which indicates Python correctly understands the string as only 2 characters even though it takes more than 2 bytes. That rules out item 1, because if Python were not reading the code correctly I would expect a larger number (the two characters take 6 bytes in UTF-8).
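The reasoning behind the len check can be illustrated on the Java side alone. The following is a minimal sketch, not part of Jep: the class name Utf8LengthDemo and its helper methods are made up for illustration, and ISO-8859-1 is only an assumed example of a wrong decoding, not necessarily what a misconfigured Windows environment would use. The string is written with Unicode escapes so the result does not depend on the source file's encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf8LengthDemo {
    // Number of Unicode code points, comparable to Python's len() on a str.
    public static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes the string occupies once encoded as UTF-8.
    public static int utf8ByteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Length seen by a reader that decodes the UTF-8 bytes with the wrong charset.
    public static int lengthWhenDecodedAs(String s, Charset wrong) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, wrong).length();
    }

    public static void main(String[] args) {
        // The two Chinese characters from the issue, escaped to stay ASCII-only.
        String text = "\u4e2d\u6587";
        System.out.println(codePoints(text));    // prints 2
        System.out.println(utf8ByteCount(text)); // prints 6
        // Assumed wrong decoding: one char per byte, so the length becomes 6.
        System.out.println(lengthWhenDecodedAs(text, StandardCharsets.ISO_8859_1)); // prints 6
    }
}
```

A length of 2 from Python therefore suggests the source text was decoded correctly, which points toward print or the console as the remaining suspects.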
I tried my code in a Docker container using python:3.7 with OpenJDK 1.8, and it worked correctly. It might be a problem with my Windows environment.