dataspread/dataspread-web

UTF-8 encoding support (e.g. for Linux)

shichuzhu opened this issue · 0 comments

The issue was noticed when importing the airbnb.csv file, which contains non-ASCII characters. On Windows and macOS the import worked, but on an Ubuntu machine the non-ASCII characters cause an error.

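My guess is that the variable-width UTF-8 encoding causes the issue: a non-ASCII character takes more than one byte, so the encoded byte count can exceed the String's character count. A typical instance in the code is ROM_Model.java, line 530: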
Instead of sending bytes based on length of the String,
cpIN.writeToCopy(sb.toString().getBytes(), 0, sb.length());
we should send bytes based on the length of the actual byte sequence:
byte[] sbInBytes = sb.toString().getBytes();
cpIN.writeToCopy(sbInBytes, 0, sbInBytes.length);
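
The mismatch can be reproduced in isolation. The following is only a minimal sketch, not the project's code: `Utf8LengthDemo`, `send`, `sendUsingCharCount` and `sendUsingByteCount` are hypothetical names, and `send` is a stand-in for `cpIN.writeToCopy(buf, off, len)` that just copies `len` bytes into a buffer. It also passes `StandardCharsets.UTF_8` explicitly, which is an assumption on my side; the patch above uses the platform default charset via `getBytes()`.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class Utf8LengthDemo {
    // Hypothetical stand-in for cpIN.writeToCopy(buf, off, len):
    // copies exactly len bytes starting at off into an in-memory sink.
    static byte[] send(byte[] buf, int off, int len) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        sink.write(buf, off, len);
        return sink.toByteArray();
    }

    // Current behaviour: the length argument is the number of chars.
    static String sendUsingCharCount(String row) {
        byte[] bytes = row.getBytes(StandardCharsets.UTF_8);
        return new String(send(bytes, 0, row.length()), StandardCharsets.UTF_8);
    }

    // Proposed behaviour: the length argument is the number of encoded bytes.
    static String sendUsingByteCount(String row) {
        byte[] bytes = row.getBytes(StandardCharsets.UTF_8);
        return new String(send(bytes, 0, bytes.length), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Café,Zürich\n" written with escapes so the source file stays ASCII.
        String row = "Caf\u00e9,Z\u00fcrich\n";
        System.out.println("chars: " + row.length());                                 // 12
        System.out.println("bytes: " + row.getBytes(StandardCharsets.UTF_8).length);  // 14
        // The char-count version loses the last two bytes ("h" and the newline).
        System.out.println(sendUsingCharCount(row).equals(row)); // false
        System.out.println(sendUsingByteCount(row).equals(row)); // true
    }
}
```

With an all-ASCII row both variants behave the same, which would explain why the problem only shows up on files such as airbnb.csv.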

Not sure whether any other part of the code needs a similar patch.