gpcarmo/cwls-las-reader

las_reader doesn't handle the file encoding

fxgallego opened this issue · 4 comments

Trying to load some las files raises an "invalid byte sequence in UTF-8" error.

Simply adding the encoding to the File.open parameters in https://github.com/gpcarmo/cwls-las-reader/blob/master/lib/las_reader.rb#L182 solves this problem.

I've used File.open(file_name, 'r:ISO-8859-1') and thought that was safe because the LAS file standard is an ASCII standard.
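A minimal sketch of the behavior described above, assuming a file that contains a single ISO-8859-1 byte outside the ASCII range. The helpers `write_latin1_sample` and `first_line` are hypothetical and are not part of las_reader. Read as UTF-8, the line carries an invalid byte sequence, which is what string operations such as regex matching typically complain about. Read with an explicit external encoding, it is valid.

```ruby
require 'tempfile'

# Hypothetical helper: writes one LAS-like line containing the single byte
# 0xF3 ("ó" in ISO-8859-1), which is not a valid UTF-8 sequence on its own.
def write_latin1_sample
  tmp = Tempfile.new(['sample', '.las'])
  tmp.binmode
  tmp.write("COMP. Petr\xF3leo : COMPANY\n".b)
  tmp.close
  tmp
end

# Hypothetical helper: reads the first line using the given File.open mode.
def first_line(path, mode)
  File.open(path, mode) { |f| f.gets }
end

sample = write_latin1_sample
# Tagged as UTF-8 but containing a stray 0xF3 byte.
puts first_line(sample.path, 'r:UTF-8').valid_encoding?            # prints false
# Decoded as ISO-8859-1 and transcoded to UTF-8 on read.
puts first_line(sample.path, 'r:ISO-8859-1:UTF-8').valid_encoding? # prints true
sample.unlink
```

The mode 'r:ISO-8859-1' used above returns strings tagged as ISO-8859-1. Adding ':UTF-8' is an optional extra that transcodes them to UTF-8 while reading.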

Makes sense. Is your testing environment MS Windows based?
I tested with an ASCII file from the Kansas University http://www.kgs.ku.edu/software/DEWL/HELP/pc_read/Shamar-1.las and could not reproduce the same error.

I am using this file to fix issue #5.

It's Windows as well as Linux. Further testing brought some insights. It appears to fail when the file is submitted through a Rails form. It works flawlessly when the file is read from the filesystem. Anyway, an option to handle the encoding, passed as a parameter in load or new, should be enough.

Yep. I agree that passing the encoding as an optional parameter in load will work better. I could reproduce the error by adding extended ASCII characters (á, é, ó or ç) to the same file: saved as ISO-8859-15, it breaks the code.
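One way the optional parameter could look, as a sketch only. The class `LasSketch`, the method `load_file` and the keyword `file_encoding` are hypothetical names chosen for illustration and do not reflect the gem's actual signature. When no encoding is given, Ruby's default external encoding is kept.

```ruby
require 'tempfile'

# Hypothetical stand-in for the reader class, not the gem's real API.
class LasSketch
  attr_reader :lines

  # file_encoding: nil keeps the environment's default encoding;
  # otherwise the file is decoded with it and transcoded to UTF-8.
  def load_file(file_name, file_encoding: nil)
    mode = file_encoding ? "r:#{file_encoding}:UTF-8" : 'r'
    @lines = File.open(file_name, mode, &:readlines)
    self
  end
end

# Usage: a line with "ç" stored as the single ISO-8859-15 byte 0xE7.
Tempfile.create(['sketch', '.las']) do |tmp|
  tmp.binmode
  tmp.write("WELL. Po\xE7o-1 : WELL\n".b)
  tmp.flush
  las = LasSketch.new.load_file(tmp.path, file_encoding: 'ISO-8859-15')
  puts las.lines.first
end
```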

Looking at the LAS_20_Update_Jan2014 documentation (section 3.0):

The ASCII character set is limited to ASCII 13 (carriage return), ASCII 10 (line feed), and ASCII 32 to ASCII 126 inclusive. All other ASCII characters are not allowed and it is suggested that software readers convert them to a space...

I will keep the default environment encoding and follow the suggestion above, replacing unsupported characters with the space character, unless the encoding is handled by the optional encoding parameter.
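The replacement rule quoted from the specification could be sketched roughly as follows. `sanitize_las_line` is a hypothetical helper, not the code in the actual commit. It assumes the input string is tagged as UTF-8 and may contain invalid bytes. Those are replaced first with `String#scrub`, then every remaining character outside CR, LF and ASCII 32 to 126 becomes a space.

```ruby
# Hypothetical helper illustrating the LAS 2.0 suggestion:
# keep CR (13), LF (10) and ASCII 32..126, turn everything else into a space.
def sanitize_las_line(line)
  line
    .scrub(' ')                        # invalid byte sequences -> space
    .gsub(/[^\r\n\x20-\x7E]/, ' ')     # any other disallowed character -> space
end

# A stray ISO-8859-1 byte (0xF3) and a tab are both replaced.
puts sanitize_las_line("COMP. Petr\xF3leo\t: COMPANY\r\n").inspect
```

Note that a tab (ASCII 9) is also outside the allowed set, so it is replaced as well.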

Thanks a lot! I'll update to this commit and let you know if something doesn't work as expected.