Hex.toString()/fromString is not 100% reversible
GoogleCodeExporter commented
What steps will reproduce the problem?
1. Take any single-byte hex value from 80 to FF (e.g. "80").
2. Convert it to a string with Hex.toString().
3. Convert the string back to hex with Hex.fromString().
What is the expected output? What do you see instead?
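I would expect the resulting hex to be the same as the input. Instead, the
output is longer than the input: for example, 80 comes back as c280 ('c2' +
input). This is because a lone byte from 80 to FF is not a valid UTF-8
character on its own; these values only have meaning as parts of multi-byte
sequences. The decoded character is therefore written back out as a valid
multi-byte UTF-8 sequence, which adds a lead byte.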
What version of the product are you using? On what operating system?
Version 1.3. Mac OS 10.6.5
Please provide any additional information below.
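If Hex.toString() and Hex.fromString() used readMultiByte()/writeMultiByte()
with the iso-8859-1 character set instead of readUTFBytes()/writeUTFBytes(),
the two functions would be reversible, because ISO-8859-1 maps each byte
00-FF to exactly one character and back.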
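A minimal sketch of the encoding choice, in Python rather than ActionScript. The helpers `hex_to_string` and `string_to_hex` are hypothetical analogues of Hex.toString()/Hex.fromString(), not part of the library. Note that Python's strict UTF-8 decoder raises an error on a lone 80-FF byte instead of producing a character the way the report describes, so this only illustrates the byte arithmetic, not Flash's exact behavior.

```python
# Hypothetical analogues of Hex.toString()/Hex.fromString(); they only
# illustrate how the choice of character encoding affects the round trip.

def hex_to_string(hex_str: str, encoding: str) -> str:
    """Hex digits -> bytes -> text (the direction of Hex.toString())."""
    return bytes.fromhex(hex_str).decode(encoding)


def string_to_hex(text: str, encoding: str) -> str:
    """Text -> bytes -> lowercase hex (the direction of Hex.fromString())."""
    return text.encode(encoding).hex()


# ISO-8859-1 maps each byte 00-FF to exactly one code point U+0000-U+00FF,
# so every single-byte value survives the round trip unchanged.
for value in range(256):
    h = f"{value:02x}"
    assert string_to_hex(hex_to_string(h, "iso-8859-1"), "iso-8859-1") == h

# UTF-8 needs two bytes for any code point from U+0080 to U+07FF, so a
# character in that range written back out gains a lead byte.
print(string_to_hex("\u0080", "utf-8"))  # c280
print(string_to_hex("\u00ff", "utf-8"))  # c3bf

# A lone byte 80-FF is not valid UTF-8; strict decoding rejects it.
try:
    hex_to_string("80", "utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)  # invalid start byte
```

The same idea is what the iso-8859-1 suggestion relies on: a one-to-one byte/character mapping makes decode followed by encode the identity.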
For details on UTF-8, see http://www.fileformat.info/info/unicode/utf8.htm:
"The value of each individual byte indicates its UTF-8 function, as follows:
* 00 to 7F hex (0 to 127): first and only byte of a sequence.
* 80 to BF hex (128 to 191): continuing byte in a multi-byte sequence.
* C2 to DF hex (194 to 223): first byte of a two-byte sequence.
* E0 to EF hex (224 to 239): first byte of a three-byte sequence."
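The byte roles quoted above can be sketched as a small classifier. The function name `utf8_byte_role` is hypothetical. The quoted list stops at E0-EF; the extra branches for F0-F4 (first byte of a four-byte sequence) and for C0, C1 and F5-FF (never valid in UTF-8) follow RFC 3629.

```python
def utf8_byte_role(b: int) -> str:
    """Classify one byte by its role in UTF-8 (ranges per RFC 3629)."""
    if not 0 <= b <= 0xFF:
        raise ValueError("not a byte value")
    if b <= 0x7F:
        return "single"        # 00-7F: first and only byte of a sequence
    if b <= 0xBF:
        return "continuation"  # 80-BF: continuing byte in a multi-byte sequence
    if b in (0xC0, 0xC1) or b >= 0xF5:
        return "invalid"       # C0, C1, F5-FF: never appear in valid UTF-8
    if b <= 0xDF:
        return "lead-2"        # C2-DF: first byte of a two-byte sequence
    if b <= 0xEF:
        return "lead-3"        # E0-EF: first byte of a three-byte sequence
    return "lead-4"            # F0-F4: first byte of a four-byte sequence


# No byte from 80 to FF is a complete character by itself, which is why a
# lone byte in that range cannot survive a UTF-8 round trip unchanged.
assert all(utf8_byte_role(b) != "single" for b in range(0x80, 0x100))
```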
Original issue reported on code.google.com by eggers.p...@gmail.com
on 14 Jan 2011 at 10:01