Leading 0x00 stripped from binary data
Closed this issue · 12 comments
I ran into a string with a leading \0 when running the tests. Is this normal?
There was 1 failure:
1) Tuupola\Base62\Base62Test::testShouldEncodeAndDecodeRandomBytes
Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
-Binary String: 0x00486eea2de87439fc081b892616a3b0f1f098df86e2cdd23e7d21f5f046a30a1a6662fff6c3c017b1d4853a1fdd7dc00975016d9c2801b9df659fadc6abe1109b1e1f3960367603e75bb9ddf9d8097af5948f74df585d05bbee61aff992f3d35577e31aafce7d4342d3a68da0d5ca8d46bde2f7e7f555cf6a1938c4f52bdd43
+Binary String: 0x486eea2de87439fc081b892616a3b0f1f098df86e2cdd23e7d21f5f046a30a1a6662fff6c3c017b1d4853a1fdd7dc00975016d9c2801b9df659fadc6abe1109b1e1f3960367603e75bb9ddf9d8097af5948f74df585d05bbee61aff992f3d35577e31aafce7d4342d3a68da0d5ca8d46bde2f7e7f555cf6a1938c4f52bdd43
/private/tmp/base62/tests/Base62Test.php:41
FAILURES!
base64_encode can handle this:
>>> $data = hex2bin("00313233");
=> "\0123"
>>> base64_encode($data)
=> "ADEyMw=="
>>> base64_decode("ADEyMw==")
=> "\0123"
>>> Tuupola\Base62Proxy::encode($data)
=> "DWjr"
>>> Tuupola\Base62Proxy::decode("DWjr")
=> "123"
This seems to be default gmp behaviour. Since it is a numerical conversion, leading zeroes carry no value.
print $encoded = gmp_strval(gmp_init("0x00deadbeef", 16), 62);
print $decoded = gmp_strval(gmp_init($encoded, 62), 16);
/*
44pZgF
deadbeef
*/
PhpEncoder cannot handle it either.
All encoders have the same output since they are interchangeable.
Then just forget it 😆
I keep this open for a while to think about it. For numerical conversions losing leading zeros is ok. For binary data losing leading 0x00 is kind of not ok because that is something one would not expect.
Yes, that is why I opened this issue.
I have thought about it, but I have no idea how to fix it. If we know the length of the original data in advance, as with a UUID, we can pad the decoded result with leading zero bytes. In the general case we could store the length in the encoded data, but we would have to reserve enough room for the length, which would make the encoded data too long.
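For the fixed-length case, the padding idea could look roughly like this. It is only a minimal sketch: restoreLeadingZeros() is a hypothetical helper, not part of the Base62 package, and it assumes the caller knows the original length in advance (16 bytes for a UUID).

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: left-pad decoded bytes with 0x00 up to a known length.
 * Assumes the caller knows how long the original binary data was.
 */
function restoreLeadingZeros(string $decoded, int $length): string
{
    // str_pad() returns the input unchanged when it is already long enough.
    return str_pad($decoded, $length, "\0", STR_PAD_LEFT);
}

// Simulates a decoder that dropped two leading 0x00 bytes from 4-byte input.
$decoded = "\x12\x34";
echo bin2hex(restoreLeadingZeros($decoded, 4)), PHP_EOL; // 00001234
```

This does not help for data of arbitrary length, which is the harder case discussed here.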
@tuupola I'd like to tell you I released a new package yesterday that ships with your Base62 package, thanks for your great job! 👍
Looks good!
Hey everyone, I found a hack-ish way to solve this problem with this project, in case any of you still want a solution. By always prepending a non-null byte to the data before encoding, then discarding that byte after decoding, you can maintain arbitrary data integrity.
use Tuupola\Base62Proxy as Base62;
$encoded = Base62::encode("\x01" . $somedata);
$decoded = substr(Base62::decode($encoded), 1);
Perhaps some kind of special arbitrary data mode functions like encodeBinary() and decodeBinary() could be added to the project for this scenario.
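Such a pair of functions might wrap the sentinel-byte trick roughly as follows. This is only a sketch: encodeBinary() and decodeBinary() are hypothetical names taken from the suggestion above, not existing methods of the package, and the $encode / $decode callables stand in for whatever codec is used (for example Base62::encode and Base62::decode). The demo codec is a made-up hex codec that drops leading 0x00 bytes, purely to imitate the behaviour reported in this issue without needing the library.

```php
<?php

declare(strict_types=1);

// Any non-null byte works as the sentinel; 0x01 is an arbitrary choice.
const SENTINEL = "\x01";

// Hypothetical wrapper: prepend the sentinel so the payload never starts with 0x00.
function encodeBinary(callable $encode, string $data): string
{
    return $encode(SENTINEL . $data);
}

// Hypothetical wrapper: decode first, then drop the sentinel from the decoded bytes.
function decodeBinary(callable $decode, string $encoded): string
{
    return substr($decode($encoded), 1);
}

// Stand-in codec (an assumption for this demo): loses leading 0x00 bytes,
// like a numerical conversion would.
$lossyEncode = fn (string $data): string => bin2hex(ltrim($data, "\0"));
$lossyDecode = fn (string $hex): string => (string) hex2bin($hex);

$data = hex2bin("00313233");
$encoded = encodeBinary($lossyEncode, $data);
$decoded = decodeBinary($lossyDecode, $encoded);

echo bin2hex($decoded), PHP_EOL; // 00313233
```

The cost is one extra input byte per payload, and data encoded this way is not compatible with data encoded by the plain encode().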
> I keep this open for a while to think about it. For numerical conversions losing leading zeros is ok. For binary data losing leading 0x00 is kind of not ok because that is something one would not expect.
FWIW I see it the same way, because decoding an encoded payload should always yield the exact original data. Hence losing leading zeroes on integers is fine, but on strings it is not.