h2non/jshashes

Wrong base64 encode of unicode chars

rjcoelho opened this issue · 1 comments

Base64 encode/decode of Unicode chars first tries to convert to UTF-8, which is the right idea, but the implementation is wrong.

(new Hashes.Base64()).encode('张')
"5Q=="
(new Hashes.Base64()).setUTF8(false).encode(unescape(encodeURIComponent('张')))
"5byg"
window.btoa(unescape(encodeURIComponent('张')))
"5byg"

See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Base64_encoding_and_decoding#The_.22Unicode_Problem.22

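Implementation-wise, utf8Encode() returns the same as unescape(encodeURIComponent()), so the problem is elsewhere. See https://github.com/davidchambers/Base64.js/blob/master/base64.js for a window.atob() shim. I think len should be computed after the UTF-8 encode: '张' is one character but three UTF-8 bytes (0xE5 0xBC 0xA0), and "5Q==" is exactly what you get from encoding only the first byte.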
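To illustrate the point about len, here is a minimal sketch of an encoder that takes the length after the UTF-8 step. It is not the jshashes source; `B64_CHARS`, `utf8Encode` (as written here) and `base64EncodeUnicode` are hypothetical names used only for this example.

```javascript
// Hypothetical sketch, not the jshashes implementation.
var B64_CHARS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Returns a "binary string": one char code (0-255) per UTF-8 byte.
function utf8Encode(str) {
  return unescape(encodeURIComponent(str));
}

function base64EncodeUnicode(input) {
  var bytes = utf8Encode(input);
  var len = bytes.length; // measured AFTER UTF-8 encoding, not on the input
  var out = '';
  for (var i = 0; i < len; i += 3) {
    var b0 = bytes.charCodeAt(i);
    var b1 = i + 1 < len ? bytes.charCodeAt(i + 1) : 0;
    var b2 = i + 2 < len ? bytes.charCodeAt(i + 2) : 0;
    out += B64_CHARS.charAt(b0 >> 2);
    out += B64_CHARS.charAt(((b0 & 3) << 4) | (b1 >> 4));
    // Pad with '=' when the final group has fewer than three bytes.
    out += i + 1 < len ? B64_CHARS.charAt(((b1 & 15) << 2) | (b2 >> 6)) : '=';
    out += i + 2 < len ? B64_CHARS.charAt(b2 & 63) : '=';
  }
  return out;
}
```

With this ordering, `base64EncodeUnicode('张')` yields "5byg", matching window.btoa() on the UTF-8 encoded string. If len were instead taken from the original one-character string, the loop would stop after the first byte, which would be consistent with the "5Q==" result above.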

Base64 decode is also wrong.

(new Hashes.Base64()).decode('5byg')
"o("
(new Hashes.Base64()).setUTF8(false).decode('5byg')
"o("
decodeURIComponent(escape(window.atob("5byg")))
"张"
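For comparison, a decoder that gives the expected result could look roughly like the sketch below: decode Base64 to raw bytes first, then interpret those bytes as UTF-8 in a separate step. The names `B64_ALPHABET` and `base64DecodeUnicode` are hypothetical and this is not the library's code.

```javascript
// Hypothetical sketch, not the jshashes implementation.
var B64_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

function base64DecodeUnicode(b64) {
  var clean = b64.replace(/=+$/, ''); // drop trailing padding
  var bytes = '';                     // binary string, one char per byte
  var buffer = 0;
  var bits = 0;
  for (var i = 0; i < clean.length; i++) {
    var value = B64_ALPHABET.indexOf(clean.charAt(i));
    if (value === -1) {
      throw new Error('Invalid Base64 character: ' + clean.charAt(i));
    }
    // Accumulate 6 bits per character; keep only the low 16 bits.
    buffer = ((buffer << 6) | value) & 0xFFFF;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      bytes += String.fromCharCode((buffer >> bits) & 255);
    }
  }
  // Interpret the raw bytes as UTF-8 only after all bytes are recovered.
  return decodeURIComponent(escape(bytes));
}
```

Here `base64DecodeUnicode('5byg')` returns "张", the same as decodeURIComponent(escape(window.atob("5byg"))).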

I'll probably remove Base64 support in a future version since it's well-supported natively by JavaScript engines in the browser/node.js.
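As a rough illustration of the native route, the sketch below uses Node's `Buffer` when it is available and falls back to btoa()/atob() with the UTF-8 workaround shown earlier in this issue. The wrapper names `nativeBase64Encode` and `nativeBase64Decode` are made up for this example.

```javascript
// Hypothetical wrappers around the built-in Base64 facilities.
function nativeBase64Encode(str) {
  if (typeof Buffer !== 'undefined') {
    // Node.js: Buffer handles the UTF-8 step itself.
    return Buffer.from(str, 'utf8').toString('base64');
  }
  // Browser: btoa() only accepts chars 0-255, so UTF-8 encode first.
  return btoa(unescape(encodeURIComponent(str)));
}

function nativeBase64Decode(b64) {
  if (typeof Buffer !== 'undefined') {
    return Buffer.from(b64, 'base64').toString('utf8');
  }
  return decodeURIComponent(escape(atob(b64)));
}
```

Both paths should give "5byg" for '张' and round-trip it back on decode.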

Alternatively, you could use this implementation, which is more reliable:
https://gist.github.com/h2non/37a6d588271fc9c1e828