Codecs/bytes->str does not produce correct utf8 strings according to PostgreSQL

Question

Rovanion opened this issue 8 years ago · 5 comments

Passing the output of

(buddy.core.codecs/bytes->str (nonce/random-nonce 16))

to a field of type text in PostgreSQL results in the following error:

org.postgresql.util.PSQLException: ERROR: invalid byte sequence for encoding "UTF8": 0x00

Using bytes->hex works, but that seems like a non-optimal way to go about solving the issue.

It seems like every run of (buddy.core.codecs/bytes->hex (nonce/random-nonce 16)) starts with the same sequence, 000001549b, which Emacs displays for (buddy.core.codecs/bytes->str (nonce/random-nonce 16)) as "^@^@^A".

Answer 1 · 2016-05-10T20:02:17.000Z

What are you trying to do? The bytes->str function just converts a byte array of UTF-8-encoded octets into a readable string. But random-nonce does not generate anything printable; it generates a cryptographic nonce. If you want to store it as a string, you need to use hex, base64, or some other encoding.

bytes->str is not a magical function that converts any byte array into a "printable" string. It just performs the reverse conversion of to-bytes (which takes a string and returns an array of bytes (octets) encoded in UTF-8).
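
To illustrate the point about the reverse conversion, here is a minimal sketch in plain Java (JDK classes only, not buddy's implementation; the class and method names are made up for this example). It decodes a byte array as UTF-8 and encodes it again, which only returns the original bytes when they were valid UTF-8 to begin with:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, for illustration only.
class Utf8RoundTrip {
    // Decode the bytes as UTF-8, re-encode the resulting String,
    // and report whether the original bytes survived unchanged.
    static boolean survivesRoundTrip(byte[] data) {
        // Malformed sequences are silently replaced with U+FFFD here.
        String decoded = new String(data, StandardCharsets.UTF_8);
        byte[] reencoded = decoded.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(data, reencoded);
    }

    public static void main(String[] args) {
        // Bytes that came from a real string: the round trip is lossless.
        byte[] text = "salt".getBytes(StandardCharsets.UTF_8);
        // Arbitrary bytes that are not valid UTF-8: data is lost.
        byte[] arbitrary = {(byte) 0xC3, (byte) 0x28, (byte) 0xFF};

        System.out.println(survivesRoundTrip(text));      // true
        System.out.println(survivesRoundTrip(arbitrary)); // false
    }
}
```

Random bytes will usually fall into the second case, so treating them as text loses information even before the database gets involved.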

Answer 2 · 2016-05-10T20:03:22.000Z

If you want to store a nonce in PostgreSQL, you have two options: use a bytea field and store the byte array directly, or encode the byte array to some text representation (e.g. hex or base64).
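
The text-representation option could look roughly like the sketch below. It uses only JDK classes (java.util.Base64 and a hand-written hex routine) instead of buddy's codecs, and the class and method names are assumptions made for this example. Both encodings produce plain ASCII, which is safe for a text column, and both can be decoded back to the exact original bytes:

```java
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper, for illustration only.
class BinaryAsText {
    // Encode each byte as two lowercase hex digits.
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            sb.append(Character.forDigit((b >> 4) & 0xF, 16));
            sb.append(Character.forDigit(b & 0xF, 16));
        }
        return sb.toString();
    }

    // Decode a hex string (even length, valid hex digits) back into bytes.
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    static String toBase64(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    static byte[] fromBase64(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // 16 random bytes standing in for a salt or nonce.
        byte[] raw = new byte[16];
        new SecureRandom().nextBytes(raw);

        String hex = toHex(raw);
        String b64 = toBase64(raw);
        System.out.println(hex);
        System.out.println(b64);

        // Both representations decode to the original bytes.
        System.out.println(Arrays.equals(raw, fromHex(hex)));
        System.out.println(Arrays.equals(raw, fromBase64(b64)));
    }
}
```

Hex doubles the stored size while base64 adds about a third, so base64 is the more compact of the two if the value has to live in a text field.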

Answer 3 · 2016-05-10T21:01:08.000Z

What I wanted to do is to store a salt, which as I understood from the docs [0] should be generated by nonce, in a field alongside the hashed password. And I read "Converts byte array to string using UTF8 encoding" to mean that it produced a correct UTF8-string.

[0] https://funcool.github.io/buddy-core/latest/#nonces-and-salts

Answer 4 · 2016-05-10T21:21:03.000Z

First, salts and nonces are different: if you need a salt, please use random-bytes; if you need a nonce, use random-nonce.

About encoding: UTF-8 is one of the Unicode encoding formats that allows storing Unicode strings (Java's String) as bytes. A UTF-8 byte string (byte[]) can be converted to a Unicode string (a Java String instance) only if the byte[] contains a string properly encoded with UTF-8. In this case, the random nonce does not contain a properly encoded UTF-8 byte string; it contains fully random data. I see that it is a little bit confusing, but this is how strings work.
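
One way to see whether a byte[] holds properly encoded UTF-8 is to decode it with a strict decoder that reports malformed input instead of replacing it. The following is a sketch using the JDK's CharsetDecoder; the class and method names are assumptions made for this example and are not part of buddy:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper, for illustration only.
class StrictUtf8 {
    // Returns the decoded String, or empty if the bytes are not valid UTF-8.
    static Optional<String> decodeStrict(byte[] data) {
        try {
            String decoded = StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data))
                    .toString();
            return Optional.of(decoded);
        } catch (CharacterCodingException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        byte[] valid = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        byte[] invalid = {(byte) 0x80}; // a lone continuation byte
        byte[] nulPrefixed = {0x00, 0x00, 0x01};

        System.out.println(decodeStrict(valid).isPresent());       // true
        System.out.println(decodeStrict(invalid).isPresent());     // false
        // 0x00 is valid UTF-8 (it decodes to U+0000), so this succeeds too.
        System.out.println(decodeStrict(nulPrefixed).isPresent()); // true
    }
}
```

The last case matches the error in the question: a 0x00 byte decodes without complaint on the JVM, but PostgreSQL does not accept the NUL character in text values, so even bytes that pass a strict UTF-8 check can still be rejected by the database.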

Answer 5 · 2016-05-11T07:13:36.000Z

I'm terribly sorry for posting what is essentially a support issue, caused by my poor understanding, on your issue tracker. Thank you for your time!