iCyberon/pg_hashids

Inconsistent results when calling id_encode multiple times in same query

emrul opened this issue · 9 comments

emrul commented

Hi,

The following SQL does not work as expected

SELECT id_encode(table_a.contractid, 'my salt')   AS a_encoded,
       table_a.contractid as a_unencoded,
       id_encode(table_b.id, 'my salt') AS b_encoded,
       table_b.id AS b_unencoded
FROM table_a, table_b

and returns:
a_encoded LJqaROLN8GPb�
a_unencoded 1989354300773498885
b_encoded LJqaROLN8GPbe
b_unencoded 1989354300773498885

both input values (big ints) are the same but notice that one result has an additional character at the end. And yes, I think the first result (a_encoded) is perhaps a null character?

The problem seems worse if I don't use a salt.

The problem does not arise if I only have one id_encode call in a query.

I'm currently using Postgres 11.1 and have just upgraded to the latest GH version of pg_hashids to confirm the problem.

I see this as well, at least using the psql command line:

select
    id_encode(12345, 'mysalt', 7),
    id_encode(12345, 'mysalt', 7)
;

 id_encode | id_encode 
-----------+-----------
 n4aL\x08  | n4aLVMg
(1 row)

I'm using PostgreSQL 11.2.
It always seems to be when I have the id_encode in the first column of my result set. I have observed it happening even when it there is only one id_encode() in the query, although I haven't come up with a good example for that yet.

Just to confirm it wasn't a psql issue specifically, I connected with Omnidb and ran the above query again.

And when I repeat the column 5 times, instead of just twice, all of them are corrupted except the last one.

Just FYI, I started debugging the issue, will post updates here.

Any updates?
I'm having the same issue.

Psql 10.4
Hashid version 1.2.1

Will this issue be addressed?
I think it is a critical bug.
@iCyberon

I'm seeing this as well (in PostgreSQL 12-beta3 on alpine), with an encoded values of "W6Kp\u0010", "23oZ\c0010", "aM9n\u0010", and "1vdz\u0010". Notably, it's always the last character, which is always \u0010. (But @rotten, above, was seeing \u0008, so maybe the other arguments to id_encode matter.) Also, these are not only the first columns, but also the first rows in their respective tables.

I think the title of the issue is misleading: it's not "multiple times in same query" as much as "an id_encode() value in the first column will sometimes have the wrong final character", which to me feels like something is overwriting the end of the string in some cases. (Although again, @rotten was getting the bogus values in "all but the last one".) I'm taking a look at the code, but I don't know anything about PostgreSQL internals, so I'm unlikely to find anything quickly.

Update: I've seen the same thing as @rotten, where calling id_encode() multiple times (consecutively, in my case) results in all-but-the-last being corrupted. My gut says this means that subsequent calls are stomping on the tail-end of previous ones, but it'll take more investigation.

Some more diagnostics... running manual select commands via pgAdmin that look like this:

select
    id_encode(0, 'salt', 5),
    id_encode(1, 'salt', 5),
    id_encode(2, 'salt', 5),
    id_encode(3, 'salt', 5),
    id_encode(4, 'salt', 5),
    id_encode(5, 'salt', 5);

I've found that the value and salt don't seem to matter at all, but the length (third argument) definitely does. The corrupted strings repro very reliably in all-but-the-last columns with lengths 5-8, 13-16, and 29-32. Lengths in the gaps (9-12, 17-28) seem to work without fail. That the problem appears in the 4 lengths leading up to a power of 2 seems significant. Sure enough, lengths of 60 and 65 are fine, but 61-64 all show the same problem.

I don't know whether this is a flaw in the hashids algorithm itself, or in the PostgreSQL wrapper. (Odds are the latter, as the core hashids implementation is used elsewhere.) Fingers crossed, I'm going to try digging into the code.

Just as a note - I can confirm that the fix resolved the issues I was having. Thank You!!