sqids/sqids-php

More letters than set - thoughts?

joviczarko opened this issue · 4 comments

Hi guys,

I've been testing both Hashids and Sqids these days. And I have a bit of a problem with Sqids.
My alphabet with Sqids consists of shuffled lowercase letters and digits. Regardless of the shuffling, when set to 6 characters it always starts producing 7-char ids from the same position... And that is such a low number for it to look like it ran out of combinations (35936).

This is not the case for Hashids, which easily generated more than 1M ids, and even the last one has 6 chars.

What is the deal here? Does Hashids have a much higher chance of creating duplicates, or is it something else?
Can I force Sqids to create ids strictly to 6 chars?

Can I force Sqids to create ids strictly to 6 chars?

Up to a limit of course, which looks like you might be hitting. I'm assuming you're using a minLength parameter of 6 which promises to pad ids to at least the length you specify (not at most).

What is the deal here?

A shorter alphabet will produce longer ids. And of course setting a min length will make them longer than they would otherwise be. With default constructor parameters, my encoding of [35937] is only 4 chars long.

Does Hashids have a much higher chance of creating duplicates, or is it something else?

No duplicates in either Hashids or Sqids.


Edit: Also worth mentioning, encoding for Sqids is totally different from Hashids, because the design is different (custom blocklist handling, different padding logic, different separator logic, etc). So to answer your question, yes, it does look like it ran out of combinations at that point for your specific set of constructor parameters.

Thanks for your time and detailed answer.

I am using an alphabet that consists of all English lowercase letters plus digits, 36 elements in total.
If I make permutations of 6 chars out of 36 elements, it gives roughly 1.5 billion possibilities.
I was a bit surprised it couldn't even get to 40K.

This means that, in my case, Sqids can use only 0.0024% of all possible permutations, which is a lot of waste when you want the URL to stay as short as possible.
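
For reference, the counting above can be checked with a few lines of Python. This is only a sketch: the constant names are made up for illustration, and the figure 35936 is simply the number reported earlier in this thread. It computes both ways of counting 6-char strings over 36 symbols, with repeated characters allowed and without, and the share of each that 35936 ids would represent.

```python
import math

ALPHABET_SIZE = 36   # 26 lowercase letters + 10 digits
LENGTH = 6           # desired id length
OBSERVED = 35936     # figure reported in this thread

# Strings of length 6 where characters may repeat: 36^6
with_repeats = ALPHABET_SIZE ** LENGTH               # 2_176_782_336

# Strict permutations (no character used twice): 36 * 35 * ... * 31
without_repeats = math.perm(ALPHABET_SIZE, LENGTH)   # 1_402_410_240

# Share of each space covered by the observed number of ids, in percent
share_with_repeats = OBSERVED / with_repeats * 100
share_without_repeats = OBSERVED / without_repeats * 100

if __name__ == "__main__":
    print(f"with repeats:    {with_repeats:,} ({share_with_repeats:.5f}% used)")
    print(f"without repeats: {without_repeats:,} ({share_without_repeats:.5f}% used)")
```

Either way of counting gives the same picture: only a few thousandths of a percent of the 6-char space is covered.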

I assume hashing has its own specific way of working and maybe my thinking is a bit flawed, but after all we are using it for creating IDs, not primarily for hashing.

I was very happy to see the project evolve into Sqids, but I will have to use Hashids, since it creates incomparably more ids within a 6-char limit.

Fair enough, good calculations.

when you want the URL to stay as short as possible

If your use case is to create the largest possible number of 6-char ids, I'm not even sure you need Sqids/Hashids. You can just use the same method as decimal-to-hexadecimal conversion, but with your custom alphabet as the digits. And pad to 6 chars by manually adding a large enough number before encoding (and subtracting it again after decoding).
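
A minimal sketch of that suggestion in Python, under a few assumptions: the alphabet is 36 lowercase letters and digits (unshuffled here; any fixed shuffle of the same characters would work the same way), the "large enough number" is taken to be 36^5 so that every encoded value has exactly 6 base-36 digits, and the names `encode`, `decode`, `OFFSET` and `CAPACITY` are hypothetical helpers, not part of Sqids or Hashids.

```python
import string

# Assumed alphabet: 26 lowercase letters + 10 digits. A fixed shuffle of
# these characters can be substituted without changing the logic.
ALPHABET = string.ascii_lowercase + string.digits
BASE = len(ALPHABET)               # 36
LENGTH = 6                         # every id is exactly this long
OFFSET = BASE ** (LENGTH - 1)      # smallest value with 6 base-36 digits
CAPACITY = BASE ** LENGTH - OFFSET # how many distinct numbers fit


def encode(n: int) -> str:
    """Encode 0 <= n < CAPACITY as a 6-char id."""
    if not 0 <= n < CAPACITY:
        raise ValueError(f"n must be in [0, {CAPACITY})")
    value = n + OFFSET             # the offset guarantees 6 digits
    chars = []
    while value:
        value, remainder = divmod(value, BASE)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def decode(identifier: str) -> int:
    """Reverse encode(); raises ValueError on malformed input."""
    if len(identifier) != LENGTH:
        raise ValueError("identifier must be exactly 6 chars")
    value = 0
    for ch in identifier:
        value = value * BASE + ALPHABET.index(ch)  # ValueError if ch unknown
    n = value - OFFSET
    if not 0 <= n < CAPACITY:
        raise ValueError("identifier is outside the valid range")
    return n


if __name__ == "__main__":
    for number in (0, 35937, 1_000_000, CAPACITY - 1):
        print(number, encode(number), decode(encode(number)))
```

With these assumptions the scheme covers a little over 2.1 billion numbers at exactly 6 chars. The trade-off is that it is plain base conversion: consecutive numbers produce similar-looking ids, and there is no blocklist handling.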

Hmm... I will think about that method.

There are some use cases where you need an exact number of chars. In my case it's because my project relies on QR codes. Including the domain and prefix, 6 chars is the sweet spot, so the QR code doesn't grow into the next size category, which would make it a class harder for devices to read.

Sometimes users will have to type that ID, so I am using only a lowercase alphabet, and I am expecting 6 chars for form validation. I am also blocking any URL requests that fall outside that char range, so I do not even run a database query to check for data.
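
The kind of pre-check described here could look roughly like the following Python sketch. It assumes ids are exactly 6 characters drawn from lowercase ASCII letters and digits; `is_valid_id` is a hypothetical helper name, not something taken from the project.

```python
import re

# Exactly 6 characters, each a lowercase ASCII letter or a digit.
ID_PATTERN = re.compile(r"[a-z0-9]{6}")


def is_valid_id(candidate: str) -> bool:
    """Return True only if the whole string is a well-formed 6-char id."""
    # fullmatch() requires the entire string to match, so a trailing
    # newline or any extra character is rejected.
    return ID_PATTERN.fullmatch(candidate) is not None


if __name__ == "__main__":
    for sample in ("ab12cd", "AB12CD", "ab12c", "ab12cde"):
        print(sample, is_valid_id(sample))
```

A request whose id fails this check can be rejected before any database lookup is attempted.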

Anyhow, thank you for your time and help, it's a very interesting topic.