lune-org/lune

serde library should have hash functions

Closed this issue ยท 12 comments

The serde library should include some general hashing functions.

Some possible hash functions that could be implemented are

  • md5
  • SHA-1
  • SHA-256
  • SHA-512
  • PBKDF2
  • Argon2
  • HMAC

Ideally, there'd probably be more. This is just what I came up with. I'd imagine that all of these have Rust implementations available?

Given several of these operate with 64 bit integers, it's non-trivial to implement them in pure Luau. It'd be easier and significantly faster to do on the Rust side.

For reference, here's a SHA-512 implementation I made. It's likely a significant portion of people don't have the knowledge or quite frankly will power to implement these on their own just due to the sheer amount of bit math involved to treat two 32-bit integers as one 64-bit one.

This would be fairly simple to implement in Rust, there's a bunch of well-vetted implementations for most most cryptographic algorithms.

If hashing support does get added, here's a list of my recommendations, along with why:

  • SHA-2 and its flavors - NIST standard and widespread use
  • SHA-3 and its flavors - NIST standard
  • Blake3 - Very fast
  • Argon2id - Winner of Password Hashing Competition
  • PBKDF2 - Standard password hashing algorithm

Despite widespread use, including SHA-1 and MD5 as built-in hashes is irresponsible IMO since they're broken and easily implemented in Luau at acceptable speeds.

In regards to stuff like HMAC, I am of two minds. HMAC is hilariously easy to implement (I have an old implementation that is 43 lines long) so it's probably fine to leave out. On the other hand, it's also a standard so expecting people to implement it themselves is maybe weird!

I'd also like to comment on potential non-cryptographic functions. I think we should err on the side of caution and not touch them. Something like xxHash3 has its uses but it's really not secure, and including it is asking for people to just use it because it's fast. Blake3 is a great alternative for most use cases anyway.

I think before we go further with this, incidentally, we should settle two questions:

  1. Are we okay with putting this in the Serde library? It feels inappropriate to put it here but up to you @filiptibell.

  2. What does the API look like? The obvious API is just something like sha256(algo: "hash" | "algorithm" | "names", message: string) -> string but it's not very robust. I know that @0x5eal has been working on an implementation, but we should probably settle on an API before it gets too far.

I left some thoughts on the related PR (#122) and would like to leave a couple extra ones here.

  1. Hashing algorithms that are not cryptographic ones... I personally wouldn't mind having something like GxHash, but I definitely see where the concern is coming from. If we can make it painfully obvious that certain algos are not cryptographically secure I would be fine with including them, but would also like to know what everyone thinks
  2. HMAC would probably be one of the most common use cases (maybe the most common) and we could easily get some big optimization and friendliness wins by including it natively
  3. With mlua getting native modules support, it's probably not too far out for Lune to have it either. It's worth considering this and how it influences what should be prioritized in Lune's standard library / built-ins vs more niche use cases that could be implemented in native modules. We should still try to do the best we can to balance flexibility + familiarity + performance, though

All in all, I'm currently leaning towards an API that looks something like:

type Hasher = "hash" | "algorithm" | "names"

serde.hash(algo: Hasher, message: string): string

serde.hmac(algo: Hasher, secret: string, message: string): string

where if necessary for some specific hashing algorithm we could add additional arguments to the end after message, like we already do with stdio.prompt.

This does leave out the use case of updating a hasher multiple times for performance when there are several inputs. I don't think we should prioritize this use case, since it changes our entire API surface, and makes it much less friendly for every other use case. If we support the new Luau buffer type, performance will also be much less of an issue.

I have a rough implementation that uses basically the format above. It's straight forward, but still uses a macro to erase some boilerplate (quite frankly, nobody needs to see the same 3 lines repeated for every single hashing algorithm).

I'm not sure if buffer is actually supported by Mlua yet though, which might be a downer. Also, I have no idea what the impact on the binary size a bunch of hashing crates might have. I haven't compared the two yet. If it's too big, we may need to put it all behind a feature.

Buffer is actually supported by mlua, and lune 0.8.0 ships with buffer support.

Buffer is actually supported by mlua, and lune 0.8.0 ships with buffer support.

I'm not seeing references to it in the docs. The Value enum as an example doesn't seem to have a Buffer variant.

I'm not seeing references to it in the docs. The Value enum as an example doesn't seem to have a Buffer variant.

That's because buffers are of type Value::UserData(data), here's an example of how I deal with them for #148: buffer.rs.

Disappointing, but not surprising. Never mind then, thank you.

I have a rough implementation that uses basically the format above. It's straight forward, but still uses a macro to erase some boilerplate (quite frankly, nobody needs to see the same 3 lines repeated for every single hashing algorithm).

I'm not sure if buffer is actually supported by Mlua yet though, which might be a downer. Also, I have no idea what the impact on the binary size a bunch of hashing crates might have. I haven't compared the two yet. If it's too big, we may need to put it all behind a feature.

@Dekkonot thought I'd chime in with a small update here - you can accept buffers as rust function args very easily now in the latest mlua version. If you use BString as a function argument it will accept both luau strings and buffers and the bytes are easily accessible

I'd greatly appreciate support for this

my use case is caching raw binary file data with hashes,

i have an automatic image compression program and i'd like to cache whatever images already compressed as a Hash