multiformats/multicodec

missing several variations of sha2

Opened this issue · 3 comments

The multiformats table seems to be missing several variations of sha2. Specifically:

  • sha2-224
  • sha2-384
  • sha2-512-224
  • sha2-512-256

See the excellent comparison chart in Wikipedia for more details here: https://en.wikipedia.org/wiki/SHA-2#Comparison_of_SHA_functions

Or I've lifted just the most relevant rows in a screenshot here for ease of reference:

[Screenshot: Comparison of SHA2 functions]

I'm especially interested in getting sha2-384 into the table, because it's both widely available and offers some innate defenses against length extension attacks (unlike the sha2 variants we currently have in the table, which both offer none!). But all four of these are well-standardized and deserve an indicator number in the multiformats table.


Naming notes:

  • Most documents seem to refer to these as e.g. "sha-224" rather than "sha2-224". However, most of the multiformats table prefixes entries with the family name, so I suppose we should say "sha2-224" to continue that pattern.
  • Most documents seem to refer to the latter two as e.g. "sha-512/224" (with a slash). However, most of the multiformats table prefers dashes as separators, so I suppose we should say "sha2-512-224" to continue (both) patterns.

Language library availability notes:

In Go, all of these are available in the standard library (as sha256.New224, sha512.New384, sha512.New512_224, and sha512.New512_256 respectively, from the crypto/sha256 and crypto/sha512 packages).
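
For reference, here is a minimal sketch of reaching all four variants through those standard-library constructors. The names on the left are the ones proposed in this issue, not entries that exist in the table yet, and the `variants` slice is just an illustrative helper.

```go
package main

import (
	"crypto/sha256"
	"crypto/sha512"
	"fmt"
	"hash"
)

// variants pairs each proposed table name (an assumption of this issue,
// not an assigned multicodec entry) with its Go standard-library constructor.
var variants = []struct {
	name string
	new  func() hash.Hash
}{
	{"sha2-224", sha256.New224},
	{"sha2-384", sha512.New384},
	{"sha2-512-224", sha512.New512_224},
	{"sha2-512-256", sha512.New512_256},
}

func main() {
	for _, v := range variants {
		h := v.new()
		h.Write([]byte("hello")) // hash.Hash writes never return an error
		// Print the name, the digest length in bytes, and the hex digest.
		fmt.Printf("%-13s %2d bytes  %x\n", v.name, h.Size(), h.Sum(nil))
	}
}
```

The digest lengths come out as 28, 48, 28, and 32 bytes, matching the 224-, 384-, 224-, and 256-bit output sizes.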

In JavaScript, in the browser, taking the Mozilla docs for SubtleCrypto.digest as a reference, SHA-384 is widely available. (Of the SHA-2 family, those docs list only SHA-256, SHA-384, and SHA-512.)


I'll file a PR for this soon, but first, a couple of questions:

  • Four new numbers are needed for this. Is it preferred to look for the first range where I can find four new consecutive numbers? Or should I stuff two in the first open range, and the other two later?
  • Am I doing the naming conventions correctly?

Some additional clarifications that @aschmahmann pointed out:

  • Are these just truncations of the other existing functions?
    • No. These are not trivially constructible in most scenarios. (You cannot take a black-box sha2-512 function and, using only its output without reaching into the hash function's guts, chop it down into a sha2-384 output. More varies than just the output size.)

https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf contains the specifications for all of these functions. I found Section 5.3 particularly illuminating on the distinctness of internal constants for these hashes.
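
The "not just a truncation" point is easy to check empirically. Here is a small sketch in Go that compares the SHA-384 digest of an input against the first 48 bytes of its SHA-512 digest. The helper name `isTruncation` is made up for this illustration.

```go
package main

import (
	"bytes"
	"crypto/sha512"
	"fmt"
)

// isTruncation reports whether the SHA-384 digest of data equals the
// first 48 bytes of the SHA-512 digest of the same data.
func isTruncation(data []byte) bool {
	full := sha512.Sum512(data)  // [64]byte
	short := sha512.Sum384(data) // [48]byte
	return bytes.Equal(full[:sha512.Size384], short[:])
}

func main() {
	// Prints false: the two functions start from different initial hash
	// values (FIPS 180-4, Section 5.3), so the digests diverge entirely.
	fmt.Println(isTruncation([]byte("hello")))
}
```

The same kind of check should hold for sha2-224 against sha2-256, and for the two sha2-512/t variants against sha2-512, since each has its own initial values.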

rvagg commented

๐Ÿ‘ these are all fine by me, with the names as you've listed them which would match conventions already established. Re your naming notes:

  • Yes the whole "SHA2" thing is a mess but I think we're best served by sticking to having the 2 in the name. "SHA-256" suffers from the same thing as you're pointing to here, but it's really SHA2-256. Less ambiguity = better here I think, regardless of common parlance in this particular instance (while acknowledging that sometimes common parlance is important to pay attention to!).
  • The form with the / can go in the last column, so it's searchable and adds clarity. We have output lengths in a bunch of other hash functions, like blake2 and skein and we've added the -length on the end for those.

Re numbering: probably lower down in the numbering because I doubt these are going to be as common as 256 and plain 512 (not sure about 384, maybe the jury is still out on that?). I started 0x1012 for the weird padded SHA2-256 that Filecoin uses, trying to echo the 0x12 of SHA2-256. That section might be good to extend here, put them in straight after that?

Agree on SHA2 naming being something of a mess in the common parlance. It confuses me enough that I re-check the Wikipedia entry every time I talk about this subject, to make sure I'm not going insane. 🙃

I'll propose a diff with this naming and the variations going in the last column for clarity and search. 👍 👍


I'm definitely going to campaign hard for SHA-384 to get a low, single-byte number. Per the combination of the language availability notes and the comparison table above: SHA-384 is the only hash function that's widely available in the browser today while also having some modicum of built-in resilience to length-extension attacks.
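
To make the "single-byte number" point concrete: multicodec codes are written as unsigned varints, so any code below 0x80 fits in one byte and anything from 0x80 up needs at least two. A rough sketch, assuming Go's `binary.PutUvarint` is an acceptable stand-in for the multiformats unsigned-varint encoding (`varintLen` is a made-up helper):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// varintLen returns the number of bytes the unsigned-varint encoding
// of code occupies.
func varintLen(code uint64) int {
	buf := make([]byte, binary.MaxVarintLen64)
	return binary.PutUvarint(buf, code)
}

func main() {
	// 0x12 is sha2-256 and 0x1012 is the padded Filecoin variant mentioned
	// above; 0x7f and 0x80 mark the one-byte/two-byte boundary.
	for _, code := range []uint64{0x12, 0x7f, 0x80, 0x1012} {
		fmt.Printf("0x%04x -> %d byte(s)\n", code, varintLen(code))
	}
}
```

So a code next to 0x1012 costs one extra byte per hash prefix compared with a code in the 0x00 to 0x7f range.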

I would go so far as to say we should make SHA-384 the default hash function we use in any of our example material. (n.b., "default in example material", not default in the sense of what libraries do with no inputs; most of our libraries don't have defaults that opinionated.)