besser82/libxcrypt

Make generic code, hashing methods, and documentation agree re forbidden characters in hashes

Closed this issue · 7 comments

zackw commented

crypt.5 currently says

Hashed passphrases are always entirely printable ASCII, and do not contain any whitespace or the characters ‘:’, ‘;’, ‘*’, ‘!’, or ‘\’.

However, our generic code currently only enforces the absence of ‘:’ and newline. Prior to 2112d2b, there was no generic check at all. Prior to b2b813a, some but not all hashing methods implemented much more restrictive rules (allowing only their own base64 alphabet in the salt part of a setting); these were found to conflict with existing salts (see #105). The setting-parsing code in each hashing method is ad hoc and difficult to audit for what syntax it actually accepts; there might be other quirks hiding.

I think the rule as stated in the documentation is abstractly correct, and I would like to make the generic code enforce that rule and then remove the redundant checks in the hashing methods. However, I'm worried about this breaking existing hashes as the pre-b2b813adbd9d7cc7d20a9b984ef27b64df8c15d9 code did. I'm soliciting comments from the community.

Would the POSIX.1-2017 standard for the crypt() function be the best choice for characters? POSIX declares in
https://pubs.opengroup.org/onlinepubs/9699919799.2018edition/functions/crypt.html that only the following characters may be used in the salt:

The salt argument shall be a string of at least two bytes in length not including the null character chosen from the set:
a b c d e f g h i j k l m n o p q r s t u v w x y z
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
0 1 2 3 4 5 6 7 8 9 . / 

Basically there are several RFCs defining base64 encoding tables for different applications. And there is the POSIX base64 table for MCF.

If I compare them there are just a few different characters allowed in them: +-_,=, besides the POSIX MCF table.

Then we have two known ways to consume those salt characters by the hash methods:

  1. Just use them as a c-string for computation, e.g. sha{256,512}crypt and md5crypt.
  2. Translate them back to the original raw bytes and use those for computation, e.g. yescrypt.

Where the second way of processing obviously is the one that could be troublesome with characters outside of the MFC dialect.

zackw commented

I think it's going to have to be ok if specific hashing methods place more restrictions on their salt than the generic code does.

I think it's going to have to be ok if specific hashing methods place more restrictions on their salt than the generic code does.

+1

So I guess, we can safely allow the MCF dialect + +-_,= by the generic check, but need to ensure a stricter check for those methods, which translate the base64 salt back into raw bytes.

FWIW, my mkpasswd (from the whois package) mandates that salts comply with the POSIX.1-2017 list for all methods. I see that I added this check in version 4.5.7 in 2001 but I cannot remember why.

zackw commented

@rfc1036 The only potential problem I can see with passphrase-enrollment programs like mkpasswd limiting themselves to the POSIX.1-2017 alphabet, is that Argon2 uses the RFC 4648 base64 alphabet instead. libxcrypt's crypt_gensalt*() functions do something similar -- you always get salt limited to the same base64 alphabet that the hashing method would use for the hashed passphrase proper, whatever that is -- and that's been fine.

However, if libxcrypt imposes new restrictions on what the crypt*() functions accept, then we may break validation of existing hashed passphrases.

@besser82 Here's how I'm thinking about it: we know we need to allow these (ASCII-coded) bytes:

0 1 2 3 4 5 6 7 8 9
a b c d e f g h i j k l m n o p q r s t u v w x y z
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
. / + - _ = , $

And we know that these bytes would definitely be troublesome as part of the second field of an entry in /etc/shadow:

00 01 02 03 04 05 06 07 08    0A 0B 0C 0D 0E 0F
10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
!  *  :  \  7F
80..FF

That leaves these bytes that might or might not be OK:

TAB SPC " # % & ' ( ) ; < > ? @ [ ] ^ ` { | } ~

In a greenfield system there would be no reason to allow any of these, but I don't have any data with which to assess the risk that blocking them generically would break someone's existing hashed passphrase. That's the question in my mind.

zackw commented

Unrelatedly, and for the record, I am not planning on working on any code components of libxcrypt until we have working CI again. ;-)