amosnier/sha-2

Closing with total length may produce incorrect value on smaller systems(?)

Closed this issue ยท 8 comments

I haven't implemented SHA-256 myself, but if I understand your code and the spec correctly, the digest is closed with (among others) encoding the total length in bits. However, IIUC, the size of size_t may be dependent on system architecture. So, size_t could be 32-bits in size on smaller systems.

See total_len in struct Sha_256 and length-encoding in sha_256_close.

(Also, as side-note: I think you need to #include <stddef.h> in sha-256.h for type size_t.)

Hi,

Thanks for reporting this. Starting with the easy part:

(Also, as side-note: I think you need to #include <stddef.h> in sha-256.h for type size_t.)

No I do not, as can been seen in the C99 standard:

string_h

I haven't implemented SHA-256 myself, but if I understand your code and the spec correctly, the digest is closed with (among others) encoding the total length in bits. However, IIUC, the size of size_t may be dependent on system architecture. So, size_t could be 32-bits in size on smaller systems.

See total_len in struct Sha_256 and length-encoding in sha_256_close.

You definitely have a point. This implementation has a few more or less explicit limitations, compared with the official SHA-256 specification:

  • SHA-256 is able to calculate a hash for any array of bits up to a certain length, but this implementation, for obvious reasons, has no support for bit array lengths that are not multiples of eight (i.e. it only works with whole octets).
  • SHA-256 has support for bit array lengths of up to (2^64 - 1), but for this implementation, at the time of writing, the limit is always a number of octets which is up to SIZE_MAX. Please note that sizeof (size_t) needs not even be 4 (32 bits), since the C-standard only guarantees that SIZE_MAX is at least 65535 (16 bits). For calc_sha_256, the SIZE_MAX limit makes perfect sense, since the machine is not supposed to be able to store larger arrays anyway. But for the newer streaming interface, this limit is a little nasty, because it is not really obvious, since client code is free to invoke sha_256_write as many times as it wants before invoking sha_256_close. However, it should be noted that while size_t is guaranteed to be available, uint64_t is optional! If available, using uint64_t could also be less performant than using size_t for the same calculation, although that is unlikely to have an impact here. All in all, that limitation needs to at least be documented, and removing it by using uint64_t for the total byte length instead of size_t probably has more benefits than drawbacks for this implementation. It should also be noted that if we switch to uint64_t the total length limitation will be that of SHA-256, which is 64 bits for the bit array length! Beyond that, the results are unpredictable. That limitation is already clearly documented, however.

Thanks for clarifying. This is my limited knowledge of C showing. :-)

You definitely have a point. This implementation has a few more or less explicit limitations, compared with the official SHA-256 specification:

To clarify, I understand there are limitations by design. That's fine. I am was arguing for a bug in the current implementation.

My concern is that, given compilation on a more limited platform, your array indices go out-of-bounds. In sha_256_close, your code assumes a 64-bit/8-byte data-type for size_t (with indices going up to 7) which is not always true. You'll need to use an 8-byte sized data-type (or another way of simulating a total of 8 bytes) when you write out the hash-message length as per specification.

I misread. I checked again, and noticed you start writing with least-significant bytes, so excessive shifting would only produce more zero-values, i.e. zero-values for most-significant bits. So, that is where my reasoning failed. Thanks for following up.

@cobratbq, thank you for double-checking your arguments. However, mine still hold: you still had a point and I still want to somehow handle the issue, probably by simply changing size_t to uint64_t i two locations. That's why I had not closed the issue, and that's why I had self-assigned it. Reopening, now.

Quick check: you changed type to uint64_t. Does your code already take into account that the size is encoded in bits, but total_len accounts in bytes?

I think total_len may now hold values up to 2**64. When length is encoded, it is shifted left by 3 to acquire the bit-count. Just to check .. does the code warn/fail when total_len >= 2**61? (To clarify, it is not of immediate concern to me. I just want to check with you if your change does what you expected.)

Also, you mentioned uint64_t is not a suitable type for portability?

Quick check: you changed type to uint64_t. Does your code already take into account that the size is encoded in bits, but total_len accounts in bytes?

Yes, it does. Otherwise, it would not pass the extensive testing.

I think total_len may now hold values up to 2**64. When length is encoded, it is shifted left by 3 to acquire the bit-count. Just to check .. does the code warn/fail when total_len >= 2**61? (To clarify, it is not of immediate concern to me. I just want to check with you if your change does what you expected.)

How do you mean that it would "warn"? Knowing how client code wants to handle errors is impossible, so this library just performs the calculation to the best of its ability, which hopefully guarantees that the result will be correct for all possible legal SHA-256 inputs. For illegal inputs, since unsigned integers wrap around instead of overflowing in C, the algorithm will in fact produce a result, although that result probably will be meaningless. But then again, SHA-256 is not defined in such cases, so we could argue that this library's result in such a case is "correct" too.

What really applies in such a case is the note for sha_256_write:

 * @note This function may be invoked an arbitrary number of times between initialization and closing, but the maximum
 * data length is limited by the SHA-256 algorithm: the total number of bits (i.e. the total number of bytes times
 * eight) must be representable by a 64-bit unsigned integer. While that is not a practical limitation, the results are
 * unpredictable if that limit is exceeded.

Also, you mentioned uint64_t is not a suitable type for portability?

It's optional in C99 but so is uint32_t. So here, we are assuming that all practical client code will have access to both. That's what I meant by "more benefits than drawbacks".

To clarify/summarize: total_len can store sizes up to 2^64 bytes. But when creating the final digest, the length is encoded in bits, so the 3 highest bits are silently dropped.

Essentially, you say that this is by definition of sha-256 and you don't error out if the user goes beyond the specification. It makes sense. I wanted to check if that was considered.