Closing with total length may produce incorrect value on smaller systems(?)
Closed this issue ยท 8 comments
I haven't implemented SHA-256 myself, but if I understand your code and the spec correctly, the digest is closed with (among others) encoding the total length in bits. However, IIUC, the size of size_t
may be dependent on system architecture. So, size_t
could be 32-bits in size on smaller systems.
See total_len
in struct Sha_256
and length-encoding in sha_256_close
.
(Also, as side-note: I think you need to #include <stddef.h>
in sha-256.h
for type size_t
.)
I haven't implemented SHA-256 myself, but if I understand your code and the spec correctly, the digest is closed with (among others) encoding the total length in bits. However, IIUC, the size of
size_t
may be dependent on system architecture. So,size_t
could be 32-bits in size on smaller systems.See
total_len
in structSha_256
and length-encoding insha_256_close
.
You definitely have a point. This implementation has a few more or less explicit limitations, compared with the official SHA-256 specification:
- SHA-256 is able to calculate a hash for any array of bits up to a certain length, but this implementation, for obvious reasons, has no support for bit array lengths that are not multiples of eight (i.e. it only works with whole octets).
- SHA-256 has support for bit array lengths of up to (2^64 - 1), but for this implementation, at the time of writing, the limit is always a number of octets which is up to SIZE_MAX. Please note that
sizeof (size_t)
needs not even be 4 (32 bits), since the C-standard only guarantees that SIZE_MAX is at least 65535 (16 bits). Forcalc_sha_256
, theSIZE_MAX
limit makes perfect sense, since the machine is not supposed to be able to store larger arrays anyway. But for the newer streaming interface, this limit is a little nasty, because it is not really obvious, since client code is free to invokesha_256_write
as many times as it wants before invokingsha_256_close
. However, it should be noted that whilesize_t
is guaranteed to be available,uint64_t
is optional! If available, usinguint64_t
could also be less performant than usingsize_t
for the same calculation, although that is unlikely to have an impact here. All in all, that limitation needs to at least be documented, and removing it by usinguint64_t
for the total byte length instead ofsize_t
probably has more benefits than drawbacks for this implementation. It should also be noted that if we switch touint64_t
the total length limitation will be that of SHA-256, which is 64 bits for the bit array length! Beyond that, the results are unpredictable. That limitation is already clearly documented, however.
Thanks for clarifying. This is my limited knowledge of C showing. :-)
You definitely have a point. This implementation has a few more or less explicit limitations, compared with the official SHA-256 specification:
To clarify, I understand there are limitations by design. That's fine. I am was arguing for a bug in the current implementation.
My concern is that, given compilation on a more limited platform, your array indices go out-of-bounds. In sha_256_close
, your code assumes a 64-bit/8-byte data-type for size_t
(with indices going up to 7
) which is not always true. You'll need to use an 8-byte sized data-type (or another way of simulating a total of 8 bytes) when you write out the hash-message length as per specification.
I misread. I checked again, and noticed you start writing with least-significant bytes, so excessive shifting would only produce more zero-values, i.e. zero-values for most-significant bits. So, that is where my reasoning failed. Thanks for following up.
@cobratbq, thank you for double-checking your arguments. However, mine still hold: you still had a point and I still want to somehow handle the issue, probably by simply changing size_t
to uint64_t
i two locations. That's why I had not closed the issue, and that's why I had self-assigned it. Reopening, now.
Quick check: you changed type to uint64_t
. Does your code already take into account that the size is encoded in bits, but total_len
accounts in bytes?
I think total_len
may now hold values up to 2**64
. When length is encoded, it is shifted left by 3 to acquire the bit-count. Just to check .. does the code warn/fail when total_len >= 2**61
? (To clarify, it is not of immediate concern to me. I just want to check with you if your change does what you expected.)
Also, you mentioned uint64_t
is not a suitable type for portability?
Quick check: you changed type to
uint64_t
. Does your code already take into account that the size is encoded in bits, buttotal_len
accounts in bytes?
Yes, it does. Otherwise, it would not pass the extensive testing.
I think
total_len
may now hold values up to2**64
. When length is encoded, it is shifted left by 3 to acquire the bit-count. Just to check .. does the code warn/fail whentotal_len >= 2**61
? (To clarify, it is not of immediate concern to me. I just want to check with you if your change does what you expected.)
How do you mean that it would "warn"? Knowing how client code wants to handle errors is impossible, so this library just performs the calculation to the best of its ability, which hopefully guarantees that the result will be correct for all possible legal SHA-256 inputs. For illegal inputs, since unsigned integers wrap around instead of overflowing in C, the algorithm will in fact produce a result, although that result probably will be meaningless. But then again, SHA-256 is not defined in such cases, so we could argue that this library's result in such a case is "correct" too.
What really applies in such a case is the note for sha_256_write
:
* @note This function may be invoked an arbitrary number of times between initialization and closing, but the maximum
* data length is limited by the SHA-256 algorithm: the total number of bits (i.e. the total number of bytes times
* eight) must be representable by a 64-bit unsigned integer. While that is not a practical limitation, the results are
* unpredictable if that limit is exceeded.
Also, you mentioned
uint64_t
is not a suitable type for portability?
It's optional in C99 but so is uint32_t
. So here, we are assuming that all practical client code will have access to both. That's what I meant by "more benefits than drawbacks".
To clarify/summarize: total_len
can store sizes up to 2^64
bytes. But when creating the final digest, the length is encoded in bits, so the 3 highest bits are silently dropped.
Essentially, you say that this is by definition of sha-256 and you don't error out if the user goes beyond the specification. It makes sense. I wanted to check if that was considered.