setting parameters on big endian systems
oconnor663 opened this issue · 13 comments
It looks like functions like blake2b_param_set_node_offset
convert their argument to little-endian on the way in. But those functions aren't exposed in the blake2.h
API. Is the expectation that callers (who plan on supporting big endian platforms) will convert their parameters to little endian on their own? Or maybe the plan was to expose the setter functions in blake2.h
eventually?
I don't quite recall, but I do believe the plan was at some point to expose those functions to the outside, for users that would want to make their own portable parameter blocks.
But since no such users ever appeared, those functions ended up never getting exposed. Even internally, for blake2xp
, we end up making the parameter block manually anyway.
I wonder if that means CPython's blake2 implementation, for example, has a bug on big endian platforms? They seem to be setting the leaf_length
parameter from a native int. I was about to do the same thing myself, as part of working on a Rust wrapper.
That's an excellent question. I think there's no bug, because the reference implementation explicitly loads the parameter block in little endian order. The optimized x86 implementations do not do this, but everything's assumed to be little-endian in those.
But in all honesty I did not check if there is a bug or not; I don't have big-endian hardware to verify this right now.
I was staring at that earlier, and here was how I read it: The load64
function takes a pointer to little-endian bytes, and loads a native integer from them. However, if you're on PowerPC or whatever, and you execute the line self->param.leaf_length = (unsigned int)leaf_size
as Python does, you've written big-endian bytes to the parameter block. The load64
call doesn't expect this, and the result is a flipped value for S->h[i]
.
I haven't owned PowerPC hardware in...some time...but I might try to fire up QEMU tonight and see if I can test this.
Edit: These look useful for testing
https://stackoverflow.com/questions/3337896/imitate-emulate-a-big-endian-behavior-in-c
https://people.debian.org/~aurel32/qemu/
Yeah, I think you're right.
In hindsight, the parameter block ought to be an opaque structure, with the functions used to set the appropriate bytes of it.
Confirmed!
I don't know of any official test vectors that set leaf_length
or node_offset
, but when I run the same two hashes on my regular Intel machine, the results are swapped:
Python 3.6.3 (default, Oct 24 2017, 14:48:20)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys, hashlib
>>> sys.byteorder
'little'
>>> hashlib.blake2b(b"foo", leaf_size=1).hexdigest()
'8878fcd2202a81a2a26adeb5cc6e240528c80e2b4ec80b940d76686febe91b368b4a98ebb39b21befcc02ffaf7516e27d2894fc1c096b57c0e3dde69e01c5889'
>>> hashlib.blake2b(b"foo", leaf_size=(1<<24)).hexdigest()
'bfd0c39502d381eb4e45189f051fd67fd69fd972ad471550a879347d846c1700927d19065a08bf3acdd8fc209cbd21a9bb8619c56fc79207a1ece990b21ddf2b'
It should be simple to patch CPython, and I'll see if I can add a test for this. On the libb2 side of things, probably we should at least stick some clarifying comments inline in blake2.h
? I worry that changing blake2*_init_param
to assume native endianness in struct fields would break existing callers who've correctly worked around this. (If there are any?)
Can you think of any other languages or libraries that expose these fields, that we'd want to eyeball for the same bug?
I've pushed a candidate solution to a separate branch, under which existing correct code should be able to continue functioning properly, but a new API exists to explicit build correct parameter blocks.
To be able to do this, I'm using unnamed structs in unions, which appears to be something only standardized in C11, but recent GCC, Clang, and MSVC all appear to support this.
I don't know the extent of the damage. To my knowledge, most bindings/implementations are limited to the variable length and key parameters, for which endianness does not matter.
Those changes look good to me, though I don't have much experience with this code. Maybe we could stick a comment on the anonymous struct like:
// These struct fields are deprecated, and kept around only for backwards
// compatibility. All new code should use the param_set helper functions
// instead. (In particular, setting the leaf_length or node_offset fields here
// gives incorrect results on big endian platforms, if the caller doesn't
// explicitly swap endianness first. See https://github.com/BLAKE2/libb2/issues/12.)
Also will https://github.com/BLAKE2/BLAKE2 want to make similar changes?
Yes, the plan is to merge the two repositories at some point. Hopefully soon.
Oh, I think on lines 178-179 of blake2.h
you're using BLAKE2S_SALTBYTES
and BLAKE2S_PERSONALBYTES
when you mean BLAKE2B_
.
Another design comment about https://github.com/BLAKE2/libb2/tree/fix-bigendian: I think blake2*_param_init
should set the official defaults, rather than just zeroing the whole parameter block. That is, I think it should set fanout
and depth
to 1, and digest_length
to OUTBYTES
. Almost all callers are going to want to do that, but it's also something they're likely to forget, at least until they debug why their hashes aren't coming out right. That would also remove some duplicated code from blake2*_init
and blake2*_init_key
.
Excellent idea, I'll incorporate this.