Keystream buffers aren't zeroized in `StreamCipherCore`
nstilt1 opened this issue · 6 comments
I noticed that StreamCipherCoreWrapper
zeroizes its buffer on drop, but there are larger temporary buffers that are created in stream_core.rs
that do not get zeroized. If the main buffer that contains a block of the keystream gets zeroized, then it would make sense for larger buffers containing the keystream to be zeroized as well.
There are at least 2 solutions to add the zeroizing buffers for stream_core.rs
.
- Initialize the buffers earlier in the function, outside of loops, and then zeroize them at the end if
zeroize
is enabled.
- this solution is fairly simple, and it would only add a small overhead for zeroizing. The extra
zeroize
call should only be called at most once perapply_keystream()
- Make the
StreamCipherCoreWrapper
's buffer a variable-sized buffer, or sized to the maximum size for the given cipher.
- this solution is not so simple, but it would remove the need for creating temporary buffers and it would only zeroize the single buffer when the
StreamCipherCoreWrapper
is done with. - variable-sized buffers seem to be near impossible to pick at runtime without using the heap/
alloc
. Hybrid array might work if it is able to work with array sizes that aren't known at compile time. Otherwise, easiest solution is either solution 1, or picking the largest buffer for the cipher's backends, which would kind of defeat the purpose ofParBlocksSize
There may be more solutions, but I have a PR for solution 1 if that is acceptable.
We intentionally do not deal with stack spilling. Zeroizing stack-based variables can significantly hurt performance and not as useful as you may think, since compiler is free to create as many copies as it wants (and often does). Stack should be bleached separately after you finished working with secret data.
Is there a safe, portable way to perform stack bleaching? I have looked into it a little, but I have only found some unresolved issues and some of your x86 assembly, and I am planning on using aarch64... and I don't fully understand the stack.
But I think I have found a way to try to prevent stack spilling though, using the second solution. By removing temporary buffers from stream_core.rs
and instead using buffers located on the StreamCipherCoreWrapper
, or perhaps on a StreamBackend
struct, the cipher should be flat and own all of its sensitive data. Then, an end user could wrap their cipher with Box
or secrecy::SecretBox
to ensure that all buffers are located on the heap. With that approach, users would not be forced to use the heap.
Adjusting cipher
for that might be a little tricky
Is there a safe, portable way to perform stack bleaching?
Well, code like this more or less works in practice, but I would use it only as a fallback. There are issues with lack of exact guarantees for black_box
and some compiler backends may not support it in the first place. So it's better to use asm!
-based bleaching. Also note that bleaching happens relative to the current stack pointer, so you should prevent inlining of function which works with secret data.
We probably should create an experimental crate for this.
By removing temporary buffers from stream_core.rs and instead using buffers located on the StreamCipherCoreWrapper, or perhaps on a StreamBackend struct, the cipher should be flat and own all of its sensitive data.
I don't think this will work. The whole "backend" shenenigary is needed because algorithms core implementations may generate multiple blocks of data in parallel and we keep only one block in the core wrapper. In other words, temporary buffers are often several blocks long, so they will not fit into the buffer inside StreamCipherCoreWrapper
.
Here's an experimental crate which implements the idea: https://github.com/dsprenkels/eraser
Ideally it would be nice to have first-class compiler support for this sort of thing instead.
Thanks for the info. I was not aware of stack spilling, and I will definitely explore uses for that eraser
crate. I have modified cipher
a little bit though, in the event that boxed stream ciphers might be appealing.
I did not want to change the way that the pos
is stored in the first byte of the buffer, since the pos
might have exceeded the u8
limit using a larger buffer and the other complications it would bring. Instead, the first block is used the same way the current buffer is used, and then the extra blocks in the buffer are only used when XOR-ing full blocks. It essentially uses the code from ApplyBlocksCtx
with write_keystream_blocks
for refilling the buffer.
The main API change in that commit is that implementors of StreamCipherCore
would need to implement ParBlocksSizeUser
, using the largest possible ParBlocksSize
of its backends.
Closing this issue as wontfix for reasons outlined above.