bryant/argon2rs

Making argon2rs #![no_std] - lanes configuration?

Evrey opened this issue · 7 comments

Evrey commented

To make argon2rs #![no_std], we'd have to be able to replace all use std::*; with use core::*;. In theory, this would be as simple as doing something like this:

#[cfg(    feature = "core" )] use core::*;
#[cfg(not(feature = "core"))] use std::*;

However, as core doesn't know about threading, we'd have to do something about the lanes parameter for Argon2::new and argon2::defaults::LANES. I can think of three sensible ways to solve the parallelism issue.

  1. The simplest and possibly most sensible solution would be to return ParamErr::TooManyLanes for lanes != 1. This might, however, be absolutely not what a user wants.
  2. Add an extra parameter for feature = "core", which is a V-table struct for a target OS's threading API. Instead of expanding Argon2::new, however, one might just add a second Argon2::with_thread_api function. This can be combined with (1.) by returning an error if api.is_none() and lanes > 1.
  3. Just ignore threading and calculate the lanes sequentially. This would, however, increase the calculation time for secure hashes by a freakin' lot. E.g., with lanes = 8 and the recommended hashing duration of half a second, doing (3.) would take four seconds for the same hash strength. This might result in feature = "core" people choosing very weak parameters.
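Option (1.) could be sketched roughly like this. The types below are simplified stand-ins for illustration, not argon2rs' actual API; the `ParamErr::TooManyLanes` variant is the one proposed above:

```rust
// Minimal sketch of option (1.): reject `lanes != 1` when threads are
// unavailable. Simplified stand-in types, not the crate's real definitions.
#[derive(Debug, PartialEq)]
pub enum ParamErr {
    TooManyLanes,
}

pub struct Argon2 {
    lanes: u32,
}

impl Argon2 {
    pub fn new(lanes: u32) -> Result<Argon2, ParamErr> {
        // Without `std` there is no threading, so only one lane is allowed.
        #[cfg(not(feature = "std"))]
        {
            if lanes != 1 {
                return Err(ParamErr::TooManyLanes);
            }
        }
        Ok(Argon2 { lanes })
    }
}
```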

The main use case I see for #![no_std] Argon2 would be in an operating system, unlocking the system, either in a bootloader or in a kernel. Probably in this environment one would prefer to use all available cores and the maximum SIMD capability, and use less memory.

#[cfg(    feature = "core" )] use core::*;
#[cfg(not(feature = "core"))] use std::*;

I suggest using feature = "use_std" or feature = "std" and making the std/use_std feature a default feature, to be consistent with other libraries.

I don't know much about Argon2 but I think support for multiple lanes can also be added by using SIMD, right?

Maybe it makes sense to use the futures API to abstract out any use of threading? I'm not sure.

without std::vec, what would block::Matrix look like?

Evrey commented

without std::vec, what would block::Matrix look like?

Does it need to be resizable? I can't remember seeing any push anywhere. You might as well allocate a flat array of blocks, which in the end might even be much faster than the current Vec<Vec<>> solution. (After all, Vec indexing is bounds-checked, and the nested version checks twice per access.) As for allocation, all an OS developer needs is an allocator crate. Implementing one wouldn't be that hard, especially in an environment like UEFI, which already provides a pool allocator for you to use.
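A flat-array Matrix could look roughly like this. This is a sketch with a stand-in `Block` type, not the crate's actual definition:

```rust
// Sketch of a flat, non-resizable matrix; `Block` stands in for the crate's
// 1 KiB block type. One allocation, one bounds check per access.
type Block = [u64; 128];

struct Matrix {
    blocks: Vec<Block>, // flat storage: `lanes * lanelen` entries
    cols: usize,
}

impl Matrix {
    fn new(rows: usize, cols: usize) -> Matrix {
        Matrix {
            blocks: vec![[0u64; 128]; rows * cols],
            cols,
        }
    }

    // 2D access via the usual `row * cols + col` index mapping.
    fn get(&self, row: usize, col: usize) -> &Block {
        &self.blocks[row * self.cols + col]
    }
}
```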

I suggest using feature = "use_std" or feature = "std" and making the std/use_std feature a default feature, to be consistent with other libraries.

This is a good idea with a good raison d'être.

Maybe it makes sense to use the futures API to abstract out any use of threading? I'm not sure.

If I understood that correctly, making futures run on threads is done externally via std or futures-cpupool, neither of which is core-compatible. Thus, for #![no_std] users, we either have no threading, or an ugly V-table struct parameter, or we opt into some crate like futures-cpupool that has been made #![no_std]-capable via a custom threading API.

We could as well do a whole different and crazy thing:

In the #![no_std] case, we could link against functions that the crate user has to provide, essentially just spawn_thread, join_thread, and exit_thread. And if the user doesn't have any custom threading API (yet), they could simply opt out of those required functions, making argon2rs single-threaded. However, single-threading has the issues mentioned above. (See (3.))

Does it need to be resizable?

how else to deal with adjustable memory cost at runtime?

Evrey commented

Well, doesn't the amount of memory to allocate depend solely on the configuration parameters for Argon2? Just allocate a big enough buffer in advance. After all, your API does not allow changing a hasher's configuration once created. I.e., Argon2::new just allocates raw memory using Rust's allocator API once the parameters have been checked. Then, volatile_set_memory and deallocate on drop.

And if I'd need two different memory cost configurations at runtime, I'd just create two different Argon2 hashers.
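The wipe-on-drop part could look like this. This sketch uses stable core::ptr::write_volatile instead of the unstable volatile_set_memory intrinsic; the `SecretBuf` type is invented for illustration:

```rust
use core::ptr;

// Sketch: a buffer that zeroes itself before its memory is released.
// Volatile writes keep the compiler from eliding the wipe as a dead store.
struct SecretBuf {
    buf: Vec<u8>,
}

impl SecretBuf {
    fn wipe(&mut self) {
        for b in self.buf.iter_mut() {
            unsafe { ptr::write_volatile(b, 0) };
        }
    }
}

impl Drop for SecretBuf {
    fn drop(&mut self) {
        self.wipe(); // zero the secret material, then let Vec free it
    }
}
```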

precisely how would this allocation take place?
On Oct 10, 2016 3:21 AM, "Evrey" notifications@github.com wrote:


Evrey commented

Okay, so, the first step would be to link against Rust's allocator API, i.e. against __rust_allocate and others, just like here. With this, you'd __rust_allocate your block of memory on Matrix::new and __rust_deallocate it on Matrix::drop. Your alignment is mem::align_of::<Block>() and your size in bytes is mem::size_of::<Block>() * N, where N is rows * columns of your Matrix, i.e. lanes * lanelen.
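For reference, the __rust_allocate symbols were later replaced by a stable allocator API; the same allocate/deallocate dance in current Rust (std::alloc, or the alloc crate under #![no_std]) would look roughly like this, with `Block` again a stand-in type:

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};

// Stand-in for the crate's block type: size 1024 bytes, alignment 8.
type Block = [u64; 128];

// Allocate `n` zero-initialised blocks. The caller owns the memory and
// must release it with `free_blocks` using the same `n`.
fn alloc_blocks(n: usize) -> *mut Block {
    let layout = Layout::array::<Block>(n).expect("size overflow");
    unsafe { alloc_zeroed(layout) as *mut Block }
}

// Safety: `ptr` must come from `alloc_blocks(n)` with the same `n`.
unsafe fn free_blocks(ptr: *mut Block, n: usize) {
    let layout = Layout::array::<Block>(n).unwrap();
    dealloc(ptr as *mut u8, layout);
}
```

Layout::array computes exactly the mem::size_of::<Block>() * N size and mem::align_of::<Block>() alignment described above.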

After allocating memory, you'd have to initialise it before use. Your array is lanes * lanelen entries big, so you'd do something like this:

let block_array_ptr: *mut Block = ...;
for i in 0..(lanes * lanelen) {
    unsafe { ptr::write(block_array_ptr.offset(i as isize), zero()) }; // Your `zero` function; `offset` takes an `isize`.
}

You might as well just use intrinsics::write_bytes here, but that would drop you right into nightly land.

Then, you can use this raw block of memory like any other 2D array by converting 2D coordinates to 1D ones, i = x + row_len * y, and accessing entries with *(block_array_ptr.offset(i as isize)). Lots of unsafe code involved, however; there is no way around it if you want a #![no_std] Argon2.

Depending on how paranoid we are, you'd want to ptr::write_volatile or intrinsics::volatile_set_memory lots of zero() or 0x00_u8 before deallocating memory.

Finally, if you'd want a more fancy API instead of raw allocate/deallocate, you could extern crate alloc;. alloc is unstable, however, which means nightly. With alloc, you wouldn't need to link __rust_allocate and others manually. Instead, you might either use alloc::heap::allocate for whatever reason, or use stuff like Box directly.

The good thing is that __rust_allocate, alloc, std::mem/core::mem, and std::intrinsics/core::intrinsics look and work the same for both std and core.


Edit: There is also core_collections, as I just noticed. With this, you might even use Vec<Block> in nightly.