hawkw/mycelium

maitake: replace `task::Cell` with a union

hawkw opened this issue · 3 comments

hawkw commented

currently, maitake tasks use an enum to represent the state of the user-provided task data (either the future, if the task has yet to complete, or the task's output, when it has completed):

enum Cell<F: Future> {
Future(F),
Finished(F::Output),
}

the task's state is currently tracked by both the enum descriminant, and the COMPLETED bit in the task's atomic state word:

/// If set, this task's [`Future`] has completed (i.e., it has returned
/// [`Poll::Ready`]).
///
/// [`Future`]: core::future::Future
/// [`Poll::Ready`]: core::task::Poll::Ready
pub(crate) const COMPLETED: bool;

this means that we have what's essentially an unnecessary word1 of data for the enum descriminant, because whether the task cell contains the future or its output is also tracked by the task's StateCell. @jamesmunns mentioned wanting to replace this enum with a union --- at the cost of some additional unsafe code2, we could shave a word off of the Task's memory usage. this might be particularly useful on memory-constrained devices.

Footnotes

  1. i'm assuming it's a whole word for alignment reasons?

  2. or...maybe not; i think everywhere the task cell is touched is already inside of an unsafe block, because it lives in an UnsafeCell...

hawkw commented

this may make the task Drop impl a little bit annoyinger because the future and/or the output type may have their own destructors that need to be run...

If you're feeling really spicy, we can likely use my absurd allocator ideas to do really invasive optimizations.

Like if the future has completed, it never needs to go back in the run queue, right?

hawkw commented

If you're feeling really spicy, we can likely use my absurd allocator ideas to do really invasive optimizations.

Like if the future has completed, it never needs to go back in the run queue, right?

yeah...although, i like having the task's state word immediately after the MPSC pointers that have to be the first field, so that it's prefetched in cache when a ptr to a task is dereferenced (on x86). and, i would imagine that futures are almost always larger than their outputs1, so i'm not sure if we would actually be saving space by putting the run queue links in the union?

Footnotes

  1. although i can't back this up with actual data, i would imagine that any async block-generated future will at least always be large enough to contain the output type...