rust-lang/rust

Tracking issue for the OOM hook

glandium opened this issue · 37 comments

PR #50880 added an API to override the std OOM handler, similarly to the panic hook. This was discussed previously in issue #49668, after PR #50144 moved OOM handling out of the Alloc/GlobalAlloc traits. The API is somewhat similar to what existed before PR #42727 removed it without an explanation. This issue tracks the stabilization of this API.

Defined in the std::alloc module:

// As added in #50880:
pub fn set_oom_hook(hook: fn(Layout) -> !);
pub fn take_oom_hook() -> fn(Layout) -> !;
// After the rename proposed in #51543 (the hook no longer returns `!`):
pub fn set_alloc_error_hook(hook: fn(Layout));
pub fn take_alloc_error_hook() -> fn(Layout);

CC @rust-lang/libs, @SimonSapin

Unresolved questions

  • Name of the functions. The API before #42727 used _handler, I made it _hook in #50880 because that's the terminology used for the panic hook (OTOH, the panic hook returns, contrary to the OOM hook). #51264
  • Should this move to its own module, or stay in std::alloc?
  • Interaction with unwinding. alloc::alloc::oom is marked #[rustc_allocator_nounwind], so theoretically the hook shouldn't panic (except when panic=abort). Yet if the hook does panic, unwinding seems to work in a debug build, while in a release build some destructors are skipped (see the comments below).

We have an accepted RFC #48043 that includes adding a way to opt into panicking instead of aborting on OOM (so that catch_unwind could be used to recover from OOM at coarse granularity). Perhaps this could be the mechanism for that?

Oh right, I had forgotten about that. And in fact, it works:

#![feature(allocator_api)]

use std::alloc::set_oom_hook;
use std::alloc::Layout;

fn oom(layout: Layout) -> ! {
    panic!("oom {}", layout.size());
}

fn main() {
    set_oom_hook(oom);
    let result = std::panic::catch_unwind(|| {
        let v = Vec::<u8>::with_capacity(100000000000000);
        println!("{:p}", &v[..]);
    });
    println!("{:?}", result);
}

cargo run with the latest nightly:

thread 'main' panicked at 'oom 100000000000000', src/main.rs:7:5
note: Run with `RUST_BACKTRACE=1` for a backtrace.
Err(Any)

(despite the #[rustc_allocator_nounwind] annotation on alloc::alloc::oom)

Added item about interaction with unwinding.

nikic commented

@glandium While this example works, the #[rustc_allocator_nounwind] annotation will (likely) prevent some drops from running during unwinding (in particular those that would have to be run inside the function that performs the oom call).

Picture me puzzled.

#![feature(allocator_api)]

use std::alloc::set_oom_hook;
use std::alloc::Layout;

struct Foo;

impl Drop for Foo {
    fn drop(&mut self) {
        println!("Foo");
    }
}

fn oom(layout: Layout) -> ! {
    let f = Foo;
    panic!("oom {}", layout.size());
}

fn main() {
    set_oom_hook(oom);
    let result = std::panic::catch_unwind(|| {
        let f = Foo;
        let v = Vec::<u8>::with_capacity(100000000000000);
        println!("{:p}", &v[..]);
    });
    println!("{:?}", result);
}

thread 'main' panicked at 'oom 100000000000000', src/main.rs:16:5
note: Run with `RUST_BACKTRACE=1` for a backtrace.
Foo
Foo
Err(Any)

Or

#![feature(allocator_api)]

use std::alloc::set_oom_hook;
use std::alloc::Layout;

struct Foo;

impl Drop for Foo {
    fn drop(&mut self) {
        println!("Foo");
    }
}

fn oom(layout: Layout) -> ! {
    let f = Foo;
    panic!("oom {}", layout.size());
}

fn main() {
    set_oom_hook(oom);
    let result = std::panic::catch_unwind(|| {
        let f = Foo;
        std::alloc::oom(Layout::new::<u8>());
        println!("after oom");
    });
    println!("{:?}", result);
}

thread 'main' panicked at 'oom 1', src/main.rs:16:5
note: Run with `RUST_BACKTRACE=1` for a backtrace.
Foo
Foo
Err(Any)

Ah, I forgot --release, and with it, in both cases, the Foo instance in the closure is not dropped (the one in oom is, though).

Panic hooks notably return () instead of !, which allows you to chain them if necessary. If we want to use the term "hook" for OOM, we probably want the same (not returning !). That would also justify take_oom_hook: otherwise I don't think you can actually call the hook you took, or delegate to it in any order where you don't go first.

OOM hooks could always panic or abort themselves, of course, but the signature shouldn't require it by default. If the hook returns, the fallback is to abort, as OOM does today.

take_oom_hook is still useful albeit annoying, since hooks can't be closures. So you'd have to store the value in a global, but then you can call it from your own function. I can see how returning () would make things more flexible, though.

Why does take_ unregister the hook, as well as returning it? More generally, what’s an example where this function is useful?

This matches panic hooks. While not exactly simple (because you can't use a closure as a hook), it lets your hook do whatever it needs to and still print the default message from libstd, whatever that is, without having to reproduce it properly yourself. The current implementation doesn't do that properly either, since it can end up allocating aiui, which rather makes the point: one shouldn't try to replicate the code of the default handler.

Or, if another hook was already set, it allows falling back to that one.

On accessing the current/default hook, sure. But why unregister it in the process? fn(…) pointer types are Copy, unlike Box<Fn(…)>.

Would you rather add a function to unregister the current hook and break the similarity with the panic hook?

Why is the oom hook a function rather than a closure?

Because a closure can't be stored in an Atomic afaict, and using an Atomic is necessary because RwLock or Mutex can be lazily initialized and allocate memory on some platforms. You don't want to be allocating memory when trying to retrieve the hook while you're handling an oom.
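
To make the constraint above concrete, here is a minimal sketch of keeping a plain fn pointer in a pointer-sized atomic so that looking it up never locks or allocates. All names (HOOK, set_hook, dispatch, recording_hook, LAST_SIZE) are hypothetical and only model the idea; this is not the std implementation, and it assumes a target where function pointers and data pointers have the same size.

```rust
use std::alloc::Layout;
use std::mem;
use std::ptr;
use std::sync::atomic::{AtomicPtr, AtomicUsize, Ordering};

// Hypothetical storage slot: null means "no hook registered".
static HOOK: AtomicPtr<()> = AtomicPtr::new(ptr::null_mut());

// Records the size of the last failed request, so the effect is observable.
pub static LAST_SIZE: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for the default behaviour (a real handler would report and abort).
fn default_hook(_layout: Layout) {}

/// Example user hook: a plain `fn`, so it fits in a pointer-sized atomic.
pub fn recording_hook(layout: Layout) {
    LAST_SIZE.store(layout.size(), Ordering::SeqCst);
}

/// Install `hook` with a single atomic store: no lock, no allocation.
pub fn set_hook(hook: fn(Layout)) {
    HOOK.store(hook as *mut (), Ordering::SeqCst);
}

/// Look up the current hook (or the default) and call it.
pub fn dispatch(layout: Layout) {
    let raw = HOOK.load(Ordering::SeqCst);
    let hook: fn(Layout) = if raw.is_null() {
        default_hook
    } else {
        // SAFETY: non-null values only ever come from `set_hook`,
        // which stored a valid `fn(Layout)`.
        unsafe { mem::transmute::<*mut (), fn(Layout)>(raw) }
    };
    hook(layout);
}
```

A capturing closure would need a Box (an allocation) plus some lock or reference count to be shared safely, which is what this scheme avoids.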

Seems like we can avoid that problem with a flag tracking if set_hook has ever been called. If it hasn't, we just call the default hook directly and avoid touching the mutex.
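
A rough sketch of that suggestion, with hypothetical names (HOOK_SET, HOOK, set_hook, dispatch) and a counter standing in for the real default handler. It assumes the hook is a boxed closure behind a Mutex, and that the only goal is to stay away from the mutex until a hook has been installed.

```rust
use std::alloc::Layout;
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Mutex;

type Hook = Box<dyn Fn(Layout) + Send + Sync>;

// Set once the first hook is installed; checked before touching the mutex.
static HOOK_SET: AtomicBool = AtomicBool::new(false);
static HOOK: Mutex<Option<Hook>> = Mutex::new(None);

// Observable state for the example only.
pub static DEFAULT_CALLS: AtomicUsize = AtomicUsize::new(0);
pub static SEEN_SIZE: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for the default handler (a real one would report and abort).
fn default_hook(_layout: Layout) {
    DEFAULT_CALLS.fetch_add(1, Ordering::SeqCst);
}

/// Install a hook. This path may lock and allocate; it is not the OOM path.
pub fn set_hook(hook: Hook) {
    *HOOK.lock().unwrap() = Some(hook);
    HOOK_SET.store(true, Ordering::Release);
}

/// Allocation-failure path: skip the mutex unless a hook was ever set.
pub fn dispatch(layout: Layout) {
    if !HOOK_SET.load(Ordering::Acquire) {
        return default_hook(layout);
    }
    let guard = HOOK.lock().unwrap();
    match &*guard {
        Some(hook) => hook(layout),
        None => default_hook(layout),
    }
}
```

Note that once a hook is installed, the failure path does take the lock, so this only protects the default case.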

Well, another (new) reason is that we eventually want to move this to liballoc, which doesn't have access to any kind of locking primitive.

@glandium

Because a closure can't be stored in an Atomic afaict,

It can (playground):

#![feature(atomic_integers, const_fn)]
use std::sync::atomic::AtomicUsize;
use std::sync::atomic::Ordering;

const unsafe fn transmute(x: *mut u8) -> usize {
    union T {
        a: *mut u8,
        b: usize
    }
    T { a: x }.b
}

const BAR: *mut u8 = ((|| 3) as fn() -> i32) as *mut u8;
const FOO: AtomicUsize = AtomicUsize::new(unsafe { transmute(BAR) });
// static FOO: AtomicUsize = AtomicUsize::new(unsafe { transmute(BAR) }); // ALSO OK

fn main() {
    let l = FOO.load(Ordering::Relaxed);
    let l: fn() -> i32 = unsafe { std::mem::transmute(l) };
    assert_eq!(l(), 3);
}

Only closures that do not capture anything can be cast to function pointers, not arbitrary F: Fn(…) values.

#51543 proposes renaming the functions to set_alloc_error_hook and take_alloc_error_hook.

Only closures that do not capture anything can be cast to function pointers, not arbitrary F: Fn(…) values.

Of course. I thought we were only talking about closures without any environment, since I don't see how panic hooks with state make much sense (maybe someone can elaborate on that).

If you need to set a closure with an environment as a panic hook you just have to implement a struct representing the environment, implement the fn traits on that, and put the struct in a static in a properly synchronized way. Depending on how big the environment is, you might get away with a single extra atomic, or you might need a Mutex+Arc combo for it. Using a Mutex inside a panic hook sounds like a pretty bad idea to me though.
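
For the "single extra atomic" case, applied to a fn-pointer hook like the one discussed in this issue, a sketch could look like the following. THRESHOLD, LARGE_FAILURES, counting_hook and make_hook are made-up names for illustration; the "environment" a closure would have captured lives in statics instead.

```rust
use std::alloc::Layout;
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical "environment" that a capturing closure would have owned.
static THRESHOLD: AtomicUsize = AtomicUsize::new(usize::MAX);
pub static LARGE_FAILURES: AtomicUsize = AtomicUsize::new(0);

/// A plain `fn` that reads its state from the statics above.
fn counting_hook(layout: Layout) {
    if layout.size() >= THRESHOLD.load(Ordering::Relaxed) {
        LARGE_FAILURES.fetch_add(1, Ordering::Relaxed);
    }
}

/// Configure the environment, then hand back a capture-free hook.
pub fn make_hook(threshold: usize) -> fn(Layout) {
    THRESHOLD.store(threshold, Ordering::Relaxed);
    counting_hook
}
```

The returned value is an ordinary fn(Layout), so it could be passed to an API that only accepts function pointers.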

@gnzlbg the simple case is grabbing the old hook and wrapping it with your own.
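
With the stable panic hook API, that wrapping pattern can be sketched as below. std::panic::take_hook and std::panic::set_hook are the real functions; PANICS_SEEN and install_counting_hook are names invented for this example.

```rust
use std::panic;
use std::sync::atomic::{AtomicUsize, Ordering};

// Example-only state owned by the wrapper.
pub static PANICS_SEEN: AtomicUsize = AtomicUsize::new(0);

/// Grab the currently registered panic hook and wrap it with our own.
pub fn install_counting_hook() {
    let previous = panic::take_hook();
    panic::set_hook(Box::new(move |info| {
        // Our own work first...
        PANICS_SEEN.fetch_add(1, Ordering::SeqCst);
        // ...then delegate, so the usual panic message is still printed.
        previous(info);
    }));
}
```

This works for panic hooks because they are boxed closures that can capture `previous`. A fn-pointer hook would have to keep the old hook in a global instead.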

Just wanted to provide an update to @glandium's sample code that works as of March 2020. Some of the PRs changed the feature name this was gated behind and renamed the function to set the error hook.

#![feature(alloc_error_hook)]

use std::alloc::set_alloc_error_hook;
use std::alloc::Layout;

fn oom(layout: Layout) {
    panic!("oom {}", layout.size());
}

fn main() {
    set_alloc_error_hook(oom);
}

About the open question "Interaction with unwinding": it would be really nice if this could allow the hook to panic. The reason is that you can then catch such a panic at the C boundary. This would help cases like this: hyperium/hyper#2265

Besides the interaction with unwinding, which I would like to be able to rely on, I'm also interested in the timeline for stabilizing this. It looks fairly quiet. If we answer the unwinding question, is there anything else preventing stabilization soon?
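
As an illustration of catching a panic at the C boundary, here is a small sketch using std::panic::catch_unwind. The function names and the error code are hypothetical, and the panic is triggered by a flag rather than by a real allocation failure, since a panicking hook is exactly what is still undecided here.

```rust
use std::panic;

/// Hypothetical error code returned to C callers when Rust code panicked.
pub const ERR_PANIC: i32 = -1;

/// Stand-in for work that might panic, e.g. from a panicking OOM hook.
fn risky_work(fail: bool) -> i32 {
    if fail {
        panic!("simulated allocation failure");
    }
    42
}

/// Entry point with the C ABI: no panic is allowed to unwind past it.
pub extern "C" fn do_work(fail: bool) -> i32 {
    match panic::catch_unwind(|| risky_work(fail)) {
        Ok(value) => value,
        Err(_) => ERR_PANIC,
    }
}
```

This only helps when the crate is built with unwinding enabled. With panic=abort, catch_unwind never gets a chance to run.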

I think set should always return the previous hook it replaced (Option<fn>). Having set and take as separate functions makes "patching" an existing hook a technically unreliable operation, because someone else could register a hook between a call to take and set.
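
A sketch of what such a combined operation might look like, assuming the hook is stored as a fn pointer in an atomic. replace_hook, hook_a, hook_b and CALLED are hypothetical names, not part of std, and the sketch assumes function pointers and data pointers have the same size.

```rust
use std::alloc::Layout;
use std::mem;
use std::ptr;
use std::sync::atomic::{AtomicPtr, AtomicUsize, Ordering};

// Hypothetical storage slot: null means "no hook registered".
static HOOK: AtomicPtr<()> = AtomicPtr::new(ptr::null_mut());

// Example-only marker showing which hook ran last.
pub static CALLED: AtomicUsize = AtomicUsize::new(0);

pub fn hook_a(_layout: Layout) {
    CALLED.store(1, Ordering::SeqCst);
}

pub fn hook_b(_layout: Layout) {
    CALLED.store(2, Ordering::SeqCst);
}

/// Install `hook` and return the one it replaced, in a single atomic swap,
/// so no other thread can register a hook between "take" and "set".
pub fn replace_hook(hook: fn(Layout)) -> Option<fn(Layout)> {
    let previous = HOOK.swap(hook as *mut (), Ordering::SeqCst);
    if previous.is_null() {
        None
    } else {
        // SAFETY: non-null values only ever come from an earlier
        // `replace_hook` call, which stored a valid `fn(Layout)`.
        Some(unsafe { mem::transmute::<*mut (), fn(Layout)>(previous) })
    }
}
```

The caller still has to keep the returned hook somewhere (for example in its own static) if the new hook is meant to delegate to it.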

The oom=panic #43596 issue lists this handler as its dependency, but as far as I understand currently the handler forbids unwinding. So it's not fit for purpose of oom=panic.

Is this implementation going to change to allow unwinding, or is oom=panic going to have to add its own handler using some other way?

The implementation itself doesn't prevent unwinding. What does is pre-existing code that marks the oom functions as rustc_allocator_nounwind. The question is whether those annotations can be removed.

Now that #88098 has been merged, what's the status of this issue?

There's one unresolved question:

Should this move to its own module, or stay in std::alloc?

Hi, I would love to see this feature stabilized. My use-case is handling out-of-memory errors in WebAssembly as panics to avoid aborting the process, as that results in an unhelpful "unreachable executed" message. As far as I know that cannot be achieved using stable Rust, as even the oom=panic flag needs nightly.

However I do have one concern. This API was modeled after std::panic::set_hook, which is a reasonable choice. However that API has some problems when multiple threads want to set a hook at the same time. The main use case seems to be some libraries trying to silence panic messages in their initialization code. I implemented std::panic::update_hook to try to mitigate that issue, but it does not seem ready yet: #92649

So I am wondering if there is some easy way to avoid that problem here before the API is stabilized. I don't know if the use case of a library trying to temporarily set a different oom-handler is common enough to worry about it, or if it will even exist. The simplest solution I can think of is to make set_alloc_error_hook return the old hook. This will allow users to access an atomic swap primitive, which can then be extended into more complex use-cases in external libraries. Let me know what you think.

In a comment on #51540, I argue for the removal of special handling for OOMs in favor of just treating them as a panic. Comment copied below.

After working on the OOM handler for a while, I think that the best way to move forward is to just treat OOM as a normal panic (so that it calls the normal panic handler/hooks). This is what already happens on #![no_std] since #102318 was merged.

I believe that we should do the same for the std case. Specifically:

  • The unstable #[alloc_error_handler] is removed. alloc::alloc::handle_alloc_error now always invokes the panic handler.
  • For backwards compatibility reasons, this is a non-unwinding panic. Unsafe code may not be written to correctly handle unwinding out of a memory allocation (this is in fact a frequent source of bugs in C++!). However, this behavior can be overridden with -Zoom=panic, which changes the behavior to a normal unwinding panic.
  • Since there is no separate handling for OOM any more, the unstable OOM hook API in the standard library can also be removed.

FYI #51540 is under FCP for closing, which will also close this tracking issue. Please comment in that thread if you have any objections.

This tracking issue was closed when PR #109507 merged to remove the feature, but that was reverted in PR #110782. Reopening to reflect that.

PR #112331 is open to remove the feature again, and is already marked as closing this.