Tracking issue for the #[alloc_error_handler] attribute (for no_std + liballoc)
SimonSapin opened this issue · 139 comments
This attribute is mandatory when using the `alloc` crate without the `std` crate. It is used like this:
#[alloc_error_handler]
fn foo(_: core::alloc::Layout) -> ! {
    // …
}
Implementation PR: #52191
Blocking issues
Original issue:
In a `no_std` program or staticlib, linking to the `alloc` crate may cause this error:
error: language item required, but not found: `oom`
This is fixed by providing the `oom` lang item, which is normally provided by the `std` crate (where it calls a dynamically-settable hook #51245, then aborts). This lang item is called by `alloc::alloc::handle_alloc_error` (which is called by `Vec` and others on memory allocation failure).
#![feature(lang_items)]
#[lang = "oom"]
extern fn foo(_: core::alloc::Layout) -> ! {
    // example implementation based on libc
    extern "C" { fn abort() -> !; }
    unsafe { abort() }
}
However, defining a lang item is an unstable feature.
Possible solutions include:
- Add and stabilize a dedicated attribute similar to the `#[panic_implementation]` attribute:

  #[alloc_error_handler]
  fn foo(_: core::alloc::Layout) -> ! {
      // …
  }

  The downside is that this is one more mandatory hoop to jump through for `no_std` programs that would have been fine with a default hook that aborts.
- Move `std`'s dynamically-settable hook into `alloc`: #51607. The downside is some mandatory space overhead.
Better idea: remove the lang item and move all of OOM handling to liballoc, except for the default hook that prints to stderr. Have compiler magic similar to that of `#[global_allocator]` that uses the printing default hook when `std` is linked, or a no-op default hook otherwise.
Bikeshedding on `oom` renaming: `alloc_error` would be shorter.
I forgot to link #51543, which does indeed use `alloc_error`.
@japaric, @glandium, @Amanieu I’ve updated the issue description here with arguments from #51607 and rust-lang/rfcs#2480 and to list two alternatives.
Replying to @glandium's rust-lang/rfcs#2480 (comment) here to not derail the alloc RFC.
Every #[no_std] user of the alloc crate would have to implement their own
Not everyone has to implement their own. An implementation can be packed into a crate, and it just takes adding an `extern crate oom_abort;` to register an OOM handler -- this is no different from what we have for global allocators (e.g. `extern crate alloc_jemalloc`).
and they can't use intrinsics::abort, so it becomes harder than necessary.
Then stabilize `intrinsics::abort` or provide a stable `oom_abort` crate with the `rust-std` component. Though it's better to stabilize `intrinsics::abort` because then we can use it for `#[panic_implementation]` too.
The problem is that with a #[oom] attribute, you don't get a default handler at all.
No default OOM handler is good. There's no sensible default for every single combination of `no_std` program and target anyway. For example, intrinsics::abort produces a linker error on MSP430 because there's no abort instruction in the MSP430 instruction set.
Also, not having a default is consistent with both `#[panic_implementation]` and `#[global_allocator]` in `#[no_std]` context. Why special-case the OOM handler?
Another reason why I like static registration of global properties is that it's less error prone. Say you want the OOM (or panic) handler to behave differently between development and final release. With #[oom] you can write this:
#[cfg(debug_assertions)]
extern crate oom_report; // for development, verbose
#[cfg(debug_assertions)]
extern crate oom_panic; // for release, minimal size
This is clearly wrong, and you get a compiler error. With `set_alloc_error_hook` you don't get a compiler error; you won't notice the problem until you hit an OOM in dev and lose your opportunity to track down the root of the OOM.
The other reason I like #[oom] / #[panic_implementation] is that you can be sure of the behavior of the OOM / panic if you register the handler in the top crate.
extern crate oom_abort; // I'm 100% sure OOM = abort
With the hook API you have no guarantee:
fn main() {
    set_alloc_error_hook(|| abort()); // OOM = abort, maybe
    dependency::function(); // this is totally free to change the OOM handler
    // ..
}
Finally, if you need the ability to override the OOM handler at runtime using hooks you can implement that on top of #[oom].
@japaric I find this convincing regarding static vs. dynamic, thanks.
For example, intrinsics::abort produces a linker error on MSP430 because there's no abort instruction in the MSP430 instruction set.
Oh, I was wondering if something like that ever happened.
How does one typically deal with unrecoverable situations on MSP430? An infinite loop?
No default OOM handler is good.
I think we can have a default, with a Sufficiently Advanced Compiler. (Grep for `has_global_allocator` for code that does something similar.)
But you’re saying we should not, and force that question on every `no_std` user. This would be one more barrier before being able to get to Hello World.
FWIW, I also think an attribute like `#[oom]` would be a great idea (and I think I'm also convinced that it may be worth it over this dynamic strategy), and it could be implemented pretty similarly to `#[panic_implementation]`.
Right. I think the remaining question is: should there be a default? If not, what should be a "typical" implementation that we’d recommend in docs? Should we add a stable wrapper for `core::intrinsics::abort`? (In what module?)
The docs for `core::intrinsics::abort` claim:
The stabilized version of this intrinsic is `std::process::abort`
But besides availability, the two functions do not have equivalent behavior: servo/servo#16899.
How does one typically deal with unrecoverable situations on MSP430? An infinite loop?
@pftbest and @cr1901 would be more qualified to answer that question. I'm not a MSP430 developer myself.
On Cortex-M the abort instruction triggers the HardFault handler. While developing I configure panic and HardFault to log info about the program, or just to trigger a breakpoint if binary size is a problem. In release mode, I usually set panic to abort and make the HardFault handler disable interrupts, shut down the system (e.g. turn off motors), signal the failure (e.g. turn on a red LED) and go into an infinite loop but this is very application specific. Also, in some applications reaching an unrecoverable situation means that something is (very) wrong with your software and that it should not be deployed until it's fixed.
But you’re saying we should not, and force that question on ever no_std user. This would be one more barrier before being able to get to Hello World.
#[global_allocator] is not mandatory for #[no_std] binaries. And the oom lang item is only required when you are using #[global_allocator]. So, no, not all no_std program developers have to deal with the oom handler.
should there be a default?
I think there should not be a default.
If not, what should be a "typical" implementation that we’d recommend in docs?
The `#![no_std]` application-target space is too broad to recommend anything that will behave the same everywhere. Consider intrinsics::abort:
- on wasm it means terminating execution and notifying the wasm interpreter or host system
- on msp430 it causes a linker error
- on Cortex-M it defers the error handling to the HardFault exception handler
- on Linux systems it terminates the process with SIGILL
Should we add a stable wrapper for core::intrinsics::abort? (In what module?)
Yes. As `core::arch::$ARCH::abort` maybe? But only on architectures whose instruction sets define an abort / trap instruction.
The docs for core::intrinsics::abort claim:
Those docs should be fixed. IIRC process::abort tries to do some cleanup (of the Rust runtime?) before aborting the process. intrinsics::abort is an LLVM intrinsic that maps to the abort instruction of the target instruction set, so the two are not equivalent.
How does one typically deal with unrecoverable situations on MSP430? An infinite loop?
Infinite loop is how I typically deal w/ it. @pftbest can override me if he knows something better though, as it's been a while since I delved into msp430 internals :).
#52191 has landed with an attribute for a statically-dispatched function, and no default.
mm/krust/libkrust.a(alloc-062fb091d60c735a.alloc.67kypoku-cgu.11.rcgu.o): In function `alloc::alloc::handle_alloc_error':
alloc.67kypoku-cgu.11:(.text._ZN5alloc5alloc18handle_alloc_error17h59bd4dd5f11cdd3fE+0x2): undefined reference to `rust_oom'
I'm getting the above in one of my projects. I have defined the `handle_alloc_error` function as in the OP. The code compiles just fine, but the linker cannot find the function.
However, it seems to work if I add `extern` to the function definition.
@mark-i-m I assume you mean an `#[alloc_error_handler]` function, in a `#![no_std]` crate. Could you provide a set of steps/files to reproduce the issue?
@SimonSapin Sorry, yes, that's what I meant. Let me try to distill my code down to a minimal example.
Hmm... I'm having a very hard time reproducing this minimally. The build environment I am in is rather convoluted and involves linking against some compiled C code.
I don’t know if unwinding without `std` is supported at all. Consider either adding `std` to your dependency graph or compiling with `panic = "abort"`.
What's the status of stabilization of this feature? We're using `alloc` in developing a kernel, and it's currently one of the few features keeping us on nightly.
What is the library team's involvement here? `#[alloc_error_handler]` seems like a pure T-Lang matter as it, as far as I understand, does not involve any changes to the standard library's public API. The function `set_alloc_error_hook` seems like a different matter, but this issue just tracks the attribute...
I disagree. There’s an ad-hoc attribute because we don’t have a more general mechanism for “call a function based on its declaration without depending on a crate that contains its definition, and require exactly one definition somewhere in the crate graph, whose signature is type-checked.” There’s precedent for this kind of attribute with `#[global_allocator]`, and this one is no different as far as the language is concerned.
But the role of this handler and the `handle_alloc_error` function that it supports are entirely a library matter.
Ad-hoc or not... anything and everything that cannot be done in the stable language and needs compiler support is a T-Lang matter (unless it is about implementation defined behavior or optimizations within the confines of the specification, in which case it's a T-Compiler matter). This includes attributes, intrinsics, and lang items. When a general mechanism for libraries is provided it becomes a T-libs matter. In this case you are actually performing custom type checking involving both attributes and lang items.
Yes, `#[global_allocator]` and in particular rust-lang/rfcs#2325 happened. In my view those are clearly examples of where the language team should have been involved but was not. They are "precedents" I don't want to repeat.
So just to circle back, what needs to happen to stabilize #![feature(alloc_error_handler)]?
Maybe let's not stabilize it? And do rust-lang/rfcs#2492 instead. (Oh i just noticed that's nominated, yay!)
Can someone explain how this API fits with `GlobalAlloc`?
AFAICT, it is not part of the `GlobalAlloc` trait. Allocators can return `ptr::null_mut()` if an allocation cannot be satisfied, and the `GlobalAlloc::alloc` documentation calls out that this isn't necessarily due to OOM (https://doc.rust-lang.org/1.29.1/std/alloc/trait.GlobalAlloc.html#errors).
In C, `errno` can be used to query whether `malloc` errored, and why. In Rust there doesn't seem to be an alternative.
So I would understand that allocators could provide this API, to allow users to query the type of error, or to abort if the type of error was OOM.
Yet this API appears to be independent from the allocator, and any code can override it. So I'm confused about what the purpose of this API is, and how a user implementing it for any `GlobalAlloc` would even be able to tell whether the allocator ran out of memory or not.
Maybe this API has nothing to do with OOM, or error handling in general, and it is supposed to just be called by code that wants to terminate the process if any allocation error happens?
If so I find the name a bit misleading, and I still don't know how an implementer of this API can do anything better than printing "An allocation error happened for this Layout", without any specific knowledge of the allocator being used.
For example, on Linux, one could provide a better and more standard error message by calling `explain_malloc(size)` (https://linux.die.net/man/3/explain_malloc), but for `handle_alloc_error` to be able to call that, it would need to know that the allocator being used is the system's malloc, and not, e.g., jemalloc.
@gnzlbg I’m having a hard time understanding the first of your last two messages. This attribute has nothing to do with querying information from the allocator.
When an allocation fails (which in `GlobalAlloc::alloc` is represented by returning null), APIs like `Vec::push` that don’t want to propagate that failure to their caller can instead call `std::alloc::handle_alloc_error(layout: Layout) -> !`.
When libstd is linked, `handle_alloc_error` prints a message to stderr then aborts the process. In a `#![no_std]` environment however, we can’t assume there is such a thing as a process or a standard output. Therefore we require the program to provide (through this attribute) a function that does not return, which `handle_alloc_error` will call.
This is similar to the `#[panic_handler]` attribute, which provides a function that `panic!()` (eventually) calls.
Maybe […] it is supposed to just be called by code that wants to terminate the process if any allocation error happens ?
Yes.
If so I find the name a bit misleading,
Although the attribute is unstable and we can rename it, it relates to the name of `std::alloc::handle_alloc_error`, which is stable.
and I still don't know how an implementer of this API can do anything better than printing "An allocation error happened for this Layout", without any specific knowledge of the allocator being used.
That’s exactly what libstd does. This attribute is all about replacing that when libstd is not used.
TIL about `explain_malloc`. It looks like it’s not in libc but part of a separate library. Does it only work with glibc?
Regarding information about why an allocation failed, maybe we can add APIs for that (maybe through rust-lang/wg-allocators#23) but the `#[alloc_error_handler]` attribute is unrelated.
Thanks for the explanation.
I'm still trying to fill in the blanks about how `std::alloc::handle_alloc_error` is supposed to print the allocation error message without knowing anything about the allocator. As in, if we were to go from `Box<T>`/`Vec<T>` to `Box<T, A>`/`Vec<T, A>`, how is `handle_alloc_error` supposed to know which `A` was used for the allocation?
Or is the intent to only use this API for the global allocator and introduce a different API for other allocators?
It looks like it’s not in libc but part of a separate library. Does it only work with glibc?
The system one works with the system allocator, but each memory allocator has its own APIs for this.
So for example, with jemalloc, you probably want to override `malloc_message`, which jemalloc will use to dump the reason for an error to a stream; on error, you probably want to open this stream and print its contents to the user. With TCMalloc you might / might not want to print the status of the heap on error.
Other allocators would have their own ways to communicate this information. E.g. APIs like `posix_memalign` return an error code that one can then use to print an error string (e.g. alignment is not a power of two, alignment is not a multiple of `sizeof(void*)`, alignment request too high, etc.). On error, a `global_allocator` wrapping that in Rust can either panic or return a null pointer. But when `Vec` gets the error and wants to print it, it would be nice for it to be able to do so. AFAICT the allocator only communicates with `Vec` via a raw pointer, so the allocator would need to store the error in some local context, e.g., `errno`-style, and `handle_alloc_error` would need to query that. But to do so, it needs some kind of API to the allocator that was used for the failed allocation.
The message that libstd prints is `"memory allocation of {} bytes failed", layout.size()` regardless of the allocator type. It doesn’t print more information than that. I am not aware of any plan or request to make it print more than that (before yours today, if we take this as a feature request).
I think I misunderstood what the API was for. I thought this was an API for printing why, as in, the reason, an allocation failed.
Instead, it appears to be for printing within libstd an error message saying that an allocation attempt for some layout failed, and that's it. Since the API is public and stable, downstream crates can use it to print the same error message that libstd does.
What I'm not sure I understand is why the implementation of the API needs to be a lang item. Can't the libstd version just `panic!("memory allocation of {} bytes failed", layout.size())`, and delay what happens on panic to the panic handler?
What's the motivation for allowing extra customization here ? E.g. it appears to me that most people will want to either abort, unwind, or loop forever, which is what the panic handler on the target probably also wants to do. Or is there a use case where the implementations need to be different?
I’m not sure if you mean `handle_alloc_error()` or `#[alloc_error_handler]` when you say “the” API.
libstd’s implementation of `handle_alloc_error` (which you can think of as being provided through `#[alloc_error_handler]` by libstd, though it isn’t literally) aborts the process even if `panic!` would unwind the thread.
`#[alloc_error_handler]` can only be used when libstd is not used. It exists because libcore “doesn’t know” how to abort the process. (We don’t want libcore or liballoc to assume there is a process at all.)
In my case, I'm using `alloc_error_handler` to trigger the OOM killer in my OS kernel. The ability to customize the behavior of an OOM is important in `no_std` settings.
The ability to customize the behavior of an OOM is important in no_std settings.
How are you querying that the error is an OOM ? Or which allocator it originated from ?
I’m not sure if you mean `handle_alloc_error()`
Yes, that's what I meant.
It exists because libcore “doesn’t know” how to abort the process.
Sounds to me like this problem could be solved with an `#[abort_handler]` instead. Although @mark-i-m has other interesting uses for the `alloc_error_handler`.
How are you querying that the error is an OOM ? Or which allocator it originated from ?
In my system, the global allocator is the only place where an alloc error can be triggered in kernel mode, so the handler always knows it is an OOM (failures due to fragmentation are functionally the same, so they should also trigger the OOM killer/compaction daemon/swapping). `handle_alloc_error` just calls into the memory manager, which has access to the allocator anyway.
I haven't really thought about more complex situations, and I don't know much about per-container allocators, so take this with a grain of salt... I also recognize that my use case is pretty niche...
@mark-i-m, based on what you just said, I got to thinking about why we want the alloc_error handler to be `-> !`. If we're talking about a kernel, theoretically it should be possible for the kernel to either "expand the heap" (by freeing up memory used for buffers and caches and whatnot), or kill some of its "processes" or "tasks" or whatever we call them and then retry the allocation. Then the application should be able to continue to work correctly.
This reminds me somewhat of the Memfault handler for the ARM processors.
@axos88 A typical usage is `NonNull::new(alloc(layout)).unwrap_or_else(|| handle_alloc_error(layout))`. We want something that doesn’t return, to express “I don’t want to deal with this case”.
Any mechanism to “try harder” to allocate memory would be a better fit for being in the allocator itself. That is, in an impl of the `GlobalAlloc` trait (or `Alloc` trait), possibly by wrapping another allocator.
By the time `handle_alloc_error` is called, the program has already decided to abort. It’s too late to declare we’ve found some memory after all.
Any mechanism to “try harder” to allocate memory would be a better fit for being in the allocator itself.
I disagree. Staying with the example of the kernel, this would mean the allocator would be responsible for instructing the kernel to trigger the - let's call it - OOM killer to free up some space for the allocation, whereas IMHO the allocator should only be responsible for telling the kernel that, hey, I don't have enough space for this operation, and then the kernel would decide whether to free up space or crash. In the end it ends up in the same program, but not in the same impl.
By the time handle_alloc_error is called, the program has already decided to abort. It’s too late to declare we’ve found some memory after all.
Then this is not the best name for it, because we're not handling it; we're just trying to accept our fate in a graceful manner.
Agreed that `#[alloc_error_handler]` is not a great name; something like `#[abort_implementation]` (mirroring the existing `#[panic_implementation]`) would be closer to the truth, although the `core::alloc::Layout` parameter is specific to allocations. However this attribute’s name is related to the name of the `std::alloc::handle_alloc_error` function, which is stable. (Though adding an alias and deprecating the old name is a possibility.)
because we're not handling it
It's a matter of perspective. The error already happened. It's unrecoverable. We handle it by doing something and aborting. It's not "handle_to_maybe_recover", after all.
In any event, a non-`!` return type would be completely different semantics and not fit the current use case at all. Not saying this is good or bad, but it's a massive redesign.
Basically there are a gazillion interesting things one can do with errors, and one global hook can never hope to cover them all. I would expect this to be dead code in a kernel or other program which really cares about error cases. Instead they will be dealt with like all other errors, with `Result` and explicit application code.
I agree this would be a change in the use case. However if we allow a return from this hook, it would expand the use-case rather than shift it.
If the user wants to use it as it is today, that's great: don't return from it. If the hook returns, retry the allocation. If the user did not correct the issue that led to the allocation failure, it will fail again, and the hook will be entered again.
As I see it the ability to return from this hook is an extension to the current use-case without any obvious drawbacks.
This is not expanding, this is entirely different functionality that should be proposed separately.
And, remember that the `#[alloc_error_handler]` attribute can only be used when libstd is not linked. When it is linked, it takes care of providing the implementation of `std::process::abort`.
Yep, in the case of a kernel, it happens that never returning can be circumvented because it just means the OS can run instead. It makes much less sense for a common application.
So, is there anything that `alloc_error_handler` can normally do other than panic, given the limitation of never being allowed to return?
Can't things that need to allocate without handling errors just panic on failure, and then we can just skip the entire `alloc_error_handler` concept entirely?
libstd doesn’t panic on allocation failure, it aborts the process. In `no_std` there may not be a process to abort.
Can't things that need to allocate without handling errors just panic on failure
@Lokathor with `-C panic=unwind`, a `panic!` allocates memory, which is something that a process might not want to do if a memory allocation just failed. The `alloc_error_handler` lets you change the behavior from `abort` to `panic!` if you want to.
So, is there anything that alloc_error_handler can normally do other than panic, given the limitation of never being allowed to return?
It can `panic!`, `abort`, `ud2`, loop forever, `exit(success)` if it wants to, log something and do any of these other things; it can also just call `main()` again and `abort` once main finishes, fork the process, ...
The only thing the handler cannot do is return a value, but beyond that, there are many things it can do.
I see.
Well is there any ETA on when we could see progress on this moving forward? What are the actual blockers here?
I'd like to be able to build `no_std` binaries for Windows (all the OSes really), but it turns out that you can't reasonably do that on stable because you need to use this thing if you want the `alloc` crate. Otherwise a person would have to basically fork the entire `alloc` crate just to get around this one snag.
I'd like to be able to build no_std binaries for Windows (all the OSes really), but it turns out that you can't reasonably do that on Stable because you need to use this thing if you want the alloc crate.
Which application do you have in mind that supports the `alloc` crate but not `libstd` on Windows and other operating systems?
Windows applications that do not rely on the C runtime and do not link to it.
I'm also a bit stuck due to this not being stabilized. I'm surprised there aren't more `no_std` people bringing this up. Maybe everyone uses nightly for other reasons anyway.
@NickeZ It's been brought up every now and then, e.g. just now in our Embedded WG meeting. ;)
It seems like something simple that should be able to stabilize practically "today"?
@rust-lang/lang @rust-lang/libs How would you feel about making `#[alloc_error_handler]` default to panic? That is, when libstd (which defines an `#[alloc_error_handler]`) is not linked and no other crate defines a handler, act as if this were defined:
#[alloc_error_handler]
fn default(layout: core::alloc::Layout) -> ! {
    panic!("memory allocation of {} bytes failed", layout.size());
}
(This panic message matches what libstd’s default allocation error hook (#51245) prints before aborting the process.)
This would unblock `no_std` + `alloc` applications from running on the stable channel, without stabilizing this attribute.
Panicking is different from libstd’s behavior of aborting the process, but in a `no_std` context there may not even be a process to abort. It seems to me that panicking is a reasonable default in that case. `#[panic_handler]` gets called, which is stable and quite similar to `#[alloc_error_handler]` as it exists as unstable today.
It seems strange to be aborting with std and panicking without. Should we just start panicking in std as well? IIRC that's a change people feel can be made?
As I understand it:
- The reason to make OOM jump to an abort instead of doing a panic is because the panic could unwind (depending on build settings, which as far as I know aren't exposed in the program via cfg flag or anything like that).
- That unwind could allocate
- Since you just failed to allocate, trying to allocate right now is maybe a bad plan. Except that:
  - Maybe you tried to make a huge allocation (like asking for `isize::max_value()` on accident) and it's totally actually okay to make a small allocation.
    - In this case your unwind happens and the thread (but maybe not the whole program) dies.
  - On the other hand maybe you're really absolutely out of memory and you can't allocate even a single boxed error, in which case you'd get an OOM panic when the unwind tries to allocate.
    - Except you're already in panic mode, and a panic during a panic causes an abort (bringing down the entire program).
So if libstd were to panic instead of immediately abort, and assuming that `panic = "unwind"` for that build, the process as a whole might be able to survive an OOM in cases where it previously did not.
PS: From a language stability perspective, it seems entirely future-compatible for the story to be: "for now, you can't define the OOM handler. You always get this default handler in `no_std` which will just panic. In the future you'll have the option to define your own handler, or you could of course continue to allow the default handler to be used".
I like @SimonSapin's proposal. In my use cases, an OOM can always be handled by invoking something directly in `alloc`, rather than returning `null`, so they are functionally equivalent. Having alloc + no_std on stable would be fantastic.
@Lokathor IIRC the main issue was not just that, but the possibility of there being unsafe
code which is sound in the presence of abort-on-OOM but unsound given panic-on-OOM. That is, unsafe
code authors heretofore did not have to consider panic safety every time they called a potentially-allocating function, and now they would (but the code is already written).
[I didn't and don't have any opinions on this and don't want to relitigate it, just as a FYI.]
@sfackler Yeah it’s inconsistent. But maybe acceptable because there may not be a process to abort anyway on some embedded targets, and this issue tracks a way to eventually customize the behavior. Panicking is just the default.
FWIW we have an accepted RFC (but unimplemented, and without a precise mechanism specified) to allow opting into panic instead of abort (in libstd) on allocation errors: #43596
Changing the default in libstd seems potentially more risky to me, as @glaebhoerl mentions.
How would you feel about making #[alloc_error_handler] default to panic?
Since `panic!` is allowed to allocate memory (and does so for `-C panic=unwind`), what happens if that fails within the `#[alloc_error_handler]`? Does it get called again from within the `panic!` in an infinite recursion? (Or does it have a way to detect whether such recursion is happening?)
I covered this above (#51540 (comment)): a panic during a panic goes into an abort.
For the panic during a panic case to trigger, the first panic must "succeed", but that doesn't happen if the panic fails to allocate memory, right?
EDIT: e.g. if the panic fails to allocate, then the stack is not unwound, yet we call the `#[alloc_error_handler]`, which will try to panic again, and which won't detect a double panic because we are not unwinding the stack yet. I suppose we can add extra logic to `panic!` to handle `#[alloc_error_handler]` recursion, maybe by shuffling around the double-panic detection code? (I don't recall if we just call libunwind and ask "are we panicking" or keep extra state around for that already.)
`std::panic` is not a reexport of `core::panic`; they’re two different macros. The latter does not allocate.
It could panic while trying to allocate the string for format!
, which happens before calling the panic machinery. Therefore you need explicit detection for recursion in the alloc handler.
EDIT: Ah nevermind, I just checked the code and the formatting happens after checking for a double panic, so everything should be fine.
@SimonSapin Maybe I misunderstood you? I thought you were proposing for `#[alloc_error_handler]` to also panic for `std` builds as well, where `std::panic` is used, and `std::panic` tries to allocate. For `core::panic`, it might or might not allocate, depending on how the user implements the panic handler, but today they can allocate if they have access to `liballoc`.
My initial proposal in #51540 (comment) was to default to panic only when no other `alloc_error_handler` is provided. Since libstd provides such a handler, this new default would only be used when libstd is not linked.
By necessity, this default would use `core::panic` rather than `std::panic`. The former calls `#[panic_handler]` without allocating.
(Lines 12 to 28 in 374ad1b)
(Lines 70 to 83 in 374ad1b)
Then, if a user-provided #[panic_handler] tries to allocate and that fails, yes, we’d get a double panic. But that’s not the only case where a user-provided #[panic_handler] can double-panic.
In a later comment #51540 (comment), @sfackler proposed to also change libstd’s default to panic, for consistency.
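As a concrete illustration of formatting a panic message without touching the heap, here is a sketch using core::fmt::Write with a fixed-size stack buffer. `StackBuf`, its 64-byte capacity, and `format_panic_message` are all invented for this example, and output is silently truncated when the buffer fills:

```rust
use core::fmt::{self, Write};

// Fixed-capacity writer: no heap involved, as a no_std panic path needs.
struct StackBuf {
    buf: [u8; 64],
    len: usize,
}

impl Write for StackBuf {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        let bytes = s.as_bytes();
        // Copy as much as fits; extra output is silently truncated.
        let n = bytes.len().min(self.buf.len() - self.len);
        self.buf[self.len..self.len + n].copy_from_slice(&bytes[..n]);
        self.len += n;
        Ok(())
    }
}

fn format_panic_message(msg: &str) -> StackBuf {
    let mut out = StackBuf { buf: [0; 64], len: 0 };
    // core's write! goes through core::fmt and does not allocate.
    let _ = write!(out, "panicked: {}", msg);
    out
}
```

A no_std `#[panic_handler]` could use the same pattern to report a message over a serial port or similar without ever allocating.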
My initial proposal in #51540 (comment) was to default to panic only when no other alloc_error_handler is provided.
Ah, that makes sense.
In a later comment #51540 (comment), @sfackler proposed to also change libstd’s default to panic, for consistency.
I was expressing concern about this, but as @Amanieu mentioned, this currently allocates after checking for double panics, so this isn't an issue.
IIRC the main issue was not just that, but the possibility of there being unsafe code which is sound in the presence of abort-on-OOM but unsound given panic-on-OOM. That is, unsafe code authors heretofore did not have to consider panic safety every time they called a potentially-allocating function, and now they would (but the code is already written).
Example:
// Invariant: ptr always points to some allocated memory of len > 0
struct MyDynArray { ptr: *mut u8, len: usize }
impl MyDynArray {
pub fn new() -> Self { Self { ptr: GlobalAlloc(..1..), len: 1 } }
pub fn resize(&mut self, val: u8, new_len: usize) { unsafe {
// this unsafe code is sound today, but not if allocation can panic
let new_len = if new_len == 0 { 1 } else { new_len };
let ptr = self.ptr;
self.ptr = ptr::null();
let new_ptr = GlobalAlloc::alloc(...new_len...);
// move elements from [ptr, ptr + size] to [new_ptr, new_ptr + size]
GlobalAlloc::dealloc(ptr);
self.ptr = new_ptr;
self.len = new_len;
}}
}
// safe because this will never see a `self.ptr == ptr::null()`
impl Drop for MyDynArray { fn drop(&mut self) { unsafe { GlobalAlloc::dealloc(self.ptr) } }}
This code is sound because either the call to GlobalAlloc::alloc(...new_len...) succeeds or it aborts, which means that Drop::drop will never see a MyDynArray with self.ptr == ptr::null().
If GlobalAlloc::alloc is changed to unwind on OOM, then such code now has a memory safety issue. While the code can be "fixed" to avoid that, such code might already be written, and we can't easily tell.
Basically, that GlobalAlloc::alloc does not unwind on OOM is a guarantee of its API that we can't break. We could make #[alloc_error_handler] panic by default, and add a catch_unwind to GlobalAlloc::alloc to make it abort. For Alloc::alloc, we could just change its semantics since they are unstable, and maybe we can add a different "system allocator" hook with "panics on OOM" semantics that people can migrate to, and deprecate the old one.
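A rough sketch of that catch_unwind shim, under the assumption that the error handler panics. `alloc_or_abort` and `panicking_handle_alloc_error` are invented names, and a real shim would live inside the allocator plumbing rather than user code:

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

// Stand-in for a user's #[alloc_error_handler] that panics (hypothetical).
fn panicking_handle_alloc_error(size: usize) -> ! {
    panic!("memory allocation of {} bytes failed", size)
}

// Preserve GlobalAlloc's "never unwinds" guarantee by turning any
// unwind out of the error handler into an abort.
fn alloc_or_abort(size: usize) -> *mut u8 {
    let result = catch_unwind(AssertUnwindSafe(|| {
        if size == 0 || size > isize::MAX as usize {
            // Treat an unreasonable request as allocation failure.
            panicking_handle_alloc_error(size);
        }
        // A real shim would call the registered global allocator here.
        Box::into_raw(vec![0u8; size].into_boxed_slice()) as *mut u8
    }));
    match result {
        Ok(ptr) => ptr,
        // The unwind never escapes the allocation call: existing unsafe
        // code keeps seeing the old "succeed or abort" contract.
        Err(_) => std::process::abort(),
    }
}
```

This keeps old unsafe code (like the example above) sound while letting the handler itself panic.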
Is unwinding possible when libstd is not linked?
Is unwinding possible when libstd is not linked?
Yes, if you use nightly, port libpanic_unwind and libunwind to your platform, and invoke them directly without going through libstd's panic framework. Obviously this relies on very unstable implementation details.
@gnzlbg your example (1) could easily be fixed in the presence of OOM leading to unwind, (2) doesn't even have correct usage for the api that GlobalAlloc exposes, and (3) doesn't call alloc::alloc::handle_alloc_error, which is how you'd get to the allocation handler that would either abort or unwind.
To be clear: your code example is not sound today.
If a crate is designed to work with no_std, wouldn't its authors have to consider the possibility that abort does not work? Am I misunderstanding?
Today, a person can just write a no_std lib, and you call the lib and it parses bytes or does a formula or whatever. Many people use no_std in its literal meaning: without the OS-backed standard library. They still assume that there's something out there beyond the process. It's unfortunate, but that's the social contract we've developed.
I guess your panic_handler should trap the thread in a loop or something if it's really unable to exit to a wider OS. That's what I do for GBA. (As a reminder: not an empty loop, because those are UB due to an LLVM bug; do a volatile read over and over so that there's a side effect.)
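For illustration, a bounded version of that spin-with-a-side-effect loop. `spin_for` is a made-up name, and the iteration bound exists only so this demo terminates; a real panic handler would use `loop { .. }` with no exit:

```rust
use core::ptr::read_volatile;

// Spin with an observable side effect, per the advice above.
fn spin_for(iterations: u32) -> u32 {
    let marker: u8 = 0;
    let mut count = 0;
    for _ in 0..iterations {
        // The volatile read is a side effect the compiler must preserve,
        // which sidesteps the historical empty-infinite-loop miscompilation.
        unsafe { read_volatile(&marker); }
        count += 1;
    }
    count
}
```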
(as a reminder: not an empty loop, because those are UB due to an LLVM bug; do a volatile read over and over so that there's a side effect)
loop { continue; } works just fine. 😅
Surely loop { continue; } would compile to the same LLVM IR as loop {} (even the same MIR)?
"Works just fine" is not applicable to UB, almost any UB "works just fine" some of the time, but the risk here is lurking miscompilations.
Surely loop { continue; } would compile to the same LLVM IR as loop {} (even the same MIR)?
No idea. Funny enough, I can't reproduce the problem anymore, but using loop { continue; } has never failed me on thumbv6m-none-eabi (and higher), while loop {} often has.
It seems like a bug if loop {} and loop { continue; } produce different code...
This is a discussion for #28728; please keep this thread about alloc_error_handler.
@gnzlbg your example (2) doesn't even have correct usage for the api that GlobalAlloc exposes (3) doesn't call alloc::alloc::handle_alloc_error, which is how you'd go to the allocation handler that would either abort or unwind.
To be clear: your code example is not sound today.
I disagree. This is the example with the blanks expanded (playground), and I believe it is correct and sound today. Changing handle_alloc_error to panic instead would make this program unsound, and would make it memory unsafe.
Please point out explicitly what you believe to be unsound in this use of the GlobalAlloc API. AFAICT, the code only relies on guarantees that the API provides.
(1) could easily be fixed in the presence of OOM leading to unwind
Do you have a way to automatically fix or migrate all of the existing code that might run into the family of issues that the example above demonstrates?
well sure it works if you fill in all the checks you skipped before XD
also, just as a reminder: we already have an accepted RFC to make oom=panic a Cargo config option: #43596
@gnzlbg This example is only sound on the assumption that handle_alloc_error never unwinds. But I don’t think this assumption is valid. https://doc.rust-lang.org/1.39.0/std/alloc/fn.handle_alloc_error.html documents:
The default behavior of this function is to print a message to standard error and abort the process. It can be replaced with set_alloc_error_hook and take_alloc_error_hook.
(Emphasis mine.)
I feel that we can probably get away with making OOM unwind by claiming that any unsafe code made unsound by this change was always broken.
I personally feel that unwinding on OOM (even on std) is the right approach: 99% of OOMs are caused by excessively large allocations rather than actually running out of memory. Even if unwinding requires allocating memory, this will usually work fine. A double panic and abort will handle the (rare) recursive OOM case.
I feel that we can probably get away with making OOM unwind by claiming that any unsafe code made unsound by this change was always broken.
I agree.
If I'm reading the thread correctly, the current proposal to get alloc + no_std working on stable is to provide a default alloc_error_handler that simply calls the panic_handler, which would allow people to use alloc in a no_std environment on stable for now while the #[alloc_error_handler] discussion continues.
The panic handler implemented in #[panic_handler] should take care not to make any allocations, to avoid infinite panic recursion.
This seems to work for my use cases, which involve embedded systems running without an OS (or processes running on an OS that isn't supported by std), where we can do everything from writing to an LED to indicate failure, to sending a message to the operating system to terminate the process with an "OOM" error.
What is needed to reach a consensus to stabilize this?
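Behaviorally, that proposed default could look like the following host-runnable sketch. `default_alloc_error_handler` is an invented name; the real default would be wired up by the compiler and use core::panic!, which reaches the user's #[panic_handler] without allocating:

```rust
use std::alloc::Layout;

// Sketch of the proposed default: report the failed layout by
// panicking instead of aborting the process.
fn default_alloc_error_handler(layout: Layout) -> ! {
    panic!("memory allocation of {} bytes failed", layout.size())
}
```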
A few thoughts on moving this forward:
- It sounds like we're close to a consensus on a default allocation error handler that calls the panic handler. Can we confirm that consensus?
- Once we have that consensus, can we get someone specific to volunteer to write a PR implementing that change?
- Once we make that change, I don't think we want to immediately move to stabilize; we should give people time to try the new behavior first.
Also, one other thought: any interest in a very slightly more general mechanism that would avoid further language work for future things like this, along these lines? (Naming intentionally WIP.)
#[global_thing("alloc_error")]
fn alloc_error_panic(...) -> ! { ... }
// Elsewhere
#[global_thing_get("alloc_error")]
fn alloc_error(...) -> !;
... alloc_error(...) ...
This would give a compile time error if any code using a given global thing gets compiled in and that global thing doesn't get set exactly once.
Thoughts on this? (This is intentionally much simpler than the RFC for external existentials.)
Can we confirm that consensus?
The formal way to do this is an RFC for the Lang and Libs team. Or do you feel an FCP (in a dedicated issue) would be enough?
I don't think we want to immediately move to stabilize
This is an interesting case for unstable experimentation because it is about a default behavior that applies when no other behavior is opted into, and it affects not just one crate but an executable/staticlib/cdylib as a whole.
Maybe we could have that "default" only apply if any of the crates in the dependency graph has a given #![feature(something)] opt-in?
a very slightly more general mechanism that would avoid further language work
#[alloc_error_handler] and #[panic_handler] are very similar to each other as far as the language is concerned. I feel that the additional language work for another one like these, and the number of them we’re likely to ever need, are low enough that it’s not worth a general mechanism for the purpose of the standard library.
If it’s a language feature to be eventually stabilized to allow any crate to call as well as define Global Things, I feel there’s definitely enough design space that this needs an RFC. (Type checking and name collisions come to mind, but if anyone wants to discuss that please do it in a separate thread.)
Thoughts on this? (This is intentionally much simpler than the RFC for external existentials.)
That'd be insanely useful for embedded development!
If it’s a language feature to be eventually stabilized to allow any crate to call as well as define Global Things, I feel there’s definitely enough design space that this needs an RFC. (Type checking and name collisions come to mind, but if anyone wants to discuss that please do it in a separate thread.)
I don't think that's what they meant; probably only some syntactic sugar to write #[global("allocator")] instead of #[global_allocator] for already-existing attributes / hacks. There was already an RFC for such a feature last year which received considerable design work, but the lang team postponed it this year (RFC 2492: existential types with external definition). The #[alloc_error_handler] is one of the many global "hooks" that would have been supported by such a feature.
@SimonSapin I feel like an rfcbot poll (along with a clear description of the incremental change to the existing behavior) would be enough to confirm consensus.
If it’s a language feature to be eventually stabilized to allow any crate to call as well as define Global Things,
Yes, that's what I was suggesting.
@gnzlbg I mentioned that RFC in my comment. I was suggesting something intentionally smaller and simpler in the hopes that it would be easier to do without needing existential types.
@SimonSapin I don't want to derail this issue. I think I'm just trying to see if that proposal sounds sufficiently useful and straightforward that it might be worth considering rather than adding case 2 of the 0-1-infinity rule. I was hoping that it sounded extremely simple to implement.
Oh yeah I’m not worried about the implementation, it’s the design that I think has some open questions.
Back to the idea of a default allocation error handler that panics: it becomes unnecessary (for the purpose of unblocking some use cases on the stable channel) if we stabilize the #[alloc_error_handler] attribute. I personally would be fine with that, since it’s very similar to the #[panic_implementation] attribute, which is already stable. How does @rust-lang/lang feel about this?
I was suggesting something intentionally smaller and simpler in the hopes that it would be easier to do without needing existential types.
You might want to open an internals thread with a proposal. GlobalAlloc is a trait, alloc_error_handler is a function, so a feature supporting both as independent "extern global thingies" (traits and function types) sounds at least as complicated as the "extern existential types" feature, which natively supports both because alloc_error_handler implements a Fn trait. I'm not sure what a "smaller" feature would look like, but the implementation challenges it must solve are AFAICT the same, and if it supports both GlobalAlloc and alloc_error_handler, the amount of expressive power it adds to the language is the same, unless we were to artificially restrict the feature to only support some "blessed" existential types (e.g. GlobalAlloc and alloc_error_handler, but not user-defined types), but that's pretty much what we already have with the attribute system today.
#[global_allocator] works with a trait, but really it’s sugar for four "global functions":
rust/src/libsyntax_ext/global_allocator.rs
Lines 76 to 86 in b9cf541
That's kind of what the "existential" in "extern existential" literally means. When one writes:
// crate importing existential
extern existential type Heap: GlobalAlloc;
let x = unsafe { Heap.alloc(...) };
Heap has a single concrete type that implements the GlobalAlloc trait, i.e., it is not generic. One of the many ways to implement this is to desugar that into the "four global functions" that you mention:
// crate importing existential
extern "Rust" {
static Heap: impl GlobalAlloc; // We only deal with references to this type
// These functions are not generic over `Self`, they only use `*mut c_void`
// because we don't know the actual type in this TU, but there is only one type:
unsafe fn Heap_GlobalAlloc_alloc(*const c_void, ...) -> ...;
unsafe fn Heap_GlobalAlloc_dealloc(*const c_void, ...);
... // this is generated from the trait definition, not the type definition
}
let x = unsafe { Heap_GlobalAlloc_alloc(&Heap as *const _ as _, ...); }
where some other crate must actually provide a definition of the type, e.g.,
// crate defining existential
struct JemallocHeap; impl GlobalAlloc for JemallocHeap { ... }
pub extern existential type Heap = JemallocHeap;
which then can be desugared to:
// crate defining existential
pub static Heap: JemallocHeap = JemallocHeap; // type is concrete
pub unsafe extern "Rust" fn Heap_GlobalAlloc_alloc(a: *const c_void, ...) -> ... {
GlobalAlloc::alloc(&*(a as *const JemallocHeap), ...) // uses concrete type!
}
pub unsafe extern "Rust" fn Heap_GlobalAlloc_dealloc(a: *const c_void, ...) { GlobalAlloc::dealloc(&*(a as *const JemallocHeap), ...) }
...
Whether one applies this program transformation with a built-in keyword or using an attribute doesn't matter much. There are cleverer ways to desugar the feature, e.g., see here, but these are pretty much stylistic changes. For the alloc_error_handler, the compiler can just expand:
extern existential type alloc_error_handler: Fn(...) -> !;
to
extern "Rust" {
fn alloc_error_handler(...) -> !;
}
The tricky part is actually detecting collisions, but this is something that we already need to do for any flavor of this feature (e.g., if two crates implement a #[global_allocator] we want to error at build time, instead of link time, or worse, have the linker pick one of the two allocators, e.g., depending on link order).
I feel like an rfcbot poll (along with a clear description of the incremental change to the existing behavior) would be enough to confirm consensus.
Alright, #66741 is up.
I’ve also written up #66740 to propose FCP to stabilize the attribute.
To some extent these two proposals are alternatives to each other, in that either of them unlocks no_std + alloc on stable. But doing them both may still be desirable.
So, any progress for us alloc + no_std users who want to compile with stable?
Are the right people aware of this issue? This is still the only blocker for using no_std + alloc as far as I can tell, and it seems to be stuck on a decision / organizational question.