A pure `alloc` library, with today's `alloc` becoming `global_alloc`

Question

Ericson2314 opened this issue 6 years ago · 4 comments

@SimonSapin and @rkruppe asked me to lay out my plan in its own thread, rather than just in a comment in another thread plus tons of references elsewhere.

N.B. I'm probably going to end up editing this a fair amount.

Motivation

Most of alloc is "pure" in that it doesn't depend on any external capabilities and doesn't use unstable language features. That makes this code very portable, across platforms and hypothetical Rust implementations. Leaving it with the other items in alloc, however, encumbers it with less pure things like GlobalAlloc, global OOM handling, etc. But if we pull it out into a separate library, we get more "free" portability: i.e. the code itself need not (or hardly) be changed, just moved.

Also, while the current global hooks in alloc are fine for implementing std, they are less than satisfactory on their own. @japaric in rust-lang/rust#51607 (comment) points out that for resource-constrained environments the dynamism and indirection impose a meaningful cost. I also consider them unergonomic and unidiomatic. Rust today boasts of its lack of virtual functions, imposed global state, and singletons, and of its simple Result-based error handling. These global hooks violate all of that.

Many (most?) libraries need allocation but couldn't care less about changing these hooks. And consumers of those libraries really want them not to change any global state. If those libraries use the pure alloc, then there is a static guarantee that they cannot change the global state. Circling back to @japaric's concerns, this also brings those libraries closer to being accidentally portable to those obscure resource-constrained platforms, and "accidental portability", where code ends up being usable in more situations than its authors intended, should be our gold standard and goal.

Besides the portability benefits, having more of std's implementation on crates.io is good for bringing in more developers and prototyping interfaces before their permanent stabilization in std. External repos and a simplified build system (at least when building the crate on its own) will allow more distributed development, and the use of regular Rust (insofar as most of the unstable features used here are library interfaces, not language features) means that users can fully grok more of std without knowing the "extended" Rust language.

Finally, as part of the plan towards this goal, I propose adding an associated error type to the Alloc trait. This is technically an independent change, but I suppose if I use it in the plan I should defend it here too. Unlike the accepted try_reserve methods, it is race-free in that there's no additional method call, and the return value fully indicates whether the operation failed or succeeded.
Unlike having tons of separate methods on the same type, it allows code to be polymorphic over OOM-divergence, and even in the monomorphic case it better conveys intent and enforces that all errors be manually handled.
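
To make the idea concrete, here is a minimal sketch of what an associated error type could look like. Everything in it is an assumption for illustration: the trait is a simplified stand-in (not the real Alloc trait), `Bounded`, `Aborting`, `OutOfMemory` and `reserve_twice` are hypothetical names, and `std::convert::Infallible` stands in for `!`.

```rust
use std::convert::Infallible;

/// Hypothetical, simplified stand-in for an `Alloc` trait with an associated error type.
trait Alloc {
    type Err;
    /// Pretend to reserve `bytes` bytes; a real allocator would hand back memory.
    fn reserve(&mut self, bytes: usize) -> Result<(), Self::Err>;
}

#[derive(Debug, PartialEq)]
struct OutOfMemory;

/// Fallible allocator: reports exhaustion to the caller.
struct Bounded {
    remaining: usize,
}

impl Alloc for Bounded {
    type Err = OutOfMemory;
    fn reserve(&mut self, bytes: usize) -> Result<(), OutOfMemory> {
        if bytes > self.remaining {
            return Err(OutOfMemory);
        }
        self.remaining -= bytes;
        Ok(())
    }
}

/// "Infallible" allocator: its error type has no values, so `Err` can never be built.
/// (A real one would abort on exhaustion instead of returning.)
struct Aborting;

impl Alloc for Aborting {
    type Err = Infallible;
    fn reserve(&mut self, _bytes: usize) -> Result<(), Infallible> {
        Ok(())
    }
}

/// Written once, polymorphic over whether allocation can fail.
fn reserve_twice<A: Alloc>(a: &mut A, bytes: usize) -> Result<(), A::Err> {
    a.reserve(bytes)?;
    a.reserve(bytes)
}

/// In the infallible case the error arm is statically unreachable.
fn reserve_twice_infallible(a: &mut Aborting, bytes: usize) {
    match reserve_twice(a, bytes) {
        Ok(()) => {}
        Err(never) => match never {},
    }
}
```

With `Bounded`, the caller is forced to deal with the `Result`; with `Aborting`, the same generic code compiles down to something with no error path to handle.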

Plan

Compiler

rust-lang/rust#50097 preparatory work in rustc to allow extra type params on Box. Take 2; it's actually merged, but only supports ZST.
Get non-0 sized types working for Box too. Some of rust-lang/rust#47043 (upon which the previous PR was based) might be useful. Thankfully no library effort (just stabilization) is blocked on this.

Library

We make a new library called alloc and move core::alloc into it. core::alloc is unstable, so there is no impediment to doing this.

Then we convert collections to use an alloc parameter, and Alloc to have an associated error type so that fallible/infallible allocation is reflected in the type system. [Adding the allocator parameter is already a tentative goal tracked in rust-lang/rust#42774.]

rust-lang/rust#50882 convert Box to use the Alloc trait. Per rust-lang/rust#50882 (comment) I think there are stop-gap solutions that allow us to merge this imminently without preventing better solutions later.
https://github.com/quiltos/rust/tree/allocator-error convert the Alloc trait to use an associated error type and then add the allocator parameter to many collections besides Box. Box is included, so this needs to be rebased on the previous PR, but that is good because the rebased Box changes will show the benefits of the associated error type alone.

We incrementally move collections out of global_alloc into alloc. We don't know yet how to provide default parameters separately from where the underlying items are defined, so we newtype them in global_alloc instead as a stop-gap. Until alloc is stable in the sysroot (which might never happen with std-aware cargo), there is nothing forcing us to commit to alloc::Foo<T, E> and std::Foo::<T> unifying or not at some E.
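
A rough sketch of the newtype stop-gap, under loud assumptions: the modules `pure_alloc` and `global_alloc` below are stand-ins for the two proposed crates, `Global` and `new_in` are hypothetical names, and the "pure" Vec borrows std's storage purely to keep the example short.

```rust
/// Stand-in for the proposed pure crate: every collection takes an explicit allocator.
mod pure_alloc {
    pub struct Vec<T, A> {
        // Assumption: storage is delegated to std only to keep this sketch small.
        pub items: std::vec::Vec<T>,
        pub alloc: A,
    }

    impl<T, A> Vec<T, A> {
        pub fn new_in(alloc: A) -> Self {
            Vec { items: std::vec::Vec::new(), alloc }
        }
        pub fn push(&mut self, value: T) {
            self.items.push(value)
        }
        pub fn len(&self) -> usize {
            self.items.len()
        }
    }
}

/// Stand-in for the proposed global_alloc crate.
mod global_alloc {
    use super::pure_alloc;
    use std::ops::{Deref, DerefMut};

    /// Hypothetical handle to the global allocator.
    pub struct Global;

    /// Newtype that pins the allocator parameter, since a default cannot
    /// be attached away from the definition of `pure_alloc::Vec`.
    pub struct Vec<T>(pub pure_alloc::Vec<T, Global>);

    impl<T> Vec<T> {
        pub fn new() -> Self {
            Vec(pure_alloc::Vec::new_in(Global))
        }
    }

    // Forward the inner API so the newtype is usable like the generic type.
    impl<T> Deref for Vec<T> {
        type Target = pure_alloc::Vec<T, Global>;
        fn deref(&self) -> &Self::Target {
            &self.0
        }
    }

    impl<T> DerefMut for Vec<T> {
        fn deref_mut(&mut self) -> &mut Self::Target {
            &mut self.0
        }
    }
}
```

The cost of the stop-gap is visible here: `global_alloc::Vec<T>` and `pure_alloc::Vec<T, Global>` are distinct types, which is exactly the unification question left open above.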

alloc will be left with the pure code of the collections and the Alloc trait. global_alloc will contain the less pure stuff like GlobalAlloc, oom hook, etc. std should be able to reexport most of Alloc as-is. alloc should be safe to go on crates.io, as the only unstable features it contains are unstable library interfaces that do not interact with the compiler. (We probably should have a different mechanism to put unstable items in stable crates.io libraries, so as to avoid all-or-nothing stabilization.)

As a final note, we can continue the path of rust-lang/rust#51846 and move HashMap into the pure alloc too. global_alloc would reexport alloc::HashMap with a default for the Alloc parameter, and std would reexport global_alloc::HashMap with a default for the hasher parameter. Note that the "deferred default" problem is identical for the hasher and the allocator.

Ramifications

Stabilizing today's alloc is technically no issue as it is mainly a subset of std, so retrofitting today's alloc or std onto the "pure" Alloc are sort of equivalent issues. However, I am concerned about the various ways it pulls us away from the spirit of this plan.

If this plan goes through, it is my guess that the vast majority of allocation-needing crates will either need the pure alloc or all of std. global_alloc and its hooks would mainly exist for implementing std, and not for normal library or binary usage. Stabilizing alloc is thus unnecessary for the motivations given in https://github.com/SimonSapin/rfcs/blob/liballoc/text/0000-liballoc.md.
The name alloc implies it is the "final story" or "one-stop shop" on allocation, when in fact that is the pure alloc library, and global_alloc just provides various global/singleton hooks (mechanisms that really could be applied to just about any trait). Renaming it global_alloc makes that purpose clear, and "leaves room" for the pure alloc crate described here.

CC @SimonSapin @rkruppe @japaric @jethrogb @glandium @Amanieu @Haavy @eddyb @eternaleye

Answer 1 · 2018-06-29T07:41:45.000Z

If this plan goes through, it is my guess that the vast majority of allocation-needing crates will either need the pure alloc or all of std.

I disagree: I expect that most allocation-needing crates will just want to use a Vec or Box with the global allocator (e.g. regex). People usually want something that "just works" out of the box (wink).

With that said, I do believe that there is some merit to your idea of having a crate with allocator-generic collections which do not depend on a global allocator. However I feel that we should keep the alloc crate as it is and move towards stabilizing it ASAP because of what it enables in the ecosystem. We can also add a collections crate later on with allocator-generic collections.

Answer 2 · 2018-06-29T11:04:55.000Z

I’ll try to extract some high-level goals that seem to be discussed together here:

1. Collections and containers should be allocator-generic
2. It should be easier to contribute to the standard library
3. It should be possible to avoid relying implicitly on global state

@Ericson2314, I am not confident that this is an accurate representation of what you have in mind (especially for #3), so please try to present your goals in a succinct form similar to this.

I think it’s important to separate high-level goals from the way we can get there. Often, alternative solutions can turn up that are better than the first solution we think of.

While a single solution can sometimes achieve multiple goals, it’s valuable to talk separately about different goals. A many-comments thread can become hard to follow, and points can start being lost in the middle of other stuff. Conflating topics amplifies this problem. More bluntly: just because this is something that you also want doesn’t mean that it belongs in the same thread.

Meta-points aside, responding to the list above:

1. I absolutely agree, and I believe that there is already strong consensus around this goal. However it looks to me that much of what you discuss here does not affect this goal directly.
2. I also agree with this goal, but I believe that the solution discussed here (moving stuff to crates.io) both:
   - Would be difficult to achieve, because of reliance on unstable compiler details
   - May not achieve the goal. On the contrary, coordinating across multiple source repositories that each gate changes on passing tests can be a very real barrier: https://internals.rust-lang.org/t/the-current-submodule-setup-is-not-tenable/6593. Maybe some pieces of work can be done entirely within one repo, but not all.
3. This is where I’m mostly guessing, possibly because I don’t know why this is important.

Now, about the specifics:

If we leave it with the other items in alloc, it will always be encumbered by less pure things like GlobalAlloc, global OOM handling, etc.

This appears to be the core of the motivation, but I don’t think this “always” is accurate. liballoc already depends on libcore, so if for example Vec can be made to not assume a global allocator we could very well move it to libcore.

I think that what you mean by a "pure" allocation library already exists at core::alloc, and it’s not clear to me what the benefit is of making it a separate crate. A move like this should not be a goal in itself, but a means to achieve some goal.

Many (most?) libraries need allocation but couldn't care less about changing these hooks.

But as Amanieu wrote they do care about being able to use a global allocator without being allocator-generic themselves and passing around an allocator instance.

And consumers of those libraries really want them not to change any global state. If those libraries use the pure alloc, then there is a static guarantee that they cannot change the global state.

No, changing the global allocator is done through the #[global_allocator] attribute which is part of the language, not any crate. Looking at what crates a library uses cannot tell you whether it’s using #[global_allocator].
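
For illustration, a minimal sketch of a crate installing a global allocator through the attribute alone. `GlobalAlloc`, `Layout` and `System` are the real std items; the `Counting` wrapper, the `ALLOCATIONS` counter and `allocations_so_far` are hypothetical names invented for this example.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Number of allocation requests seen so far (illustrative bookkeeping only).
static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical allocator that counts requests and forwards to the system allocator.
struct Counting;

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        // Safety: the caller upholds the `GlobalAlloc::alloc` contract for `layout`.
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // Safety: `ptr` was returned by `System.alloc` with this same `layout`.
        unsafe { System.dealloc(ptr, layout) }
    }
}

// The attribute, not a dependency on any particular crate, is what
// replaces the allocator for the whole program.
#[global_allocator]
static GLOBAL: Counting = Counting;

/// Read the counter; every `Box`, `Vec`, etc. in the program feeds it.
fn allocations_so_far() -> usize {
    ALLOCATIONS.load(Ordering::Relaxed)
}
```

Nothing in this crate's dependency list reveals that it does this, which is the point being made above.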

having more of std's implementation on crates.io is good for bringing in more developers and prototyping interfaces before their permanent stabilization in std

Moving stuff to crates.io appears to be separate from the rest of this proposal. But regardless, it’s very difficult in this case. In 1.27.0, liballoc declares using more than fifty different unstable features, so any given version of it likely only works with a very narrow set of rustc versions/commits because of changes in those features.

https://github.com/rust-lang/rust/blob/1.27.0/src/liballoc/lib.rs#L77-L130

adding an associated error type to the Alloc trait.

This is to allow that type to be ! for infallible allocation, right? There is some more discussion of this at https://internals.rust-lang.org/t/pre-rfc-changing-the-alloc-trait/7487

Unlike the accepted try_reserve methods, it is race-free in that there's no additional method call, and the return value fully indicates whether the operation failed or succeeded.

What the Alloc trait(s) look like is separate from the API of collections. Vec::push is stable and does not return a Result, so it seems like Result-returning APIs on Vec have to be through new methods.
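
As a sketch of that "new methods" route, here is a fallible push layered on `Vec::try_reserve` as it exists in current stable Rust. The helper name `try_push` is hypothetical, not a std API.

```rust
use std::collections::TryReserveError;

/// Append `value`, reporting allocation failure instead of aborting.
/// Hypothetical helper built on the existing `Vec::try_reserve`.
fn try_push<T>(v: &mut Vec<T>, value: T) -> Result<(), TryReserveError> {
    // Make room for one more element, surfacing any failure as an `Err`.
    v.try_reserve(1)?;
    // Capacity was reserved above, so this push does not need to reallocate.
    v.push(value);
    Ok(())
}
```

The existing `push` keeps its signature, and callers that care about allocation failure opt in through the separate Result-returning method.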

core::alloc is unstable

It is stable in 1.28.

alloc will be left with the pure code of the collections and the Alloc trait. global_alloc will contain the less pure stuff like GlobalAlloc, oom hook, etc.

In this vision, since the "pure" flavor of Vec does not have access to OOM handling, does it follow that it doesn’t have infallible methods like push? Does that make it a different type that cannot be used with a library that takes a &mut std::vec::Vec<…> parameter?

We probably should have a different mechanism to put unstable items in stable crates.io libraries, so as to avoid all-or-nothing stabilization.

That is a nice goal but I have no idea how it could be possible to achieve without giving up on the stability promise.

Renaming it global_alloc makes that purpose clear, and "leaves room" for the pure alloc crate described here.

“Leaving room” is the reason for your objection to rust-lang/rfcs#2480. But again it’s not clear what the benefits are of a separate crate over the current core::alloc module.

Answer 3 · 2018-07-02T12:43:39.000Z

Thank you, you both. I started replying to this, but then thought more about the language issues and decided there is a good enough path forward there that we can avoid needing to split up crates. A few days later, and the result is this RFC: rust-lang/rfcs#2492.

Answer 4 · 2018-07-02T13:44:28.000Z

I moved the existing PR bullet points over to rust-lang/rust#42774 (comment). I'll update that with more as they're opened.