Heap allocations in constants

Question


oli-obk opened this issue 7 years ago · 74 comments

Current proposal/summary: #20 (comment)

Motivation

In order to totally outdo any other constant evaluators out there, it is desirable to allow things like using serde to deserialize, e.g., JSON or TOML files into constants. In order to not duplicate code between const eval and runtime, this will require types like Vec and String. Otherwise every type with a String field would either need to be generic and support &str and String in that field, or just outright have a mirror struct for const eval. Both ways seem too restrictive and not in the spirit of "const eval that just works".

Design

Allocating and Deallocating

Allow allocating and deallocating heap memory inside const eval. This means Vec, String, Box, etc. become usable.
* Similar to how panic is handled, we intercept calls to an allocator's alloc method and never actually call that method. Instead the miri-engine runs const eval specific code for producing an allocation that "counts as heap" during const eval, but if it ends up in the final constant, it becomes an unnamed static. If it is leaked without any leftover references to it, the value simply disappears after const eval is finished. If the value is deallocated, the call to dealloc is intercepted and the miri engine removes the allocation. Pointers to dead allocations will cause a const eval error if they end up in the final constant.
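The bookkeeping described above can be sketched as a toy model. Everything here (`ConstHeap`, `alloc`, `dealloc`, `intern`, `InternError`) is a hypothetical name for illustration, not rustc or miri code. Allocations are reduced to integer ids, and only pointers held directly by the final value are considered.

```rust
use std::collections::HashSet;

// Hypothetical toy model of const-eval memory; none of these names exist in rustc.
#[derive(Debug, PartialEq, Eq)]
enum InternError {
    // The final value points to an allocation that was already deallocated.
    Dangling(usize),
}

#[derive(Default)]
struct ConstHeap {
    next_id: usize,
    // Ids of allocations that "count as heap" and are still alive.
    live: HashSet<usize>,
}

impl ConstHeap {
    // Stands in for an intercepted allocator call: no real memory is requested.
    fn alloc(&mut self) -> usize {
        let id = self.next_id;
        self.next_id += 1;
        self.live.insert(id);
        id
    }

    // Stands in for an intercepted `dealloc`: the allocation is simply removed.
    fn dealloc(&mut self, id: usize) {
        self.live.remove(&id);
    }

    // Called once const eval is finished. `roots` are the allocations the final
    // value points to (pointers stored inside allocations are ignored here).
    // Returns the allocations that become unnamed statics; every other live
    // allocation was leaked without references and is discarded with `self`.
    fn intern(self, roots: &[usize]) -> Result<Vec<usize>, InternError> {
        let mut statics = Vec::new();
        for &id in roots {
            if !self.live.contains(&id) {
                return Err(InternError::Dangling(id));
            }
            if !statics.contains(&id) {
                statics.push(id);
            }
        }
        Ok(statics)
    }
}

fn main() {
    let mut heap = ConstHeap::default();
    let kept = heap.alloc(); // referenced by the final value
    let _leaked = heap.alloc(); // leaked, nothing refers to it afterwards
    let freed = heap.alloc();
    heap.dealloc(freed);
    assert_eq!(heap.intern(&[kept]), Ok(vec![kept]));

    // A final value that still points to a deallocated allocation is rejected.
    let mut heap = ConstHeap::default();
    let freed = heap.alloc();
    heap.dealloc(freed);
    assert_eq!(heap.intern(&[freed]), Err(InternError::Dangling(freed)));
}
```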

Final values of constants and statics

If a constant's final value were of type String, and the string is not empty, it would be very problematic to use such a constant:

const FOO: String = String::from("foo");
let x = FOO;
drop(x);
// how do we ensure that we don't run `deallocate`
// on the pointer to the unnamed static containing the byte sequence "foo"?
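For contrast, only the empty case compiles on stable Rust today, because `String::new` is a `const fn` that never allocates. A small sketch of why that case is harmless: each use of the constant yields a fresh value without a heap buffer, so dropping it never reaches `dealloc`.

```rust
// Accepted today: `String::new` is a `const fn` and does not allocate.
const EMPTY: String = String::new();

fn main() {
    let x = EMPTY; // a fresh value, as if `String::new()` had been called here
    assert_eq!(x.capacity(), 0); // no heap buffer was ever created
    drop(x); // nothing to deallocate

    let mut y = EMPTY; // using the constant again is fine
    y.push_str("foo"); // this allocates at run time, in the ordinary heap
    assert_eq!(y, "foo");
}
```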

While there are a few options that could be considered, all of them are very hard to reason about and easy to get wrong. I'm listing them for completeness:

just set the capacity to zero during const eval
- will prevent deallocation from doing anything
- seems like it would require crazy hacks in const eval which know about types with heap allocations inside
- Not sure how that would work for Box
use a custom allocator that just doesn't deallocate
- requires making every single data structure generic over the allocator in use
- doesn't fit the "const eval that just works" mantra
actually turn const eval heap allocations into real heap allocations on instantiation
- not zero cost
- use of a constant will trigger a heap allocation

We cannot ban types that contain heap allocations, because

struct Foo;

impl Drop for Foo {
    fn drop(&mut self) {
        println!("foo");
    }
}

const FOO: Foo = Foo;

is perfectly legal stable Rust today. While we could try to come up with a scheme that forbids types that can contain allocations inside, this is very hard to do.
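A runnable variant of the example above, with the `println!` swapped for a counter (`DROPS` is an illustrative name) so the effect can be checked: every use of `FOO` creates a new `Foo`, and each of those values is dropped.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many times `Foo::drop` ran.
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Foo;

impl Drop for Foo {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

// Legal on stable: a constant whose type has a `Drop` impl.
const FOO: Foo = Foo;

fn main() {
    {
        let _a = FOO; // each use creates a new `Foo`...
        let _b = FOO;
    } // ...and each one is dropped here
    assert_eq!(DROPS.load(Ordering::SeqCst), 2);
}
```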

There's a dynamic way to check whether dropping the value is problematic:

run Drop::drop on a copy of the final value (in const eval), if it tries to deallocate anything during that run, emit an error

Now this seems very dynamic in a way that means changing the code inside a const impl Drop is a breaking change if it causes any deallocations where it did not before. This also means that it's a breaking change to add any allocations to code modifying or creating such values. So if SmallVec (a type not heap allocating for N elements, but allocating for anything beyond that) changes the N, that's a breaking change.

But the rule would give us the best of all worlds:

const A: String = String::new(); // Ok
const B: String = String::from("foo"); // Not OK
const C: &String = &String::from("foo"); // Ok
const D: &str = &String::from("foo"); // Ok

More alternatives? Ideas? Code snippets to talk about?


Answer 1 · 2018-12-19T16:03:58.000Z

Instead the miri-engine runs const eval specific code for producing an allocation that "counts as heap" during const eval, but if it ends up in the final constant, it becomes an unnamed static. If it is leaked without any leftover references to it, the value simply disappears after const eval is finished. If the value is deallocated, the call to dealloc is intercepted and the miri engine removes the allocation. Pointers to dead allocations will cause a const eval error if they end up in the final constant.

Sounds perfect!

If a constant's final value were of type String, and the string is not empty, it would be very problematic to use such a constant

Ouch. :( Why are Drop types allowed in constants?!?

run Drop::drop on a copy of the final value (in const eval), if it tries to deallocate anything during that run, emit an error

I don't think we should do this: This means that any difference between compile-time and run-time execution becomes an immediate soundness error.

Also, it's not just Drop that causes trouble: Say I have a copy of String in my own library, the only difference being that the destructor does nothing. Then the following code is accepted by your check, but will be doing something very wrong at run-time:

const B: String = String::from("foo");
let mut b = B;
b.push_str("bar"); // reallocates the "heap"-allocated buffer

Answer 2 · 2018-12-19T16:06:38.000Z

The problem with push_str affects statics as well:

static B: Mutex<String> = Mutex::new(String::from("foo"));
let mut s = B.lock().unwrap();
s.push_str("bar"); // reallocates the "heap"-allocated buffer

Answer 3 · 2018-12-19T16:24:50.000Z

Ugh. Looks like we painted ourselves into a corner. Let's see if we can spiderman our way out.

So... new rule. The final value of a constant/static may either be

1. an immutable reference to any value, even one containing const-heap pointers
   - if an UnsafeCell is encountered, continue with 2.
2. an owned value with no const-heap pointers anywhere in the value, even behind relocations. The analysis continues with 1. if safe references are encountered

The analysis happens completely on a constant's value+type combination.
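The two rules can be sketched as a recursive check over a simplified value+type tree. The `Node` and `Rule` types, and the way `String` and `Mutex` are modeled, are hypothetical simplifications for illustration only. The check starts with rule 2 because the final value itself is owned; a top-level `&T` then switches to rule 1.

```rust
// Hypothetical, heavily simplified model of a constant's final value, where each
// node combines the value with the type information the analysis would see.
enum Node {
    Scalar,                         // integers, bools, ...
    SharedRef(Box<Node>),           // `&T`; the analysis follows it into `T`
    RawPtr { to_const_heap: bool }, // raw pointers cannot be followed
    UnsafeCell(Box<Node>),
    Aggregate(Vec<Node>), // structs, tuples, arrays
}

#[derive(Clone, Copy)]
enum Rule {
    // Rule 1: behind an immutable reference, const-heap pointers are fine.
    BehindSharedRef,
    // Rule 2: an owned value may not contain const-heap pointers.
    Owned,
}

fn check(node: &Node, rule: Rule) -> bool {
    match node {
        Node::Scalar => true,
        // A safe shared reference continues with rule 1.
        Node::SharedRef(inner) => check(inner, Rule::BehindSharedRef),
        // An UnsafeCell continues with rule 2.
        Node::UnsafeCell(inner) => check(inner, Rule::Owned),
        Node::RawPtr { to_const_heap } => match rule {
            Rule::BehindSharedRef => true,
            Rule::Owned => !*to_const_heap,
        },
        Node::Aggregate(fields) => fields.iter().all(|field| check(field, rule)),
    }
}

// `String` modeled as (pointer, capacity, length).
fn string(on_const_heap: bool) -> Node {
    Node::Aggregate(vec![
        Node::RawPtr { to_const_heap: on_const_heap },
        Node::Scalar,
        Node::Scalar,
    ])
}

// `Mutex<T>` modeled as some lock state plus an `UnsafeCell<T>`.
fn mutex(inner: Node) -> Node {
    Node::Aggregate(vec![Node::Scalar, Node::UnsafeCell(Box::new(inner))])
}

// The final value of a constant is checked starting from rule 2.
fn accepted(final_value: &Node) -> bool {
    check(final_value, Rule::Owned)
}

fn main() {
    assert!(accepted(&string(false))); // String::new()
    assert!(!accepted(&string(true))); // String::from("foo")
    assert!(accepted(&Node::SharedRef(Box::new(string(true))))); // &String
    assert!(!accepted(&Node::SharedRef(Box::new(mutex(string(true)))))); // &Mutex<String>
    assert!(accepted(&Node::Aggregate(vec![
        Node::Scalar,
        Node::SharedRef(Box::new(string(true))),
    ]))); // (i32, &String)
}
```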

Answer 4 · 2018-12-20T14:24:07.000Z

Looks like we painted ourselves into a corner.

Note that contrary to what I thought, rejecting types with Drop does not help as my hypothetical example with a drop-free leaking String shows.

Right now, even if we could change the past, I don't know of anything we could have done that would help here.

Answer 5 · 2019-01-09T10:38:35.000Z

Note that contrary to what I thought, rejecting types with Drop does not help as my hypothetical example with a drop-free leaking String shows.

Yea I realized that from your example, too.

I believe that the two step value+type analysis covers all cases. We'd allow &String but not &Mutex<String>. We'd allow SomeOwnedTypeWithDrop as long as it doesn't contain heap pointers. So String is not allowed because it contains a raw pointer to a heap somewhere. (i32, &String) is also ok, because of the immutable safe reference.

Answer 6 · 2019-01-25T15:50:35.000Z

So just having rule (1) would mean if there is a ptr (value) that is not of type &T, that's an error? I think for an analysis like this, we want to restrict ourselves to the publicly visible type. Otherwise it makes a difference whether some private field is a shared ref or not, which makes me uneasy.

I am not sure I understand what (2) changes now. Does that mean if I encounter a pointer that is not a &T (with T: Frozen), it must NOT be a heap ptr? I am not sure if the "analysis continues with 1" describes an exception to "no heap pointers".

Btw, I just wondered why we don't rule out pointers to allocations of type "heap". Those are the only ones where deallocation is allowed, so if there are no such pointers, we are good. You say

We cannot ban types that contain heap allocations, because

but the example that follows doesn't do anything on the heap, so I don't understand.

We currently do not allow heap allocation, so allowing it but not allowing such pointers in the final value must be fully backwards-compatible -- right?

The thing is that you also want to allow

const C: &String = &String::from("foo"); // Ok
const D: &str = &String::from("foo"); // Ok

and that's where it gets hard.

And now what you are trying to exploit is some guarantee of the form "data behind a frozen shared ref cannot be deallocated", and hence allow some heap pointers based on that? I think this is hereditary, meaning I don't understand why you seem to restrict this to 1 level of indirection. Consider

const E: &Vec<String> = &vec![String::from("foo")]; // OK?

Given that types have free rein over their invariants, I am not convinced this kind of reasoning holds. I do think we could allow (publicly visible) &[mut] T to be heap pointers, because we can assume such references to satisfy the safety invariant and always remain allocated. I am very skeptical of anything going beyond that. That would allow non-empty &str but not &String.

Answer 7 · 2019-01-25T21:22:24.000Z

const E: &Vec<String> = &vec![String::from("foo")]; // OK?

Hm... yea, I did not think about this properly. A raw pointer can just be *const () but be used after casting to *const UnsafeCell<T> internally, thus destroying all static analysis we could ever do.

So... we would also allow &&T where both indirections are heap pointers... but how do we ensure that a private &T field in a type is not also accepted? I mean, we'd probably want to allow (&T, u32) but not SomeType::new() with struct SomeType { t: &'static T }, because that field might have been obtained by Box::leak and might point to stuff that has UnsafeCell in it, and SomeType might transmute the &'static T to &'static UnsafeCell.

I'm not sure if it is legal to transmute &'static UnsafeCell to &'static T where T only has private fields.

Answer 8 · 2019-02-04T10:24:31.000Z

but how do we ensure that a private &T field in a type is not also accepted?

I think we can have a privacy-sensitive value visitor.

but not SomeType::new() with struct SomeType { t: &'static T } because that field might have been obtained by Box::leak and might point to stuff that has UnsafeCell in it, and SomeType might transmute the &'static T to &'static UnsafeCell.

Yeah that's why I suggested only going for public fields. I think such a type would be invalid anyway (it would still have a shared reference around, and Stacked Borrows will very quickly get angry at you for modifying what is behind that reference). But that seems somewhat shady, and anyway there doesn't seem to be much benefit from allowing private shared references.

OTOH, none of this would allow &String because there we have a private raw pointer to a heap allocation. I feel like I can cook up a (weird, artificial) example where allowing private raw pointers to the heap would be a huge footgun at least.

I think if we want to allow that, we will have to ask for explicit consent from the user: some kind of annotation on the field saying that we will not perform mutation or deallocation on that field on methods taking &self, or so.

Answer 9 · 2019-02-15T17:04:08.000Z

we intercept calls to an allocator's alloc

This should intercept calls to #[allocator] functions: methods like alloc_zeroed (and many others) might call calloc instead of malloc, other methods call realloc, etc. Currently the #[allocator] attribute is super unstable (is its existence even documented anywhere?), but it requires a function returning a pointer, and it states that this pointer does not alias with any other pointer in the whole program (it must point to new memory). It currently marks this pointer with noalias, but there are extensions in the air (e.g. see: gnzlbg/jemallocator#108 (comment)), where we might want to tell LLVM about the size of the allocation and its alignment as a function of the arguments of the allocator function.

If it is leaked without any leftover references to it, the value simply disappears after const eval is finished.

Does this run destructors?

If the value is deallocated, the call to dealloc in intercepted and the miri engine removes the allocation.

Sounds good in const eval, but as you discovered below, this does not work if run-time code tries to dealloc (or possibly also grow) the String.

While there are a few options that could be considered, all of them are very hard to reason about and easy to get wrong.

I don't like any of them, so I'd say, ban that. That is:

const fn foo() -> String {
    const S: String = "foo".to_string(); // OK
    let mut s = "foo".to_string(); // OK
    s.push_str("!"); // OK
    if true {
        S // OK
    } else {
        s // OK
    }
}
fn bar() -> String {
    const S: String = foo(); // OK
    let s = S.clone(); // OK
    if true {
        S // ERROR
    } else {
        s // OK
    }
}

I think that either we make the const String to String "conversion" an error in bar, or we make it work such that:

unknown_ffi_dealloc_String(bar());

works. That is, an unknown FFI function must be able to deallocate a const String at run-time. If we don't know how to make it work, we could see whether banning this is reasonable at the beginning, i.e., say that certain consts cannot "escape" const evaluation.

Answer 10 · 2019-02-16T17:22:00.000Z

Since we can call foo at runtime, too, allowing S in there will get us into the same problems that bar would get us into.

Does this run destructors?

const FOO: () = mem::forget(String::from("foo"));

would not run any destructors, but also not keep around the memory because we know there are no pointers to it anymore when const eval for FOO is done.

Answer 11 · 2019-02-24T00:43:26.000Z

Since this feature was just merged into C++20, the paper doing this would probably be useful to read as prior art: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p0784r5.html

Answer 12 · 2019-02-25T10:24:16.000Z

The key requirements seem to be

We therefore propose that a non-transient constexpr allocation be a valid result for a constexpr variable initializer if:

the result of evaluating the initializer is an object with a nontrivial constexpr destructor, and

evaluating that destructor would be a valid core constant expression and would deallocate all the non-transient allocations produced by the evaluation of expr.

Furthermore, we specify that an attempt to deallocate a non-transiently allocated object by any other means results in undefined behavior. (Note that this is unlikely because the object pointing to the allocated storage is immutable.)

I am a bit puzzled by the hypothetical part about "would be a valid core constant expression and would deallocate". @ubsan do you know what the purpose of this is? (The paper unfortunately just states a bunch of rules with no motivation.)

Also, the part at the end about being "immutable" confuses me. Can't I use a constexpr to initialize a static, and then later mutate that static? Or use a constexpr to initialize a local variable and later mutate that?

Answer 13 · 2019-02-25T12:49:42.000Z

@RalfJung These are specifically for constexpr variables, which are variables which live in read only memory. You can allocate and everything at compile time, but if it's not stored to a constexpr variable (the first bit) whose allocation is deallocated by something the compiler can easily see (that second bit), then you must have allocation at compile-time - otherwise, it'll be a run-time allocation. Importantly, these compile time allocations are frozen at compile time, and put into read-only memory.

Initializing a non-constexpr variable with a constant expression (known as constinit) is also valid, but less interesting, because the allocations are not leaked to romem, and are done at runtime. constexpr variables are those which are known at compile time - mutable variables cannot be known at compile time, since one could mutate them. (it would be very weird to support allocation at compile time for runtime data, since one would expect to be able to reallocate that information as opposed to just mutating the data itself)

Answer 14 · 2019-02-25T12:55:08.000Z

I am a bit puzzled by the hypothetical part about "would be a valid core constant expression and would deallocate"

@RalfJung These rules are for the initialization of constexpr variables. So in:

constexpr auto foo = bar();

if bar() returns allocated memory, then foo must have a constexpr destructor, and this destructor must properly free the memory it owns. AFAICT this means that non-transient (see below) allocations must be deallocated, no leaks allowed (EDIT: non-transient allocations are those that don't leak to callers, so if you don't run the destructor of an allocation, that kind of makes it transient by definition).

Can't I use a constexpr to initialize a static, and then later mutate that static?

Note that the rules you quote are for non-transient allocations, that is, allocations that are created and free'd during constant evaluation and that do not escape it, e.g.,

constexpr int foo() { 
    std::vector<int> _v{1, 2, 3};
    return 3;
}

where the memory allocated by foo for _v is allocated and deallocated at compile-time and never escapes into the caller of foo.

Transient allocations are those that escape to the caller, e.g., if foo above returns the vector _v. These are promoted to static memory storage.

That is, in

constexpr vector<int> foo = alloc_vec();
static vector<int> bar = foo;

the vector<int> in foo points to its memory in immutable static storage, and bar executes the run-time copy constructor of foo, which allocates memory at run-time, and copies the elements of foo, before main starts executing.

EDIT: In particular, the memory of bar does not live in immutable static storage. The memory of bar fields (e.g. ptr, len, cap) live in mutable static storage, but the data pointed to by its pointer field lives on the heap.

Answer 15 · 2019-02-25T13:58:41.000Z

Ok, so basically C++ avoids the issues we've talked about here by using copy constructors whenever it moves to a non-constexpr space.

This basically is

const A: String = String::new(); // Ok
const B: String = String::from("foo"); // Not OK
const C: &String = &String::from("foo"); // Ok
const D: &str = &String::from("foo"); // Ok

because C can be used as C.clone() when an owned value is desired and B is never ok.

Answer 16 · 2019-02-25T16:01:59.000Z

Ok, so basically C++ avoids the issues we've talked about here by using copy constructors whenever it moves to a non-constexpr space.

I'm not sure, maybe there is some sort of optimization that might be guaranteed to apply here in C++ that would elide this, but I don't know why Rust should do the same. To me Rust is even simpler.

The key thing is separating allocations that are free'd within constant evaluation (which C++ calls transient) and allocations that escape constant evaluation (which C++ calls non-transient), which get put in read-only static memory, where this memory can be read, but it cannot be written to, nor free'd.

So if String::from is a const fn:

const C: String = String::from("foo");
static S: String = C;
let v: String = S.clone();

during constant evaluation String::from would allocate a String in its stack (ptr, len, cap), and then the String would do a compile-time heap memory allocation, writing "foo" to that memory. When the String is returned, this allocation does not escape constant evaluation yet, because C is a const. If nothing uses C I'd expect the static memory segment to not contain the allocation as a guaranteed "optimization".

When we write static S: String = C; the allocation of C escapes constant-evaluation and gets byte-wise copied into read-only static memory. The ptr, cap, and len fields are byte-wise copied to S, where ptr is changed to point to the allocation in the read-only static memory segment.

That is, allocations that "escape" to C do not escape constant-evaluation because C is a const. The escaping happens when using C to create S, that is, when the const-universe interfaces with the non-const-universe.

S is immutable, can't be moved from, and it is never dropped, so AFAICT there is no risk of String::drop attempting to free read-only memory (that would be bad).

Creating v by calling clone does the obvious thing, copying the memory from the static memory segment into a heap allocation at run-time as usual.
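The closest thing that works on stable Rust today is the borrowed form (a `&str` constant, like `const D` in the opening post): the bytes live in static memory, and an owned `String` is produced explicitly at run time, much like the `S.clone()` step above. The names here are illustrative.

```rust
// The bytes of "foo" are placed in static memory at compile time.
const C: &str = "foo";

fn main() {
    // `to_string` allocates a fresh heap buffer at run time and copies the bytes.
    let mut v: String = C.to_string();
    assert_eq!(v, "foo");
    // The owned copy lives in a different allocation than the static bytes...
    assert_ne!(v.as_ptr(), C.as_ptr());
    // ...so it can grow and later be freed without touching static memory.
    v.push_str("bar");
    assert_eq!(v, "foobar");
    assert_eq!(C, "foo");
}
```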

A consequence of this is that:

const C: String = String::from("foo");
static S: String = C;
static S2: String = C;
// S === S2

Here S and S2 would be exactly identical, and their ptrs would be equal and refer to the same allocation.

Answer 17 · 2019-02-26T13:23:46.000Z

The problem occurs when you move to a static that contains interior mutability. So e.g.

static S: Mutex<String> = Mutex::new(C);
*S.lock().unwrap() = String::new();

The old value is dropped and a new one is obtained. Now we could state that we simply forbid heap pointers in values with interior mutability, so the above static is not legal, but

static S: Mutex<Option<String>> = Mutex::new(None);

is legal.

This rule also is problematic, because when you have e.g.

static S: Vec<Mutex<String>> = vec![Mutex::new(C)];

we have a data structure Vec, which is ptr, cap and len, and no rules that we can see from the type about any interior mutability in any of the values. ptr is just a raw pointer, so we can't recurse into it for any kind of analysis.

This is the same problem we'd have with

const FOO: &Vec<Mutex<String>> = &vec![Mutex::new(C)];

Answer 18 · 2019-02-26T14:03:10.000Z

We discussed my previous comment on Discord yesterday, and the summary is that it is 100% flawed because it assumed that this was not possible in stable Rust today:

struct S;
impl Drop for S {
    fn drop(&mut self) {}
}
const X: S = S;
fn main() {
    let _ = X;
}

and also because it did not take into account moving consts into statics with interior mutability.

Answer 19 · 2019-02-26T20:30:45.000Z

@ubsan

These are specifically for constexpr variables, which are variables which live in read only memory.

Oh, so there are special global variables like this that you can only get const pointers to, or so?

What would go wrong if the destructor check would not be done? The compiler can easily see all the pointers in the constexpr initial value just by looking at the value it computed, can't it?

Initializing a non-constexpr variable with a constant expression (known as constinit) is also valid, but less interesting, because the allocations are not leaked to romem, and are done at runtime.

Oh, I thought there'd be some magic here but this is basically what @oli-obk says, it just calls the copy constructor in the static initializer?

Ok, so basically C++ avoids the issues we've talked about here by using copy constructors whenever it moves to a non-constexpr space.

Well, plus it lets you write arbitrary code in a static initializer that actually gets run at compile-time.

Answer 20 · 2019-02-26T21:14:52.000Z

constexpr auto x = ...; // this variable can be used when a constant expression is needed
// it cannot be mutated
// one can only access a const lvalue which refers to x

constinit auto x = ...; // this variable is initialized at compile time
// it cannot be used when a constant expression is needed
// it can be mutated

The "copy constructors" aren't magical at all - it's simply using a T const& to get a T through a clone.

The thing that Rust does is kind of... extremely odd. Basically, it treats const items as non-linear and thus breaks the idea of linearity -- const X : Y = Z; is more similar to a nullary function than to a const variable.

Leaking would, in theory, be valid, but I imagine they don't allow it in order to catch bugs.

Answer 21 · 2019-02-27T09:37:00.000Z

constinit auto x = ...; // this variable is initialized at compile time

If the initializer involves non-transient allocations, @gnzlbg said above that they would become run-time allocations. How does that work, then, to initialize at compile-time a run-time allocation?

The thing that Rust does is kind of... extremely odd. Basically, it treats const items as non-linear and thus breaks the idea of linearity -- const X : Y = Z; is more similar to a nullary function than to a const variable.

Yeah, that's a good way of viewing it. const items can be used arbitrarily often not because they are Copy, but because they can be re-computed any time, like a nullary function. And moreover the result of the computation is always the same so we can just use that result immediately. Except, of course, with "non-transient allocations", the result would not always be the same, and making it always the same is what causes all the trouble here.
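The "nullary function" view can be seen on stable Rust with a heap-free value. In this sketch (the names `V` and `make_v` are illustrative), every mention of `V` produces an independent `Vec`, exactly as calling a function would. It only works because the result contains no compile-time allocation that all uses would have to share.

```rust
// Used like a nullary function: every mention of `V` is a new `Vec`.
const V: Vec<i32> = Vec::new();

fn make_v() -> Vec<i32> {
    Vec::new()
}

fn main() {
    let mut a = V;
    a.push(1); // mutates only this copy
    let b = V; // a fresh, still-empty value
    assert!(b.is_empty());
    assert_eq!(a, vec![1]);
    // Behaves exactly like calling the equivalent function.
    assert_eq!(V, make_v());
}
```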

Answer 22 · 2019-02-27T10:04:30.000Z

If the initializer involves non-transient allocations, @gnzlbg said above that they would become run-time allocations.

If the initializer involves non-transient allocations, the content of the allocation is put into the read-only static memory segment of the binary at compile-time.

If you then use that to initialize a static, then the copy constructor is invoked AFAICT, which can heap allocate at run-time, and copy the memory from the static memory segment to the heap. All of this happens in "life before main".

Answer 23 · 2019-02-27T10:06:33.000Z

then the copy constructor is invoked AFAICT

I'm not 100% sure about this, and it is kind of implicit in the proposal, but AFAICT there is no other way that this could work in C++ because either the copy constructor or move constructor must be invoked, and you can't "move" out of a constexpr variable, so that leaves only the copy constructor available.

Answer 24 · 2019-02-27T10:19:24.000Z

If you then use that to initialize a static, then the copy constructor is invoked AFAICT, which can heap allocate at run-time, and copy the memory from the static memory segment to the heap. All of this happens in "life before main".

But what if I use that to initialize a constinit? (All of this constexpr business was added to C++ after I stopped working with it, so I am mostly clueless here.)

Answer 25 · 2019-02-27T10:26:14.000Z

But what if I use that to initialize a constinit?

None of these proposals has been merged into the standard (the heap-allocation one has the "good to go", but there is a long way from there to being merged), and they do not consider each other. That is, the constinit proposal assumes that constexpr functions don't do this, and the "heap-allocations in constexpr functions" proposal assumes that constinit does not exist.

So AFAICT, when heap allocation in constexpr functions gets merged, it will be constinit's problem to figure this out, and if it can't, then C++ won't have constinit.

I will ask around though.

Answer 26 · 2019-02-27T14:42:35.000Z

So from what @gnzlbg said on Zulip, it seems non-transient constexpr allocations did not make it for C++20, while transient allocations did.

And indeed, there is very little concern with transient heap allocations for Rust as well, from what I can see. So how about we start with getting that done? Basically, interning/validation can check whether the pointers we are interning point to the CTFE heap, and reject the constant if they do.
Well, that'd be the dynamic part of the check, anyway. If we also want a static check ("const qualification" style), that'd be harder...
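The dynamic part of a transient-only check can be sketched as follows. `AllocKind`, `ConstError`, and `intern_transient_only` are hypothetical names in a toy model: interning is assumed to have already collected the kinds of all allocations reachable from the final value, and any allocation that came from the CTFE heap causes the constant to be rejected.

```rust
// Hypothetical toy model: where an allocation reachable from the final value came from.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AllocKind {
    Heap,   // created by an intercepted allocator call during const eval
    Static, // a string literal, a promoted value, another static, ...
}

#[derive(Debug, PartialEq, Eq)]
enum ConstError {
    NonTransientHeapAllocation,
}

// `pointees` lists the kinds of all allocations the final value points to
// (directly or transitively), as interning would discover them.
fn intern_transient_only(pointees: &[AllocKind]) -> Result<(), ConstError> {
    if pointees.contains(&AllocKind::Heap) {
        Err(ConstError::NonTransientHeapAllocation)
    } else {
        Ok(())
    }
}

fn main() {
    // e.g. a constant that built and dropped a Vec internally, then returned a length.
    assert_eq!(intern_transient_only(&[]), Ok(()));
    // e.g. a `&str` constant pointing at a string literal in static memory.
    assert_eq!(intern_transient_only(&[AllocKind::Static]), Ok(()));
    // e.g. `const B: String = String::from("foo");`, whose buffer escaped const eval.
    assert_eq!(
        intern_transient_only(&[AllocKind::Heap]),
        Err(ConstError::NonTransientHeapAllocation)
    );
}
```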

Answer 27 · 2019-02-27T17:08:41.000Z

So how about we start with getting that done?

+1. Those seem very uncontroversial and deliver instant value. It makes no sense to block that on solving how to deal with non-transient allocations.

Answer 28 · 2019-02-27T17:27:23.000Z

As Ralf mentioned, statically checking for transience is necessary for associated constants in trait declarations (associated constants may not be evaluable immediately because they depend on other associated constants that the impl needs to define).

Answer 29 · 2019-03-01T13:03:41.000Z

So... @gnzlbg had a discussion on discord that I'm going to summarize here. The TLDR is that we believe a good solution is to have (names bikesheddable!) ConstSafe and ConstRefSafe unsafe auto traits.

ConstSafe types may appear in constants directly. This includes all types except

&T: ConstSafe where T: ConstRefSafe
&mut T: !ConstSafe

Other types may (or may not) appear behind references by implementing the ConstRefSafe trait (or not)

*const T: !ConstRefSafe
*mut T: !ConstRefSafe
String: ConstRefSafe
UnsafeCell<T>: !ConstRefSafe
i32: ConstRefSafe + ConstSafe.
- the same for other primitives
[T]: ConstRefSafe where T: ConstRefSafe
the data pointer of a fat pointer follows the same rules as the root value of an allocation
- rationale: the value itself could be on the heap, but you can't do anything bad with it since trait methods at worst can get a &self if you start with a &Trait. Further heap pointers inside the value are forbidden, just like in root values of constants.
... and so on (needs full list and rationale before stabilization)

Additionally values that contain no pointers to heap allocations are allowed as the final value of a constant.

Our rationale is that

we want to forbid types like
```
struct Foo(*mut ());
```
whose methods convert the raw pointer to a raw pointer to the actual type (which might contain an unsafe cell) and then modify that value.
we want to allow types like String (at least behind references), since we know the user can't do anything bad with them as they have no interior mutability. String is pretty much equivalent to
```
struct String(*mut u8, usize, usize);
```
Which is indistinguishable from the Foo type via pure type based analysis.

In order to distinguish these two types, we need to get some information from the user. The user can write

unsafe impl ConstRefSafe for String {}

and declare that they have read and understood the ConstRefSafe documentation and solemnly swear that String is only up to good things.
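The opt-in can be approximated on stable Rust, as a sketch only. `ConstSafe` and `ConstRefSafe` are the bikeshed names from the proposal; real auto traits need the unstable `auto_traits` feature, so this version uses ordinary unsafe marker traits with hand-written impls. The `str` impl is an assumption not in the list above, and `assert_const_safe` is a hypothetical stand-in for the compiler's check.

```rust
/// Safety: values of this type may be reached through a `&` stored in a
/// constant, even if they point to const-heap memory.
unsafe trait ConstRefSafe {}

/// Safety: values of this type may appear directly as a constant's final value.
unsafe trait ConstSafe {}

unsafe impl ConstRefSafe for i32 {}
unsafe impl ConstSafe for i32 {}
unsafe impl<T: ConstRefSafe> ConstRefSafe for [T] {}
unsafe impl ConstRefSafe for str {} // assumption: not in the list above
// The library author's promise: nothing bad can happen through a `&String`.
unsafe impl ConstRefSafe for String {}
// `&T` may appear in a constant whenever `T` is safe to reach by reference.
unsafe impl<T: ConstRefSafe + ?Sized> ConstSafe for &T {}
// No impls for `&mut T`, raw pointers or `UnsafeCell<T>`, so those are rejected.

// Hypothetical stand-in for the compiler's check on a constant's type.
fn assert_const_safe<T: ConstSafe>(_value: &T) {}

fn main() {
    assert_const_safe(&5_i32);
    assert_const_safe(&&String::new());
    assert_const_safe(&"foo");
    assert_const_safe(&&[1_i32, 2, 3][..]);
    // Would not compile: `*mut ()` does not implement `ConstRefSafe`.
    // assert_const_safe(&&std::ptr::null_mut::<()>());
}
```

Because the impls are written by hand here, the negative cases are simply missing impls; with real auto traits they would instead be explicit negative impls on the pointer and cell types.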

Backcompat issue 1

Now one issue with this is that we'd suddenly forbid

struct Foo(*mut ());
const FOO: Foo = Foo(std::ptr::null_mut());

which is perfectly sane and legal on stable Rust. The problems only happen once there are pointers to actual heap allocations or to mutable statics in the pointer field. Thus we allow any type directly in the root of a constant, as long as there are none such pointers in there.

Backcompat issue 2

Another issue is that

struct Foo(*mut ());
const FOO: &'static Foo = &Foo(std::ptr::null_mut());

is also perfectly sane and legal on stable Rust. Basically, as long as there are no heap pointers, we'll just allow any value, but if there are heap pointers, we require ConstSafe and ConstRefSafe.
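Both backcompat cases can be checked as a small program that compiles on stable Rust today (`FOO_REF` is an illustrative rename so the two constants can coexist). Neither constant points into any allocation, which is why a value-based exception for "no heap pointers" keeps them working.

```rust
use std::ptr;

// A type with a raw pointer field, as in the two examples above.
struct Foo(*mut ());

// Backcompat issue 1: accepted by stable Rust today.
const FOO: Foo = Foo(ptr::null_mut());

// Backcompat issue 2: a reference to such a value is accepted as well.
const FOO_REF: &'static Foo = &Foo(ptr::null_mut());

fn main() {
    // Neither constant contains a pointer to any allocation at all.
    assert!(FOO.0.is_null());
    assert!(FOO_REF.0.is_null());
}
```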

Answer 30 · 2019-03-31T12:47:30.000Z

I like the idea of using a trait or two to make the programmer opt in to this explicitly!

I think to follow this approach, we should figure out what exactly the proof obligation is that unsafe impl ConstSafe for T comes with. That should then inform which types it can be implemented for. Hopefully, for unsafe impl ConstRefSafe for T the answer can be "that's basically unsafe impl ConstSafe for &T".

I think the proof obligation will be something along the lines of: the data can be placed in static memory and the entire safe API surface of this type is still fine. Basically that means there is no deallocation. However, how does this interact with whether data is placed in constant or mutable memory?

Answer 31 · 2019-08-11T13:08:00.000Z

Could we change the semantics of using a const? Right now, a const is copied on use, but could we make it so that if a const is !Copy + Clone, then using the const calls Clone::clone instead? If the const contains a compile-time allocation, we can forbid the uses of these consts if they do not implement Clone.

Answer 32 · 2019-08-11T13:52:02.000Z

@gnzlbg You can have a const of uncloneable type e.g. Option<&mut T> (it would just not have uncopyable leaf data, i.e. it would need to be None).

But also, I don't feel great about calling Clone automatically...

Answer 33 · 2019-08-11T19:10:59.000Z

But also, I don't feel great about calling Clone automatically...

Doesn't feel great to me either. I don't see any reasons why the following code would be unsound and think we should try to accept it:

const V: Vec<i32> = allocates_at_compile_time();
let v = V.clone();

It only uses an explicit clone, though, so there is no need to clone automatically as long as we avoid expanding that to something like:

let v = { 
    let mut tmp = copy V;
    tmp.clone()
    // drops tmp => UB
};

like we currently do (EDIT: e.g. we do this for the case where the Vec is empty, but that only works there).

Answer 34 · 2019-08-12T00:36:29.000Z

That seems like something that would work better as a static V than a const, or else as something like const V: &Vec<i32> = &allocates(); V.clone(), or even const V: &[i32] = leaks(); V.into_vec().
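The slice-reference alternative already works on stable Rust when the data is a literal rather than a compile-time heap allocation: the constant points at static memory, and a runtime Vec is obtained with an explicit, visible copy. A minimal sketch (`V` and `runtime_copy` are illustrative names):

```rust
// A constant slice lives in read-only static memory; no heap is involved.
const V: &[i32] = &[1, 2, 3];

/// Makes an owned, growable copy of the constant data at runtime.
fn runtime_copy() -> Vec<i32> {
    // Explicit allocation: the static data is copied into a new heap buffer.
    V.to_vec()
}
```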

The contortions needed to make const V: Vec<i32> work, not to mention backwards compatible, don't seem worth it with so many potential alternative solutions.

Answer 35 · 2019-08-12T07:12:29.000Z

The contortions needed to make const V: Vec<i32> work, not to mention backwards compatible, don't seem worth it with so many potential alternative solutions.

@rpjohnst Are there cases where one could use const V: T but it wouldn't be trivial to use const V: &T instead?

Answer 36 · 2019-08-12T11:18:18.000Z

I think the tricky examples would have to involve something like a &HashMap<K, V> because you can't just get a &[T] out of it.

IMO, exposing "compile-time heap pointers" immutably is much more acceptable (it's even mentioned in "the" pre-miri post) than trying to somehow auto-generate runtime heap allocations.

Answer 37 · 2019-08-28T03:57:40.000Z

A little out of my league here, so hopefully this is useful feedback. My not so secret ulterior motive is to lobby for the stabilization of Freeze.

ConstRefSafe types are a subset of Freeze types, as Freeze is currently defined: ConstRefSafe requires immutability transitively, whereas Freeze is not transitive through pointers. Prior art for ConstRefSafe might be the immutable storage class of D, which @jeehoonkang pointed me to.

All Freeze types are safe to memcpy as long as the memcpy's capture the lifetime of the value immutably, and all memcpy's are forgotten before the original value is dropped. The lifetime restriction is not important to const's. This could simplify the implementation a bit for supporting const V: Vec<i32> = allocates_at_compile_time().

Desugaring this:

const V: Vec<i32> = allocates_at_compile_time();
let v = V.clone();

into this:

let v = { 
    let tmp = ManuallyDrop::new(copy V);
    (*tmp).clone()
    // drops tmp => OK
};

is safe. Generally using forget/ManuallyDrop on memcpy's of anything that is currently accepted as const is safe (I think).
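The desugaring above can be imitated at runtime to see why it is sound: the bitwise copy is wrapped in ManuallyDrop and never dropped, so only the original owner ever frees the buffer. This is a runtime sketch only; `ptr::read` stands in for the thread's hypothetical `copy V` operator, and `clone_through_bitwise_copy` is an illustrative name.

```rust
use std::mem::ManuallyDrop;
use std::ptr;

/// Clones `v` through a bitwise copy of its (pointer, capacity, length)
/// header, mirroring the `copy V` + `ManuallyDrop` desugaring.
fn clone_through_bitwise_copy(v: &Vec<i32>) -> Vec<i32> {
    // SAFETY: the copy shares `v`'s buffer, but it is only read through a
    // shared reference and is wrapped in `ManuallyDrop`, so it never frees it.
    let tmp = ManuallyDrop::new(unsafe { ptr::read(v) });
    // `Vec::clone` allocates a fresh buffer; `tmp` is forgotten afterwards.
    (*tmp).clone()
}
```

Without the ManuallyDrop wrapper, dropping `tmp` would free the shared buffer, which is exactly the UB in the earlier `copy V` expansion.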

Is there a use case for implementing ConstSafe manually?

@eddyb points out that there is no conceivable trait that can capture all values that are currently OK to use in a const (Option<Anything> is allowed if it is None). Since analyzing the value of consts is effectively required, is this enough for the definition of ConstRefSafe?

unsafe auto trait Immutable {}
impl<T: ?Sized> !Immutable for UnsafeCell<T> {}

It permits String/Vec/etc, and forbids Mutex/Cell/etc.

Answer 38 · 2019-08-28T06:18:01.000Z

All Freeze types are safe to memcpy as long as the memcpy's capture the lifetime of the value immutably, and all memcpy's are forgotten before the original value is dropped.

What do you mean by this? Address identity is also still observable in Rust.

On another note, I wonder how this interacts with the dynamic checks that make sure that constants cannot be mutated. We should intern heap allocations that were created by constants as immutable. But they get created mutably, meaning that we basically can only silently clamp mutability during interning -- but then code like this could compile:

const MUTABLE_BEHIND_RAW: *mut i32 = Box::into_raw(Box::new(42));

But then later *MUTABLE_BEHIND_RAW = 99; would be UB because we are writing to immutable memory.

Answer 39 · 2019-08-28T14:30:41.000Z

const MUTABLE_BEHIND_RAW: *mut i32 = &1 as *const _ as *mut _;

will also end up as a mutable allocation (and compiles on stable). Is there any reason we need to treat heap allocations differently except maybe user expectations?

I think we can safely say you should not be mutating anything you got from a constant except the value itself?

Answer 40 · 2019-08-28T14:50:44.000Z

@mtak-

Desugaring this: [...] into this [...] is safe.

I agree, but I don't know what value does allowing this add. If we require const V: &Vec = alloc_at_compile_time(); the user just needs to write let vec = V.clone();. OTOH, doing that would add the cost of implicit clones to the language, which is something that Rust never does. We'll have to change all teaching material from "clones are explicit" to "clones are explicit unless [insert complex set of rules]".

I think we can safely say you should not be mutating anything you got from a constant except the value itself?

What do you mean by "except the value itself"?

Answer 41 · 2019-08-28T15:07:59.000Z

will also end up as a mutable allocation (and compiles on stable)

That's a regression, I'm pretty sure we used to make that immutable before miri.

Answer 42 · 2019-08-28T15:18:13.000Z

That's a regression, I'm pretty sure we used to make that immutable before miri.

Probably. If we unify the promoted scheme between constants and functions that will be immutable again

Answer 43 · 2019-08-28T21:09:59.000Z

Is there any reason we need to treat heap allocations differently except maybe user expectations?

For that one we can argue that it is mutating through a pointer obtained from a shared reference (&0). If we could somehow reflect that in alloc.mutability...

That's a regression, I'm pretty sure we used to make that immutable before miri.

It is possible that this changed with rust-lang/rust#58351. And the previous code was rather too aggressive with marking things immutable in a static.

If we unify the promoted scheme between constants and functions that will be immutable again

Or we just have to merge rust-lang/rust#63955.

Answer 44 · 2019-10-26T16:59:07.000Z

This is slightly hard for me to follow, but have there been any developments on this in the last few months?

Answer 45 · 2019-11-06T11:01:59.000Z

This is totally paged out, but #20 (comment) is still the current consensus, right?

@oli-obk and I just had a chat about this. The problem is that the latest proposal is based on checking the final value of the constant as well as its type. So what do we do when we do not have the value, such as when defining an assoc const in terms of another assoc const (or defining a generic const in terms of a generic/assoc const; with assoc consts we can effectively emulate generic consts)? We need to know if the final value contains a heap pointer!

@oli-obk proposed to "look at function bodies", and I reformulated that into basically an effect system. So I am going to work with an explicit effect system here, but we might end up inferring this, that is unclear.

Basically, we can mark functions as const(heap), which is weaker than const, and allows the function to perform heap operations. Think of const fn as "no effect" and const(heap) fn as "can have the heap effect". And moreover, we use the ConstSafe trait to "mask" that effect: a function doing heap operations that returns a ConstSafe type can drop the heap effect. (So, Vec::push can be just const!) Then when checking a generic const, we have to make sure that if const(heap) computations flow into the final value of the const, then its type is ConstSafe.

I am still thinking about a precise description of the "effect" here that accounts for the masking. Maybe something along the lines of "the result of this computation might contain heap pointers at a type that is not ConstSafe"? (ConstSafe seems like a bad name at this point, maybe ConstHeap or so would be better.) This is not a "normal" effect I feel, which is more a property of the computation than the return value...

This scheme should be backwards compatible, as right now nothing has the heap effect.

This also answers @ecstatic-morse's question about how to differentiate Vec::new (a const fn right now, will become stable as such tomorrow) from Box::new once the latter becomes a const fn: the latter will be const(heap) fn, not const fn. And its return type is Box, which is not ConstSafe, so it cannot mask this effect either.

Does that seem to make sense?
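A toy model of the masking rule, assuming each computation can be summarized by a single effect flag. `Effect`, `mask`, `seq`, and `const_item_ok` are hypothetical names invented for this sketch, not compiler APIs.

```rust
/// Hypothetical effect summary for a const computation.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Effect {
    /// Plain `const fn`: the result carries no unmasked heap pointers.
    Pure,
    /// `const(heap) fn`: the result may contain compile-time heap pointers.
    Heap,
}

/// A function whose body does heap operations drops the effect if its
/// return type is `ConstSafe` (the comment above argues this lets
/// `Vec::push`, which returns `()`, be plain `const`).
fn mask(body: Effect, return_type_is_const_safe: bool) -> Effect {
    if return_type_is_const_safe { Effect::Pure } else { body }
}

/// Sequencing two computations: the heap effect is sticky.
fn seq(a: Effect, b: Effect) -> Effect {
    if a == Effect::Heap || b == Effect::Heap { Effect::Heap } else { Effect::Pure }
}

/// A (generic) const is accepted if no heap effect reaches its final
/// value, or if its declared type is `ConstSafe`.
fn const_item_ok(initializer: Effect, item_type_is_const_safe: bool) -> bool {
    initializer == Effect::Pure || item_type_is_const_safe
}
```

In this model `Box::new` keeps the effect (its return type is not ConstSafe), so a const initialized through it is only accepted when the const's own type is ConstSafe.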

Answer 46 · 2019-11-06T11:08:19.000Z

I am still thinking about a precise description of the "effect" here that accounts for the masking. Maybe something along the lines of "the result of this computation might contain heap pointers at a type that is not ConstSafe"? (ConstSafe seems like a bad name at this point, maybe ConstHeap or so would be better.) This is not a "normal" effect I feel, which is more a property of the computation than the return value...

Addendum: I think "refinement types" might be a better term here. Think of us as having, for each Rust type T, also the type ConstSafeValueOf<T>, describing those values of T that actually are const-safe even though not all terms of this type are (like what Vec::new() returns).

const fn foo() -> T is basically sugar for const fn foo() -> ConstSafeValueOf<T>, while const(heap) "opts out" of the ConstSafeValueOf wrapper.

Answer 47 · 2019-11-06T11:16:57.000Z

There may be a simpler way. If I remember correctly, at some point all heap code in the standard library will be generic over the heap, although default that heap parameter to the currently used system heap. Maybe we can figure out a system with this generic parameter and const trait impls.

Answer 48 · 2019-11-06T13:26:40.000Z

There may be a simpler way. If I remember correctly, at some point all heap code in the standard library will be generic over the heap, although default that heap parameter to the currently used system heap. Maybe we can figure out a system with this generic parameter and const trait impls.

Kind of. With the current work of the wg-allocators what you can have is something like this:

// You have some allocator:
const N: usize = 1024;
struct MyAlloc(UnsafeCell<[MaybeUninit<u8>; N]>); // probably Arc + Mutex in practice
// That you can instantiate somewhere:
static HEAP: MyAlloc = MyAlloc::new(); 

// References to your allocator implement `AllocRef` which is
// more or less the current std::alloc::Alloc trait:
impl std::alloc::AllocRef for &'static MyAlloc { 
       fn alloc(self, ...) -> ...;
       fn dealloc(self, ...);
       ...
}

// Vec<T, A: AllocRef = std::alloc::System> 
// (note: System is a ZST, but &'static MyAlloc is not)
let vec: Vec<i32, &'static MyAlloc> = Vec::new_with_alloc(&HEAP);

Now, suppose we wanted to return a Vec using a custom allocator from a const fn. That vector is going to store an AllocRef, which is going to be accessed to allocate and deallocate memory "somewhere". For System, the AllocRef::alloc method just ends up calling an unknown function that we can intercept in const eval, but for MyAlloc that might just return a pointer to a static or similar, and that feels "hard" to intercept and make work in a reasonable way - I don't think we can "just" intercept AllocRef methods.

Maybe a different take on this might be to provide some functions in core::alloc, e.g. core::alloc::{const_alloc, const_realloc, const_dealloc, is_const_alloc...}, that allocators can call in a const context, e.g. using a solution to #7 (const_select(const_fn, runtime_fn)). Or just restrict ourselves to System.

Answer 49 · 2019-11-13T12:52:16.000Z

Just FYI: At RustFest Barcelona @oli-obk mentored me through a rustc patch to memoize the evaluation of some const functions. (rust-lang/rust#66294)

This could in future clash with const heap unless memoized values pointing to the heap are cloned appropriately.

For example, imagine one day a const function like below is written:

const fn foo() -> Box<Mutex<i32>> { Box::new(Mutex::new(0)) }

Under the optimisation in my PR, the compiler can memoize compile-time evaluations of foo() because it takes no arguments. It will naively duplicate the bits of the result into all usage places. This is a problem in this case, because the result is a pointer into const heap.

For example, in the code below, with memoization A and B would be pointing to the same mutex:

static A: &'_ Box<Mutex<i32>> = &foo();
static B: &'_ Box<Mutex<i32>> = &foo();

Without the memoization, A and B would refer to different mutexes allocated separately with each evaluation of foo().

Two possible avenues to explore to resolve this problem once const heap lands:

Turn off this memoization compiler optimization (a bit sad, but fine)
Detect when the result of a const function call contains an allocation into the const heap. If this is the case, maybe we can duplicate the allocation in the const heap as well when copying the fn result.

Answer 50 · 2019-11-13T13:20:47.000Z

Good observation!

Detect when the result of a const function call contains an allocation into the const heap. If this is the case, maybe we can duplicate the allocation in the const heap as well when copying the fn result.

That, or as a first step we just disable memoization.

Answer 51 · 2019-11-13T13:26:03.000Z

That, or as a first step we just disable memoization.

True! Because the memoization is done using the compiler's query system, the implementation path for that could be something like:

Query the const fn evaluation to get the memoized result.
If the memoized result is detected to contain a const heap pointer then evaluate the const fn body again to produce a new value (and new const heap allocation)
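The two-step strategy above can be sketched as a toy memo table, assuming the evaluator can tell whether a result points into the const heap. `Memo`, `EvalResult`, and `points_into_const_heap` are hypothetical names, not rustc query APIs.

```rust
/// Hypothetical result of evaluating a zero-argument const fn.
#[derive(Clone, Debug, PartialEq)]
struct EvalResult {
    bits: u64,
    points_into_const_heap: bool,
}

/// Memoizes a zero-argument evaluation, but re-runs it whenever the cached
/// value holds a const heap pointer, so each use gets its own allocation.
struct Memo<F: Fn() -> EvalResult> {
    eval: F,
    cache: Option<EvalResult>,
}

impl<F: Fn() -> EvalResult> Memo<F> {
    fn new(eval: F) -> Self {
        Memo { eval, cache: None }
    }

    fn get(&mut self) -> EvalResult {
        if let Some(cached) = &self.cache {
            if !cached.points_into_const_heap {
                // Plain data: duplicating the bits is indistinguishable
                // from re-evaluating.
                return cached.clone();
            }
        }
        // (Re-)evaluate the body to produce a fresh const heap allocation.
        let fresh = (self.eval)();
        self.cache = Some(fresh.clone());
        fresh
    }
}
```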

Answer 52 · 2020-06-01T11:41:08.000Z

Has there been any progress on the RFC front?

Answer 53 · 2020-06-02T16:00:26.000Z

Yes. I am working on #43, which helps us organize which features depend on which other features. Once we have that, we can discuss the roadmap for const features and, when everyone is on the same page, start working on RFCing these features.

Answer 54 · 2020-12-19T22:14:20.000Z

Additionally values that contain no pointers to heap allocations are allowed as the final value of a constant.

So mem::transmute::<*const (), usize>() is undefined behavior in const fn?

Answer 55 · 2020-12-20T10:55:37.000Z

That's a different topic (different as in, already applies right now and is orthogonal to heap allocations), and it only applies to the final value in a constant, not to intermediate state during const eval, so const fn are allowed to do that. Basically const FOO: usize = &42 as *const i32 as usize; is not allowed, because that would mean that using that FOO in a promoted (let x: &usize = &(FOO / 3);) would fail to compile (because taking references to math on constants is automatically promoted), even if it is fine if performed at runtime.
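For reference, promotion is what makes the following compile on stable Rust: arithmetic on a constant is evaluated at compile time, and the reference to the result gets a `'static` lifetime. This only works because the constant's value is a plain integer; an integer derived from a pointer address has no known value at compile time, so it could not be computed this way. (`FOO` and `promoted` are illustrative names.)

```rust
const FOO: usize = 42;

/// Returns a `'static` reference to a value computed from a constant.
fn promoted() -> &'static usize {
    // `FOO / 3` is promoted to an anonymous static, so the reference
    // outlives this function call.
    &(FOO / 3)
}
```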

Answer 56 · 2020-12-20T14:24:06.000Z

One more question: Let's say I have a function fn new() -> T and T stores pointers to heap allocations as usize (maybe because it wants to store additional information in the lower bits.) Let's say that this function has perfectly defined behavior and so do all other operations on T.

If I decide to add const(heap) to the function definition, by what method will you detect that the behavior is now undefined and abort the compilation?

Answer 57 · 2020-12-20T16:58:54.000Z

As per the current design, you'd have a

const fn new<A: AllocRef>(a: A) -> YourType

and this is perfectly fine, we allow you to do that, even if you store your value in a usize. What is problematic is if you store that in

const FOO: YourType = new(ConstAlloc);

because then, even if that field is private, we have no way of figuring out whether you are going to expose that field to the public in a const way. If you did, there would be a constant of type usize that could be used in promotion, and thus cause an error there. So validation of FOO will fail. Please use a *const () for this case.

Answer 58 · 2020-12-20T17:07:19.000Z

If you did, there would be a constant of type usize that could be used in promotion, and thus cause an error there.

So once a constant is created, you know for each individual bit of that constant if it was computed using the address of a pointer. And if such a bit exists outside of a pointer, an error occurs?

Please use a *const () for this case.

That would be impossible because the pointer would contain an invalid address and validation would thus also fail.

Answer 59 · 2020-12-20T17:13:58.000Z

That would be impossible because the pointer would contain an invalid address and validation would thus also fail.

A good point. See more below.

So once a constant is created, you know for each individual bit of that constant if it was computed using the address of a pointer. And if such a bit exists outside of a pointer, an error occurs?

No, you can't do anything with a pointer beyond offset it (miri the tool can do it, but the const evaluator can't, and we don't even have anything at the RFC stage for doing that), and if you offset it out of bounds and put it in the final value of a constant, you get a validation error, too.

All of this is getting very off topic, if you want to discuss this further, please open a post in the internals forum and ping me there. It is entirely unrelated to heap allocations, as you can already do all of this with pointers on the stable compiler today.

Answer 60 · 2020-12-20T17:18:05.000Z

No, you can't do anything with a pointer beyond offset it

Really? If that is so then I have no concerns. But it was my understanding that transmuting from pointer to int would become possible in the future.

All of this is getting very off topic

Whether, under this scheme, adding const(heap) to a function can cause undefined behavior seems like a perfectly on-topic question.

Answer 61 · 2020-12-20T17:35:30.000Z

Actually, doesn't this scheme allow you to implement transmute yourself? E.g.

fn transmute<T, U>(t: T) -> U {
    // `allocate` and `free` stand for hypothetical const-heap primitives
    unsafe {
        let a: *mut T = allocate();
        ptr::write(a, t);
        let u = ptr::read(a as *mut U);
        free(a);
        u
    }
}

Pointer casting, ptr::write, and ptr::read are things that occur in regular allocating code. I'm not sure how you're going to prevent pointers from escaping as usize unless you track every bit.

Answer 62 · 2020-12-20T19:41:13.000Z

There is no const(heap) effect anymore under the currently planned scheme. Everything is happening in the type system via an impl const AllocRef for ConstAlloc. I should have probably updated some things here, but the relevant discussion is in the const-eval zulip and the current plan is summarized in https://hackmd.io/h2O2vkj3RimrBTfm9hvZWA#AllocRef-allocGlobal-and-allocConstGlobal

Answer 63 · 2020-12-20T19:43:33.000Z

I'm not sure how you're going to prevent pointers from escaping as usize unless you track every bit.

Well... we do that. You can't actually modify or read bits of pointers. Pointers are completely abstract and tracked on a different level. There is no way to cheat that system, and we do need that system, because we somehow need to tell LLVM about pointers created during const eval. Again, this is nothing that is special to heap pointers. All that is changed by heap pointers is that you can have pointers to mutable and owned allocations. Right now you can only have pointers to either immutable allocations or borrowed allocations.
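A toy model of "pointers are abstract", loosely in the spirit of how Miri represents memory but heavily simplified: each byte is either plain integer data or a fragment of an abstract pointer, and reading a usize out of pointer fragments fails. `AbstractByte` and `validate_usize` are hypothetical names for illustration, not the interpreter's actual types.

```rust
/// One byte of const-eval memory in this toy model.
#[derive(Clone, Copy, Debug, PartialEq)]
enum AbstractByte {
    /// An ordinary initialized byte.
    Int(u8),
    /// Byte `index` of a pointer into allocation `alloc_id`; its numeric
    /// value is unknown until the final address is chosen.
    PtrFragment { alloc_id: u32, index: u8 },
}

/// Reads a little-endian usize, failing if any byte came from a pointer.
fn validate_usize(bytes: &[AbstractByte]) -> Result<usize, String> {
    let mut raw = [0u8; std::mem::size_of::<usize>()];
    if bytes.len() != raw.len() {
        return Err(format!("expected {} bytes, got {}", raw.len(), bytes.len()));
    }
    for (slot, byte) in raw.iter_mut().zip(bytes) {
        match *byte {
            AbstractByte::Int(b) => *slot = b,
            AbstractByte::PtrFragment { alloc_id, index } => {
                return Err(format!(
                    "byte {index} of a pointer to alloc{alloc_id} used as an integer"
                ));
            }
        }
    }
    Ok(usize::from_le_bytes(raw))
}
```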

Answer 64 · 2020-12-20T20:26:09.000Z

I'm sorry but I still don't get it. Where is the error in the following code:

const ADDR: usize = {
    let b: Box<i32, ConstGlobal> = Box::new_in(42, ConstGlobal);
    let v: Box<Box<i32, ConstGlobal>, ConstGlobal> = Box::new_in(b, ConstGlobal);
    let p: *const Box<i32, ConstGlobal> = &*v;
    let addr: usize = unsafe { *(p as *const usize) };
    addr
};

Again, this is nothing that is special to heap pointers.

You keep saying this but I've never said that there is anything special about heap pointers. The critical change required by this proposal is that pointer casting and pointer dereferencing must become const.

Answer 65 · 2020-12-21T07:55:00.000Z

The critical change required by this proposal is that pointer casting and pointer dereferencing must become const.

They already are with a feature gate, you can explore this with any other pointer that you can create during const eval.

Where is the error in the following code:

The error is that you are violating the validation invariant for constants (which is done dynamically once the value of ADDR is computed), which must follow some specific rules in order to make it safe to use constants at all use sites where they can already be used.

const ADDR: usize = &1 as *const i32 as usize;
let x: &'static usize = &(ADDR / 3);

(full working example: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=79215db454a590be2cbc43220c10d4a4)

The above creates an invalid constant and causes an error. This is required to make promotion sound. There were many insights into promotion after the RFC was finished and implemented, so I recommend not reading the RFC, since it is very outdated; instead, the document from this repo (https://github.com/rust-lang/const-eval/blob/master/promotion.md) has the right details.

Answer 66 · 2020-12-21T10:56:50.000Z

So the answer to

So once a constant is created, you know for each individual bit of that constant if it was computed using the address of a pointer. And if such a bit exists outside of a pointer, an error occurs?

No

was actually Yes. https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=ecd6a47307eaa36bea0ffa0f974b5883

Answer 67 · 2020-12-21T11:05:23.000Z

Yeah, sorry. My mental model of the internals of this probably does not reflect how it appears to work from the outside at all. In essence, you can indeed say that we track for each bit whether it came from a pointer, but it is stricter than that.

Answer 68 · 2020-12-21T11:17:11.000Z

Ok then I see no way to have UB using this method.

But some containers that use the address of a pointer as a number will not work. E.g. the Bytes crate uses this to distinguish between pointers to Rc<[u8]> and pointers to Vec<u8>.
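For context, a minimal sketch of the general pointer-tagging trick being described (this is not the Bytes crate's actual code; `Kind`, `tag`, and `untag` are illustrative names): the address is treated as a number, and its low bit, which is always zero for 2-byte-aligned data, records which kind of buffer it points to. Per the discussion above, a usize computed this way could not appear in the final value of a constant.

```rust
/// Which kind of buffer a tagged word refers to.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Kind {
    Shared,
    Unique,
}

/// Packs a pointer and a 1-bit tag into one `usize`.
/// Requires the pointee to be at least 2-byte aligned so bit 0 is free.
fn tag(ptr: *const u16, kind: Kind) -> usize {
    let addr = ptr as usize;
    debug_assert_eq!(addr & 1, 0, "pointer must be 2-byte aligned");
    addr | (kind == Kind::Unique) as usize
}

/// Recovers the pointer and the tag from a packed word.
fn untag(word: usize) -> (*const u16, Kind) {
    let kind = if word & 1 == 1 { Kind::Unique } else { Kind::Shared };
    ((word & !1) as *const u16, kind)
}
```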

Answer 69 · 2021-11-29T15:57:44.000Z

The heap effect proposed by @RalfJung seems like a good solution, but I don't think we should use the const(heap) syntax.

We could add a #[const_heap] attribute with the same semantics of the proposed const(heap) effect.

Answer 70 · 2021-11-29T16:21:45.000Z

The thing is, we'd also need const(heap) trait bounds, const(heap) function pointers, and const(heap) dyn Trait. Once you have an effect you need to be able to annotate it anywhere that you abstract over code.

Answer 71 · 2021-11-29T16:29:43.000Z

We could first make types carrying the const(heap) effect unnameable (like closure types). We could enable a lot of use cases even if we cannot name them, I think.

(Also, I think this could be a lang initiative, which could accelerate the development)

Answer 72 · 2022-01-28T23:58:25.000Z

There may be a simpler way. If I remember correctly, at some point all heap code in the standard library will be generic over the heap, although default that heap parameter to the currently used system heap. Maybe we can figure out a system with this generic parameter and const trait impls.

Might the Storage trait proposal be relevant here? You could imagine a Storage that acts like a Cow, storing heap allocations made at compile time in static memory but copying and reallocating when that allocation is mutably accessed at runtime. The Storage could be generic over the choice of runtime allocator, allowing custom allocators to be combined with const allocation.
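Standard Rust's `Cow` already behaves roughly this way, which gives a feel for the idea. This is only an analogy using `std::borrow::Cow`, not the proposed Storage trait: the constant borrows static data, and the first mutable access copies it into a runtime allocation. `BASE`, `DEFAULTS`, and `customized` are illustrative names.

```rust
use std::borrow::Cow;

// Data baked into the binary; no heap allocation at compile time.
const BASE: &[i32] = &[1, 2, 3];
const DEFAULTS: Cow<'static, [i32]> = Cow::Borrowed(BASE);

/// Starts from the constant and mutates a private copy.
fn customized() -> Cow<'static, [i32]> {
    let mut values = DEFAULTS;
    // `to_mut` clones the borrowed slice into an owned `Vec` on first use.
    values.to_mut().push(4);
    values
}
```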

Answer 73 · 2022-02-24T19:18:10.000Z

Very layman's perspective here, but is there a reason the allocator can't just ignore any static segment that these allocations would exist in, so that the free(...) impl would simply be a no-op for references in that segment? Then drop could run to completion, and the normal "dealloc" code could run on any heap allocations within that type.
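A rough sketch of that idea, under loud assumptions: the allocator wraps `std::alloc::System` and treats any pointer inside one known static region (`STATIC_REGION`, a hypothetical stand-in for wherever compile-time allocations would be placed) as not owned, so `dealloc` on it does nothing. `StaticAwareAlloc` and `in_static_region` are illustrative names; nothing here is an existing allocator's behavior.

```rust
use std::alloc::{GlobalAlloc, Layout, System};

/// Stand-in for the read-only region holding compile-time allocations.
static STATIC_REGION: [u8; 64] = [0; 64];

/// Forwards to `System`, but ignores frees of pointers into `STATIC_REGION`.
struct StaticAwareAlloc;

/// Returns true if `ptr` lies inside `STATIC_REGION`.
fn in_static_region(ptr: *const u8) -> bool {
    let start = STATIC_REGION.as_ptr() as usize;
    let end = start + STATIC_REGION.len();
    (start..end).contains(&(ptr as usize))
}

unsafe impl GlobalAlloc for StaticAwareAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        if in_static_region(ptr) {
            return; // compile-time data: never freed
        }
        unsafe { System.dealloc(ptr, layout) }
    }
}
```

Every allocator that might receive such pointers would have to implement this check, including custom ones.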

Answer 74 · 2022-02-24T19:33:30.000Z

AFAIK there is no existing allocator used in the real world which does that. In addition you may choose any custom allocator which doesn't need to support it.