rust-lang/rust

Fix semantics of `let _ = ...` not to drop

MarkJr94 opened this issue · 33 comments

fn main() {
    let x = ~5u;
    let _ = x;
    println!("{}", *x);
}

prints "5". I am under the impression that x should have been moved.

fn main() {
    let x = ~5u;
    let y = x;
    println!("{}", *x);
}

complains that x has been moved, as expected.

cc @nikomatsakis, this is surprising!

that seems broken indeed. cc me.

Actually, thinking harder on this, I'm not sure it's broken. It depends on what we want the semantics of _ to be. Right now the semantics of _ patterns are inconsistent. We can choose between _ meaning "drop this value" and _ meaning "don't bind this value". I think we want the latter; in this particular case, that means that let _ = x; is just a no-op. (Note that today trans sometimes chooses "drop this value" and sometimes chooses "don't bind this value" -- in particular, we treat let _ = ... totally differently from let (_, _) = ...)

Here is an example to explain why I think we want _ to mean "don't bind this value":

fn some_func(pair: (~Foo, ~Bar)) {
    {
        let (ref x, _) = pair; // should this *really* free the second half of `pair`??
        ...
    }
    {
    let (_, ref y) = pair; // ...because then you couldn't do this later...
    }
}

Nominating and adding to meeting agenda.

Updated title.

Decided that the drop for let _ always runs at the end of the scope. Needs implementation.

P-backcompat-lang. (Pretty sure we discussed this in a recent mtg.)

We settled on the semantics of _ meaning "ignore". I think implementing this mostly means removing the special-case code in trans to perform a drop.

@nikomatsakis are you actively working on this? I want to take it.

@nikomatsakis: So this means we want let _ = foo.lock(); to call the destructor at the end of scope, right? That's what I'd like, but I'm not entirely clear on what the decision was from what's written here.

@thestinger actually no, in that case foo.lock() represents a temporary, and hence the decision about when it would be freed depends on #3511. The current rules there that I am working on are "end of statement". The main change is that if you do let _ = x (where x is an lvalue) then x is not freed immediately.
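(To make the lvalue vs. temporary distinction concrete, here is a small sketch in current-day syntax; the Noisy type and names are made up for illustration:)

struct Noisy(&'static str);
impl Drop for Noisy {
    fn drop(&mut self) { println!("dropping {}", self.0); }
}

fn main() {
    // The initializer is a temporary: under the "end of statement" rule it is
    // dropped as soon as the let statement finishes.
    let _ = Noisy("temporary");
    println!("the temporary is already gone");

    // The initializer is an lvalue: `_` neither binds nor moves, so `x` keeps
    // ownership, stays usable, and is not freed until the end of main.
    let x = Noisy("lvalue");
    let _ = x;
    println!("x is still usable: {}", x.0);
}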

Would there be a drawback to making _ always behave the same as if you gave it a name?

@glehel yes. I described them here: #10488 (comment)

@nikomatsakis: Ah, I was hoping that let _ = foo(); was just going to act like let _x = foo() without the need to name it.

@nikomatsakis: I am trying to determine from the dialogue here whether the semantics suggested by @thestinger (having let _ = foo(); act like let _x = foo();) represents a third potential semantics, distinct from the two choices you outlined in #10488 (comment)

After considering the example you provided earlier, and rewriting it with the occurrences of _ replaced by named bindings, it seems like the _-is-_x semantics is indeed an instance of _ meaning "drop this value".

fn some_func(pair: (~Foo, ~Bar)) {
    {
        let (ref x, _arg) = pair; // It sounds like this moves ~Bar into _arg ...
        ...
                                  // ... which means we drop ~Bar here ...
    }
    {
        let (_, ref y) = pair;    // ... and thus makes this illegal
    }
}

(Update: after I wrote this I remembered that you cannot have a by-move and by-ref binding in the same pattern, so the above code couldn't work anyway. It seems like our current semantics must be inferring whether to treat _ as a by-ref or as a by-move depending on the rest of the pattern structure. Or maybe it is currently always by-ref, except for the special case of let _ = ...)

So if the above is correct, then I take it that if one wants to write with explicit names everywhere, eschewing _ (hypothetical company policy), then you would need to write the above like so:

fn some_func(pair: (~Foo, ~Bar)) {
    {
        let (ref x, ref _arg) = pair; // do not move ~Bar ...
        ...
    }
    {
        let (ref _ign, ref y) = pair; // ... and thus this is fine.
    }
}

@nikomatsakis is this consistent with your mental model?

(Note that I'm not asserting that _ should act like ref _arg; see aside below.)


An aside: The code transformation shown here did make me wonder if one could explain the new _ via a macro-desugaring (assuming some sort of gensym functionality), rather than saying "it is a special no-op form". I was thinking of a pseudo-explanation like _ desugars into ref _fresh_var. But that is probably not correct, since I suspect it is not compatible with the R-value semantics assigned by #3511. So "desugars to ref _fresh_var" seems like a third point in the design space. But this third alternative is probably not a very useful one because I anticipate nasty borrowck failures if one were to start having _ map to implicit references. (But then again, it might be closer to what @thestinger was expecting.)

I think what I expected it to do is probably not a great idea, but is useful in a world of RAII locks and such.

@pnkfelix creating a ref binding is not equivalent to _, because it creates a new loan which implies restrictions on the value being matched (for example, it could not be moved).
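(A minimal sketch of this distinction in current-day syntax; the error code in the comment is today's borrow-check diagnostic, included as an assumption for illustration:)

fn main() {
    // `_` creates no binding and no loan, so the matched value is untouched:
    let s = String::from("hello");
    let _ = s;
    let moved = s;      // fine: `s` was never moved or borrowed

    // A `ref` binding is a loan; while the loan is live, the value cannot be moved:
    let t = String::from("world");
    let ref r = t;
    // let moved2 = t;  // error[E0505]: cannot move out of `t` because it is borrowed
    println!("{} {}", moved, r);
}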

@thestinger it would indeed be convenient for that case, though inconvenient for others. When let _ = foo() would drop the temporary actually depends on how we define the lifetimes of temporaries. I could imagine saying that "temporaries that appear inside of initializer expressions are dropped at the end of the block", but it seems potentially rather surprising and is not what I was planning on doing.

I think I now understand the situation here better, and want to summarize it in case it's helpful (and so people can correct me if I'm wrong).

There are actually *3* options for the meaning of _:

  • Drop it
  • Don't bind it
  • Bind it

1. Drop it

This is what _ used to mean. In this case, matching something with an _ pattern makes it go out of scope and get dropped. This means

let _ = lock();

and

let x = lock();
let _ = x;

are both more-or-less equivalent to:

fn consume<T>(x: T) { }
consume(lock());

2. Don't bind it

This is the new meaning. It means: leave the RHS alone, do nothing at all with it. In this case,

let _ = lock();

is equivalent to

lock();

so indeed, its lifetime is governed by the rules for temporaries, while

let x = lock();
let _ = x;

is equivalent to just

let x = lock();

in other words, it leaves the RHS alone and does not move out of x. This means let _ = $foo as a top-level pattern is basically pointless, but in nested patterns it lets you do things you otherwise possibly couldn't do, such as let (ref x, _) = y; where you bind a reference to the inside of y while leaving y itself intact.

3. Bind it

This is what I believe @thestinger was suggesting, and what I was trying to inquire about as well. Here matching something with _ is the same as matching it with a name, except that it doesn't actually get a name. This means

let x = lock();
let _ = x;

and

let _ = lock();

are both effectively equivalent to

let y = lock();
// never again refer to `y`

This is pretty nice for RAII, as noted. In this case lock() is not a temporary, and the rules for temporaries are free to behave however they like; they won't apply to it.

Conclusion

Personally, I think 3. is the most intuitive behaviour, but 2. might have more practical benefit. In particular, while you can easily use let _x = lock(); instead of let _ = lock(); for RAII if 2. is in force, I don't know if there's a way to emulate the behaviour of 2. with nested patterns if 3. is in force. Now that I have a clearer understanding of the tradeoffs, I'm not sure which is better.
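(For later readers: under semantics 2, which is what was eventually settled on, the practical difference shows up directly with today's std::sync::Mutex; a short sketch:)

use std::sync::Mutex;

fn main() {
    let m = Mutex::new(0);

    // `_` never binds, so the MutexGuard is a temporary dropped at the end of
    // this statement and the lock is released immediately:
    let _ = m.lock().unwrap();

    // To hold the lock for the rest of the scope, bind the guard to a name
    // (if the guard above were still held, this call would deadlock):
    let _guard = m.lock().unwrap();
}   // `_guard` dropped here; the lock is released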

(jsyk: global online-identity renaming, used to be glehel)

On Wed, Dec 18, 2013 at 12:55:19PM -0800, Gábor Lehel wrote:

Personally, I think 3. is the most intuitive behaviour, but 2. might have more practical benefit. In particular, while you can easily use let _x = lock(); instead of let _ = lock(); for RAII if 2. is in force, I don't know if there's a way to emulate the behaviour of 2. with nested patterns if 3. is in force. Now that I have a clearer understanding of the tradeoffs, I'm not sure which is better.

This sounds like an accurate summary. I don't think any option is the 'most' intuitive; rather, it will depend on the context. In any case, there is no equivalent of #2 ("don't bind it") otherwise, and it is an important capability.

@glaebhoerl my comment above originally started out as an analysis like yours; the spot where I got stuck was when I tried to translate niko's example, but using your third semantics ("Bind it").

Namely, in your semantics 3., what does { let (ref x, _) = pair; } turn into? What is being bound by _ here? Does the second component of the pair get moved into the binding and then dropped at the end of the scope? This problem is what made me drop my attempt to present something analogous to your semantics 3.

  • (It could be that { let (ref x, _) = pair; } would be illegal and one would be forced to write { let (ref x, ref _) = pair; }, but, Yuck.)
  • (It could also be that under semantics 3, the _ in that example would be desugared to ref _fresh_var, but then we're back into something like the current situation where whether _ binds by-move or by-ref is context dependent.)

I don't know what other options are available under semantics 3.

@pnkfelix Ah, I understand now. I guess that yes,

{ let (ref x, _) = pair; } would be illegal and one would be forced to write { let (ref x, ref _) = pair; }

would be the logical consequence of 3.

That does make 2. look even more like the most practical option, even if it doesn't have the meaning that I would naively expect (but maybe that's because I'm coming from Haskell, where the only difference between x and _ is whether the compiler might give you an unused-variable warning).

I guess the remaining question is whether people might write let _ = lock(); expecting it to work, and whether it's a problem.

@glaebhoerl maybe we'd end up making a lint to flag code like let _ = foo();, since under the new semantics for _, that is equivalent to foo(); and should probably be written as such.

@pnkfelix that sounds like a good solution. (In case it's not what you already meant, I'd warn on let _ = anything, because it doesn't make any more sense for non-temporaries than it does for temporaries.)

Edit: If we wanted to be more comprehensive, we could flag any let or match that doesn't have an effect (so, I suppose, ones which contain no bindings at all): so let (_, _) = pair, match triad { (_, _, _) => foo }, and so forth would also generate warnings.

(recorded as #11088)

#3181 and #8194 are related

Is this done already? I was looking into doing it, but didn't find what would need to be changed. Then I tried a few tests, and everything's behaving as if _ already has the new semantics (i.o.w. I didn't manage to provoke any drops). But when I last went looking I didn't see a commit that changed it, though it's possible I missed it. That, and this bug is still open, and presumably whoever fixed it would have closed it. Mystifying matters further, in fn init_local in trans::base I found this:

    if ignore_lhs(bcx, local) {
        // Handle let _ = e; just like e;

but according to git blame...

(Tim Chevalier             2012-08-08 15:36:38 -0700 1129)         // Handle let _ = e; just like e;

...that's from over a year ago!

So what's up with this?

ping @nikomatsakis, with the landing of #11585, can this be closed?

Not sure, let me check, I forget if I removed the code.

Note to @thestinger -- given the temporary lifetime semantics I ended up with, let _ = foo() will run the destructor immediately, but let _ = &foo() will not, so that could be a handy RAII idiom.
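(A sketch of the idiom described above, with a made-up Guard type; the behaviour shown assumes the temporary lifetime rules as finally implemented:)

struct Guard;
impl Drop for Guard {
    fn drop(&mut self) { println!("guard dropped"); }
}

fn foo() -> Guard { Guard }

fn main() {
    let _ = foo();      // destructor runs here, at the end of the statement
    println!("first guard already gone");

    let _ = &foo();     // the borrow extends the temporary to the end of the block
    println!("second guard still alive");
}                       // the second guard is dropped here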

So, the code identified by @glaebhoerl is still there and should be removed, but I think the code actually behaves the right way nonetheless. i.e., once the special case for let _ = ... is removed, we will still run the destructor after the let statement, just through a different (and more consistent) path in the code.

Just out of curiosity: is the behavior of assigning to/matching against _ documented anywhere authoritatively? I looked around for a while, but couldn't find anywhere that says what the drop rules are.

I think this is a very good quote, so I'll repeat it here and below:

let _ = ... is commonly used to suppress the must_use warning. — #40096 (comment)


For others like me who found their way here using G**gles (in my case via rust let underscore) – my findings on let _ = ... stuff:

Book link

The updated URL for the above (first edition) link is https://doc.rust-lang.org/stable/book/first-edition/patterns.html#ignoring-bindings.

Example

A very nice example can be found on users.rust-lang.org. I've added the #![allow(...)] line so the code can be run without warnings on the Rust Playground:

struct D(i32);
impl Drop for D {
  fn drop(&mut self) { println!("dropped {}", self.0); }
}

fn main() {
  #![allow(path_statements, unused_variables)]

  {
    D(0);
    let x = D(1);
    let _ = D(2);
    D(3);
    let y = D(4);
    let _ = D(5);
  }

  {
    let var = D(0);
    let _ = var;
    var;
    D(1);
  }
}

(I decided against including the output here so readers get a chance to think about what the output should be.)

Quotes

From the book and #40096 (comment):

It’s worth noting that using _ never binds the value in the first place, which means that the value does not move. [...]

This also means that any temporary variables will be dropped at the end of the statement:

// Here, the String created will be dropped immediately, as it’s not bound:

let _ = String::from("  hello  ").trim();

let _ = ... is commonly used to suppress the must_use warning. — #40096 (comment)
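(For contrast with the book example above: binding the result to a name is rejected, because the temporary String really is dropped at the end of the statement; the error text in the comment is today's borrow-check diagnostic:)

fn main() {
    // Fine: nothing is bound, so nothing outlives the temporary String.
    let _ = String::from("  hello  ").trim();

    // Rejected: `s` would be a &str pointing into the already-dropped temporary.
    // let s = String::from("  hello  ").trim();
    // error[E0716]: temporary value dropped while borrowed
}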

I got here after realizing that let _guard = foo.lock(); and let _ = foo.lock(); behave differently. Idiomatic Rust guards tend to hand out access from their lock function so that holding the lock is enforced at the type level, which avoids this problem entirely. Here are two ways to do that and make your code safer:

  1. Define your RAII guard to own the data being locked and hand it out as a smart pointer from guard.lock():
use std::ops::{Deref, DerefMut};

pub struct FooGuard(Foo);
impl FooGuard {
  // Taking `&mut self` keeps this sketch simple; a real lock type would use
  // interior mutability so that `lock` could take `&self`.
  fn lock(&mut self) -> HeldFooLock<'_> {
    // do lock code, then hand out access to the data
    HeldFooLock(&mut self.0)
  }
}

pub struct HeldFooLock<'a>(&'a mut Foo);
impl Deref for HeldFooLock<'_> {
  type Target = Foo;
  fn deref(&self) -> &Foo { &self.0 }
}
impl DerefMut for HeldFooLock<'_> {
  fn deref_mut(&mut self) -> &mut Foo { &mut self.0 }
}
impl Drop for HeldFooLock<'_> {
  fn drop(&mut self) {
    // do unlock code
  }
}
  2. Have it return a ZST marker that indicates the guard is held, and pass that into any function that needs it. That just means removing the Deref/DerefMut and having HeldFooLock be:
// This struct can only be directly constructed in this module because of the private `()` field
pub struct HeldFooLock(());
impl Drop for HeldFooLock { ... }

impl FooGuard {
  fn lock(&self) -> HeldFooLock {
    // Do lock code;
    HeldFooLock(())
  }
}

pub fn do_operation_that_requires_guard(guard: &HeldFooLock) {}
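(A hypothetical caller of the sketch above, to show how the marker is used; construction of FooGuard is elided since it isn't part of the pattern:)

fn example(foo_guard: &FooGuard) {
    let held = foo_guard.lock();              // take the lock, receive the marker
    do_operation_that_requires_guard(&held);  // the type system proves the lock is held
}                                             // `held` dropped here, running the unlock code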