rust-lang/rust

TrustedRandomAccess optimization for Zip containing vec::IntoIter is unsound with how specialization currently behaves around HRTB fn pointers

steffahn opened this issue · 20 comments

Related to #85863, which now includes a high-level description covering both #85863 and #85873. [This issue is not a duplicate because both issues can be fixed independently in different ways.]

use std::{iter::Zip, marker::PhantomData, vec};

struct MyStruct<T>(PhantomData<T>);
impl<T> Clone for MyStruct<T> {
    fn clone(&self) -> Self {
        Self(PhantomData)
    }
}
impl Copy for MyStruct<fn(&())> {}

#[allow(clippy::type_complexity)]
fn main() {
    let i: vec::IntoIter<MyStruct<fn(&())>> = vec![MyStruct::<fn(&())>(PhantomData)].into_iter();
    let mut y = [Some(&42)];
    let mut z = i.zip(y.iter_mut());
    let r1 = z.next().unwrap().1;
    let mut z: Zip<vec::IntoIter<MyStruct<fn(&'static ())>>, _> = z;
    let r2 = z.next().unwrap().1;
    let elt = r1.as_ref().unwrap();
    dbg!(elt);
    *r2 = None;
    dbg!(elt);
}
   Compiling playground v0.0.1 (/playground)
    Finished dev [unoptimized + debuginfo] target(s) in 1.10s
     Running `target/debug/playground`
[src/main.rs:20] elt = 42
[src/main.rs:22] elt = timeout: the monitored command dumped core
/playground/tools/entrypoint.sh: line 11:     8 Segmentation fault      timeout --signal=KILL ${timeout} "$@"

(playground)

@rustbot label T-libs-impl, T-compiler, A-specialization, A-iterators, A-typesystem, A-lifetimes, A-traits, regression-from-stable-to-stable
someone please add the unsound label thanks

Explanation

Zip uses an optimization if both contained iterators implement TrustedRandomAccess. There’s an impl

impl<T> TrustedRandomAccess for IntoIter<T>
where
    T: Copy, 

for both vec::IntoIter and vec_deque::IntoIter that depend on Copy. This way, unsound specialization is possible. This can be turned into actual ill-behaved programs at runtime similar as in #85863, relying on covariance of IntoIter and Zip. This way, the ZipImpl implementation is switched out and this way the Zip iterator gets into an illegal state resulting in violation of the contract of TrustedRandomAccess by calling .next() on the IterMut iterator after the first item was already read via __iterator_get_unchecked. The best immediate fix is probably to remove those two TrustedRandomAccess implementations for IntoIter; they’re an optimization only, anyway. This distinguishes this issue clearly from #85863 because for the FusedIterator trait, the specialization is quite directly part of the API, whereas this issue is only about a soundness regression from a performance optimization that can easily be reverted.

Assigning priority as discussed in the Zulip thread of the Prioritization Working Group, issue will be discussed during T-compiler meeting

#85863 (comment)

Another possible fix for Fuse I just thought of, but that one would require new language features, would be to make Fuse only use its optimized implementation if T and all its subtypes implement FusedIterator. Some way of writing a bound for<U: T> U: FusedIterator would be needed.

Applying the same idea to Zip (i.e. #85873) could work, but I think this would need a TrustedRandomAccess bound for all subtypes and all supertypes. As long as TrustedRandomAccess is not a public trait, this can also be done in the future after a short-term fix like #85874 has been implemented.

Would removing the specialization from Zip be a solution or is its existence on IntoIter itself also an issue? There has been some discussion around introducing a new zip_exact (#85342) that's less fragile to specialize compared to the current Zip.

AFAIK (correct me if I’m wrong), TrustedRandomAccess is currently exclusively used for specializing Zip. So the problem will trivially be solved by removing its only use case.

It's at least used here

impl<T, I> SpecInPlaceCollect<T, I> for I
where
I: Iterator<Item = T> + TrustedRandomAccess,
{
#[inline]
fn collect_in_place(&mut self, dst_buf: *mut T, end: *const T) -> usize {
let len = self.size();
let mut drop_guard = InPlaceDrop { inner: dst_buf, dst: dst_buf };
for i in 0..len {
// Safety: InplaceIterable contract guarantees that for every element we read
// one slot in the underlying storage will have been freed up and we can immediately
// write back the result.
unsafe {
let dst = dst_buf.offset(i as isize);
debug_assert!(dst as *const _ <= end, "InPlaceIterable contract violation");
ptr::write(dst, self.__iterator_get_unchecked(i));
drop_guard.dst = dst.add(1);
}
}
mem::forget(drop_guard);
len
}

and I also intend to use it in other places e.g. #83770 although that particular one had to be reverted since performance initially looked good but turned out mixed later.

oh… right, I might have only grepped for ': TrustedRandomAccess' 😅

Great. So this second use case ignores the documentation of TrustedRandomAccess by calling as_inner on the iterator after a call to __iterator_get_unchecked, even though SourceIter::as_inner is not in the list of allowed methods. I guess the documentation of TrustedRandomAccess would have to be updated to account for this?

And while we’re at it, next_back() on Take calls DoubleEndedIterator::nth_back and ExactSizeIterator::len. Neither of these is allowed according to TrustedRandomAccess docs.

Ah, I see you’re personally involved whith the (: Copy-bound) …:IntoIter: TrustedRandomAccess implementations, the Take implementation and the collect_in_place use-case, so I’ve got the right person to talk to ^^

@the8472

The collect_in_place use case doesn’t have any problem (besides the fact that as_inner is called, for which I haven’t checked yet if it can be added to the list of allowed methods without any problems) because there is no state that could be invalidated.

as_inner comes from its own unsafe trait which is fairly restrictive in what you can do with it, but indeed the interaction should be better documented.

I'm not even sure if enumerating which methods are safe to call and in which order is enough, the constraints are more subtle than that as we discussed in #85888 (comment)
The handwavy idea is that once you start using __iterator_get_unchecked iterator sources and iterator adapters will be bypassed and not have their state updated even though elements have been extracted as if the interator had been advanced, so calling any method that would rely on the correctly updated state afterwards may be incorrect (and unsound) but state is sometimes (but not always) split between the front and back, so the invalidation can happen independently. And which methods rely on state updates and which don't depends on your iterator source and the intervening adapters, although next and next_back are generally affected.

And while we’re at it, next_back() on Take calls DoubleEndedIterator::nth_back and ExactSizeIterator::len. Neither of these is allowed according to TrustedRandomAccess docs.

Great points. I'm not sure if those are actually safe. Calculating len usually itself uses size_hint under the hood just as TRA's size would, but the size calculation might be wrong if something used __iterator_get_unchecked in the forwards direction. nth_back is even more tricky since it uses advance_back_by by default which may be a target for future optimizations.

Ah, I see you’re personally involved whith the (: Copy-bound) …:IntoIter: TrustedRandomAccess implementations, the Take implementation and the collect_in_place use-case, so I’ve got the right person to talk to ^^

It's good to have an extra pair of eyeballs, thank you! This stuff is difficult and I have obviously missed several things.

The collect_in_place use case doesn’t have any problem (besides the fact that as_inner is called, for which I haven’t checked yet if it can be added to the list of allowed methods without any problems) because there is no state that could be invalidated.

Yeah, but removing TrustedRandomAccess renders it unused since it always has to come from a IntoIter. It's not critical since it's mostly an optimization of an optimization.

In my opinion the real source of grief is Zip. It tries to do so many things at once, especially the side-effect preservation and double-ended iteration and trusted random access (internally and pass-through). E.g. the need for allowing next_back() after __iterator_get_unchecked has been called is only needed due to the side-effects part.

So we can definitely add more test cases, try to fix what is broken or rip out the optimizations that I introduced erroneously but I'd like to explore #85342 as an alternative and also get the libs team's opinion on that. It does seem easier to reason about optimizations there.

We might even be able to leave __iterator_get_unchecked on Zip but not implement next or next_back in terms of it. That way it could be used by other specializations without specializing itself.

It’s all a question of what the documentation of TrustedRandomAccess settles on. Currently it does not explicitly allow coercing to subtypes, so presumably Zip is the offender being a type with covariant parameters, and the IntoIter implementations are sound. In the PR #85874 I’ve added the necessary changes to the documentation to make Zip (more) sound and the IntoIter implementations unsound. #85874 (comment)

especially the side-effect preservation

Haha, regarding those, in my mind I’m already picturing ways to fix the remaining issues around that point (e.g. #82303) by extending TrustedRandomAccess even further.

By the way, regarding side-effects, here’s a thing I just came up with:

https://play.rust-lang.org/?version=beta&mode=debug&edition=2018&gist=5d77d5665ea0c81a9bca62f692eda53c

Actually, that’s not even on stable yet, so kind-of a beta regression?

especially the side-effect preservation

Haha, regarding those, in my mind I’m already picturing ways to fix the remaining issues around that point (e.g. #82303) by extending TrustedRandomAccess even further.

Well, I am curious about what you cook up. TRA was originally intended to allow Zip optimization and nothing else, but in principle it's also useful for optimizations in other places. So there's some tension about tailoring it just for Zip's current needs vs. making it something general for random access without the zip idiosyncrasies.

Theoretically we could have separate TrustedZipAccess and a TrustedRandomAccess traits and use them for different optimizations, but usually those squeeze out only <=1% improvements on perf.rlo and maybe 50% improvements on a few microbenchmarks (and O(n) -> O(1) in one particular one) so it's probably not worth that maintenance burden.

Tangentially, tried adding a fixup method that could be used to bring an iterator back into a clean state after __iterator_get_unchecked had been used, which would allow TrustedRandomAccess to be implemented unconditionally (i.e. for Drop types too). But it turned out that this would either require impl Drop for Zip (breaking change) or a call to the fixup method in each next/next_back call (defeating optimizations on external iteration, only leaving them for internal iteration)

By the way, regarding side-effects, here’s a thing I just came up with:

https://play.rust-lang.org/?version=beta&mode=debug&edition=2018&gist=20162c689badc919b523428d5d0d24b3

Actually, that’s not even on stable yet, so kind-of a beta regression?

I would call it a new way to trigger #82303 caused by #83990

Note that I lean heavily in favor of allowing the elimination of iterator side-effects one way or another where possible as long as it still produces the same final result. But yeah, I can understand if someone wants to treat this as a regression.

@the8472

It's at least used here

impl<T, I> SpecInPlaceCollect<T, I> for I
where
I: Iterator<Item = T> + TrustedRandomAccess,
{
#[inline]
fn collect_in_place(&mut self, dst_buf: *mut T, end: *const T) -> usize {
let len = self.size();
let mut drop_guard = InPlaceDrop { inner: dst_buf, dst: dst_buf };
for i in 0..len {
// Safety: InplaceIterable contract guarantees that for every element we read
// one slot in the underlying storage will have been freed up and we can immediately
// write back the result.
unsafe {
let dst = dst_buf.offset(i as isize);
debug_assert!(dst as *const _ <= end, "InPlaceIterable contract violation");
ptr::write(dst, self.__iterator_get_unchecked(i));
drop_guard.dst = dst.add(1);
}
}
mem::forget(drop_guard);
len
}

and I also intend to use it in other places e.g. #83770 although that particular one had to be reverted since performance initially looked good but turned out mixed later.

Maybe a reasonable approach is, especially if #85874 happens to get merged, to have two traits, one version of TrustedRandomAccess that allows coercion to subtypes, and one that doesn’t. Your : Copy dependent implementations could then be re-introduced to the version that doesn’t allow the subtype coercions, and the use cases you listed that don’t need subtype coercion could specialize on the subtype-coercion-free version of the trait.

Something like TrustedRandomAccessNoCoerce with a glob impl TrustedRandomAccessNoCoerce for T where T: TrustedRandomAccess. The documentation could just specify “the same safety requirements as TrustedRandomAccess but without the ones about coercion to subtypes”.

Something like TrustedRandomAccessNoCoerce with a glob impl TrustedRandomAccessNoCoerce for T where T: TrustedRandomAccess.

That wouldn't actually help since this trait needs to be propagated through iterator adapters.

If we have an instance of Inspect<Map<vec::IntoIter>> and impl TrustedRandomAccessNoCoerce for IntoIter<T> where T: Copy then the adapter chain wouldn't become TrustedRandomAccessNoCoerce as it only has impl TrustedRandomAccess for Map<I, F> where I: TrustedRandomAccess which would not be true for IntoIter

So we would have to put an additional marker on each adapter.

The marker traits seem to be proliferating. I was also thinking about separating out the MAY_HAVE_SIDE_EFFECT into its own trait because there are a few optimizations that only need TrustedLen + knowing about side-effects rather than TrustedRandomAccess.
It would be nice if we could somehow encode iterator capabilities via consts instead, not sure if that's feasible. Or maybe macros for the most common cases.

Good points. Maybe everything can become const …: bool where Self: Sized flags on the Iterator trait itself? 🤔 😅

Edit: Well, these would need to be considered and explicitly written for every adapter, too. And I just noticed that Self: Sized on a const isn’t a thing?

An unstable + hidden const __ITERATOR_TRUSTED_RANDOM_ACCESS: bool flag on Iterator and another
const __ITERATOR_TRUSTED_RANDOM_ACCESS_ALLOW_COERCION: bool flag. And __ITERATOR_MAY_HAVE_SIDE_EFFECTS on Iterator as-well.

And I just noticed that Self: Sized on a const isn’t a thing?

Right, either passing capabilities through or setting a flag unconditionally would be easy enough, but evaluating more complex conditions based on type bounds would be tricky. It could probably be done with specialization and const functions but could get verbose quickly, thus negating most conciseness wins.

Having all capabilities in a single trait might still be useful in so far that the final consumers of that trait, i.e. the specializations, would only have to specialize on trait unsafe IteratorCapabilities and then use const branching on those flags to switch between several specialized impls thus avoiding the overlapping specialization problems. But but we already have workarounds for those, so it's not a huge win.
And it might also negatively impact codegen performance if there are several branches of dead code and they only get eliminated post-monomorphization or in llvm.

So we probably have to stick to separate marker traits for the time being. Maybe they can be consolidated in the future when const generics and specialization become more powerful.

removing T-compiler as this seems more in T-libs ballpack (Zulip thread)

@rustbot label -T-compiler