rust-lang/rust

Tracking Issue for `array_map`

JulianKnodt opened this issue · 33 comments

The feature gate for the issue is #![feature(array_map)].

Public API

impl<T, const N: usize> [T; N] {
    pub fn map<F, U>(self, f: F) -> [U; N] where F: FnMut(T) -> U;
}
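
For illustration, a minimal usage sketch assuming the signature above (the feature attribute is needed while array_map is unstable):

#![feature(array_map)]

fn main() {
    let nums = [1, 2, 3, 4];
    // `map` consumes the array and returns a new array of the same length.
    let squares: [i32; 4] = nums.map(|n| n * n);
    assert_eq!(squares, [1, 4, 9, 16]);

    // The element type may change; only the length N is fixed.
    let labels: [String; 4] = nums.map(|n| n.to_string());
    assert_eq!(labels, ["1", "2", "3", "4"]);
}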

Steps / History

  • Implement it: #75212
  • Add in place optimization for same size & alignment (?)
  • FCP
  • Stabilization PR #87174

Unresolved Questions

How should order be documented/allowed for map? See #75212 (comment) for discussion.

A possible followup optimization: Perform this in place when size_of::<T>() == size_of::<U>() && align_of::<T>() == align_of::<U>(), which can be checked at compile time. Together with RVO that might save a lot of extra stack space on large arrays.
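
For illustration, that size/alignment condition can be expressed as a const fn today; the helper name below is made up for this sketch and is not part of any proposed API:

use core::mem::{align_of, size_of};

// Hypothetical helper: whether a `map` from T to U could reuse the source
// array's storage in place.
const fn layout_compatible<T, U>() -> bool {
    size_of::<T>() == size_of::<U>() && align_of::<T>() == align_of::<U>()
}

fn main() {
    assert!(layout_compatible::<u32, f32>());
    assert!(!layout_compatible::<u8, u16>());
}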

This feature is currently breaking code that uses arraymap::ArrayMap, even without the feature enabled:

error[E0658]: use of unstable library feature 'array_map'
  --> lambda-twist/tests/consensus.rs:25:6
   |
25 |     .map(|&p| Point3::from(p));
   |      ^^^
   |
   = note: see issue #75243 <https://github.com/rust-lang/rust/issues/75243> for more information
   = help: add `#![feature(array_map)]` to the crate attributes to enable

This code started breaking on more recent Rust versions: the compiler seems to think I am attempting to use the map method on arrays added by this feature, but I am actually using arraymap::ArrayMap::map.

Since I am depending on nightly anyway, I will simply use this feature instead to get around this issue.

Ah I didn't know this crate existed, thanks for the heads up. I'm not quite sure how to resolve this, but will investigate.

That seems like a compiler bug. I think a warning should be shown, but because the trait's map takes &self rather than self, the collision isn't being detected. The problem can be seen with the following program:

trait ArrayExt {
    fn map<F: FnMut(i32) -> i32>(&self, f: F);
}

impl ArrayExt for [i32; 1] {
    fn map<F: FnMut(i32) -> i32>(&self, f: F) {}
}

fn main() {
    [4].map(|x| x);
}

In my opinion, this should print the following:

warning: a method with this name may be added to the standard library in the future
  --> src/main.rs:10:15
   |
10 |     [4].map(|x| x);
   |         ^^^
   |
   = note: `#[warn(unstable_name_collisions)]` on by default
   = warning: once this method is added to the standard library, the ambiguity may cause an error or change in behavior!
   = note: for more information, see issue #48919 <https://github.com/rust-lang/rust/issues/48919>
   = help: call with fully qualified syntax `ArrayExt::map(...)` to keep using the current method
   = help: add `#![feature(array_map)]` to the crate attributes to enable `array::<impl [T; N]>::map`

That being said, I'm kinda wondering whether [T; N]::map should use self or &self? Maybe we should have two methods, because they both have their use-cases. &mut self is also possible, albeit that one is kinda niche.

Hm, fair questions. It does usually make sense to take &self, since you'd need to make a whole new array of items anyway; but if you want to take ownership of the elements there may be a use case, and I imagine an iterator-based solution would probably be more elegant there and is in the works. It would be helpful to post this in the general issues and link to this tracking issue for this sort of thing. For &mut self, if you want to modify elements in place, I think it makes the most sense to just use a for loop over the elements and modify them that way instead of through a method.

That being said, I'm kinda wondering whether [T; N]::map should use self or &self? Maybe we should have two methods, because they both have their use-cases. &mut self is also possible, albeit that one is kinda niche.

That should be solved by #75490. Then you can simply write array.each_ref().map(...), for example. That makes it work like Option, where you also have only one map that takes self, but also Option::as_ref and Option::as_mut to work on references to the underlying data instead.
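
For illustration, a sketch of that pattern, assuming each_ref from #75490 (unstable behind array_methods on nightly at the time of this discussion):

#![feature(array_map, array_methods)]

fn main() {
    let words = [String::from("a"), String::from("bc"), String::from("def")];
    // Map over references without consuming `words`, mirroring
    // Option::as_ref().map(...).
    let lens: [usize; 3] = words.each_ref().map(|s| s.len());
    assert_eq!(lens, [1, 2, 3]);
    // `words` is still usable afterwards, since only references were mapped.
    assert_eq!(words[0], "a");
}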

What do you all think about specifying the order in which the closure is called and then already stabilizing this method? I know it hasn't been on nightly for very long, but I don't see what we would want to change about this. It's a pretty straightforward method.

Let's go through the parts of the signature:

  • Name: I'm pretty sure we all agree map is the best choice here.
  • Receiver self, &self or &mut self: as I said in my last comment, array::each_ref and array::each_mut will solve this in the same way it is solved for Option, so self by value seems like the right choice to me.
  • Closure type: FnOnce is out, since the closure has to be called once per element. But maybe Fn instead of FnMut? That's basically only useful if we want to map each element concurrently. But there is currently no similar method in std which internally/secretly uses multiple threads, so I'm pretty sure we don't want that. Apart from that, array lives in core, where we don't even have access to std::thread. Finally, all other map methods take FnMut.

Regarding the order in which the closure is called: I think it is safe to specify the obvious order (index order, starting from 0, going up). We don't work on elements concurrently as mentioned above, so I don't see why we would choose another order.

Thread usage mostly has to do with Send and Sync bounds, and I don't think map should require those bounds for its callback. Let's leave multi-threaded array maps to rayon.

I think FnMut makes the most sense. Using Fn doesn't prevent mutation anyway; it would only make the API more annoying to use in cases where computing the value requires mutation (for example, when values are being memoized), by forcing RefCell usage.
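
For illustration, a small example of why FnMut matters: a closure that captures and mutates state across calls (this also relies on the front-to-back calling order proposed above):

#![feature(array_map)]

fn main() {
    // The closure mutably captures `running_total`, so it is FnMut but not Fn.
    let mut running_total = 0;
    let prefix_sums = [1, 2, 3, 4].map(|x| {
        running_total += x;
        running_total
    });
    assert_eq!(prefix_sums, [1, 3, 6, 10]);
}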

There was some light discussion in #80094 about something like iterators that encode the length. Would that be something to consider before stabilizing array_map?

It seems that map is not getting as optimized as a loop or a manually unrolled version: https://godbolt.org/z/hsEdqv

It seems that zip is quite annoying: https://godbolt.org/z/4do8nT,
Here's enumerate instead of zip: https://godbolt.org/z/MnrEPo.
That being said, I do have a blocked PR up for optimizing it a little.

I'm also blocked by this hitting stable.

With #82098, the assembly output has improved somewhat: https://godbolt.org/z/hsEdqv (same link @eduardosm posted above, but Godbolt automatically updates the nightly). It's still not as good as the other two versions though.

Sorry for the noise, please let me know if I should ask this question somewhere else and if so where :)

What is the (or is there a) long-term plan when it comes to iter-like methods on [T; N]? Will there be lots of methods added as time goes on, or will map and generate/from_fn be the only ones? (semi-rhetorical question)

My question is: if the plan is to make arrays behave a bit like iterators, then why not make an iterator-like thing that works with arrays (and other collections of fixed size) instead of tacking the methods directly onto [T; N]? I have made a small (WIP) proof of concept of the idea, iter_fixed, in my post above (it happens to depend on array_map).

With that, map would look something like:

let res: [_; 4] = [1, 2, 3, 4]
    .into_iter_fixed()
    .map(...)
    .collect();

A few more lines of code, sure, but it gives access to a whole lot of other methods, and code can be made generic over the underlying collection.

Edit: the proposed iter-like methods I know of so far are map, generate/from_fn, and zip


Most of the methods in iter_fixed are unlikely to be introduced at this point, as they would need to do computations to calculate the new length of the array, and that is not allowed on stable.

@xfix

Most of the methods in iter_fixed are unlikely to be introduced at this point, as they would need to do computations to calculate the new length of the array, and that is not allowed on stable.

I'm not sure that should be a problem (even if there might be other reasons not to do it):

Only 5/12 methods affect the length

Out of the methods

  • map
  • inspect
  • skip
  • step_by
  • chain
  • enumerate
  • take
  • zip
  • rev
  • copied
  • cloned
  • flatten

only these methods change the length (and could probably be added later, once const generics supports that; see the sketch after this list):

  • skip
  • step_by
  • chain
  • take
  • flatten

I believe the rest of the methods should work fine right now :)
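
For illustration, the distinction: length-preserving operations need no const arithmetic, while something like chain would need a computed length such as N + M, which is not expressible on stable. The rev_fixed name below is made up for this sketch:

// A length-preserving operation: the output length is the same `N`, so no
// const-generic arithmetic is needed.
fn rev_fixed<T, const N: usize>(mut a: [T; N]) -> [T; N] {
    a.reverse();
    a
}

// By contrast, `chain` would need a signature like
//     fn chain<T, const N: usize, const M: usize>(a: [T; N], b: [T; M]) -> [T; N + M]
// and the `N + M` in the return type requires unstable const-generic arithmetic.

fn main() {
    assert_eq!(rev_fixed([1, 2, 3]), [3, 2, 1]);
}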

Is there likely to be appetite for a try_map? It's the sort of thing that's quite difficult to implement by hand safely.

Edit: the iter_fixed suggestion is really nice and would cover this use-case very elegantly.

Edit2: Alternatively, a TryFromIterator trait could be added and implemented for [T; N], with the ability to do my_iterator.try_collect::<[T; N]>().unwrap() to get back an array.
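
For illustration, a safe but allocating stand-in for the suggested try_map; the function name below is made up for this sketch and is not a proposed std API:

use std::convert::TryInto;

fn try_map_via_vec<T, U, E, const N: usize>(
    arr: [T; N],
    f: impl FnMut(T) -> Result<U, E>,
) -> Result<[U; N], E> {
    // Map through a Vec, bailing out on the first error.
    let mapped: Vec<U> = Vec::from(arr).into_iter().map(f).collect::<Result<_, _>>()?;
    // The length is still N, so converting back into an array cannot fail.
    Ok(mapped.try_into().ok().expect("length preserved"))
}

fn main() {
    let parsed = try_map_via_vec(["1", "2", "3"], |s| s.parse::<i32>());
    assert_eq!(parsed, Ok([1, 2, 3]));
}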

I like the idea of iter_fixed, but I think this shouldn't really block stabilization of array::map. map is such a fundamental function that is useful in a lot of situations. In some projects I use it all the time. An additional into_iter() plus collect() (or something like that) would add noise to lots of code.

I agree we don't want to add tons of methods to arrays that would better fit a "const length iterator" trait. But again, I think map is so fundamental that it's a good idea to add it as an inherent method. And I don't really see any other blockers, so I would once again like to propose stabilizing array::map.

CryZe commented

One thing I'm concerned about is if you end up using multiple methods, such as array.filter(...).map(...), then each method will have to produce a whole new array every time, which is likely pretty bad for performance. Unless LLVM can somehow optimize this (which I don't believe it can, but maybe somebody can prove me wrong), I believe having these methods directly on arrays, rather than on an iterator-like chain that can specialize, is a sneaky performance pitfall.

One thing I'm concerned about is if you end up using multiple methods, such as array.filter(...).map(...)

How would filter even work in this context? I assume we'd only want combinations that preserve the array length.

CryZe commented

filter was a bad example, but any chain like this is still likely to be really bad for performance.

I like the idea of iter_fixed, but I think this shouldn't really block stabilization of array::map. map is such a fundamental function that is useful in a lot of situations.

Agreed!

Regarding the order in which the closure is called: I think it is safe to specify the obvious order

Agreed. We already guarantee this for other similar functions. And even in all the cases where we don't guarantee it, people already rely on the obvious order. As you already said, we don't gain anything by leaving it unspecified.

And I don't really see any other blockers, so I would once again like to propose stabilizing array::map.

The generated code isn't always optimal right now, but that doesn't have to block stabilization. We can always keep improving the implementation.


Proposing to stabilize this and add the ordering guarantee.

@rfcbot merge

Team member @m-ou-se has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

CryZe commented

The generated code isn't always optimal right now, but that doesn't have to block stabilization. We can always keep improving the implementation.

I'm pretty sure the performance pitfall is not something that can ever really be mitigated, other than possibly through clippy lints, though that may be good enough.

Are you talking specifically about large arrays? For a [u8; 8].map(..).map(..).map(..) it's no different than just applying three functions to a u64 or something. Moving or copying large arrays has always been a bit of a performance pitfall in Rust. Some of these pitfalls were avoided because we only had implementations up to a length of 32, but that's no longer the case. A warning (from clippy or rustc) about moving/copying large arrays might be useful. I don't think we need to block array.map() on it, because it's not a problem specific to map.

CryZe commented

No, this has nothing to do with moving (that's a separate issue, which can indeed be fixed). Instead it has to do with the fact that the maps can't ever "merge together" as they would in a normal iterator chain. It is forced to fully complete the first map before starting the second, meaning that it has to do 3 entire loops over the array as opposed to just a single loop with the 3 closures merged and therefore optimized together. LLVM may at some point be smart enough (at the moment it definitely isn't) to realize that the 3 closures are order independent and do the optimization anyway, but the moment any of them does anything that's not order independent, no such optimization will ever be possible.

Overall I guess it's fine merging this. There are many other ways to write slow code in Rust. This just needs a few lints and that's it.
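
For illustration, the concern in concrete form (the two computations are equivalent, but the first gives the optimizer three separate loops and two intermediate arrays to eliminate):

#![feature(array_map)]

fn main() {
    let a = [1u32, 2, 3, 4];

    // Three chained maps: each call materializes a full intermediate array,
    // and each closure runs as its own pass over all N elements.
    let chained = a.map(|x| x + 1).map(|x| x * 2).map(|x| x - 3);

    // One map with the composed closure: a single pass, which is what an
    // iterator chain would hand to the optimizer as one fused loop.
    let fused = a.map(|x| (x + 1) * 2 - 3);

    assert_eq!(chained, fused);
}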

@CryZe I just played with a few examples and looked at the generated assembly. E.g. see this: https://rust.godbolt.org/z/M1KTbYfqv

For some examples, LLVM really doesn't perform too well. I wasn't quite aware of that, so thanks for bringing that up again. But note that for smallish arrays, LLVM seems to unroll all loops. And in those cases, it seems to perform quite well and combine the different operations. In the example above, the threshold is somewhere between 32 and 50. And I would expect a considerable share of array::map use cases to involve (very) short arrays, e.g. [f32; 3], [f32; 4] and [[f32; 4]; 4] being common in computer graphics (and the 4x4 matrix is already basically the largest example I can think of).

And while I don't know anything about how LLVM works internally, I feel like there is nothing fundamental about the design of array::map keeping LLVM from better optimizing this, right?

All that said, I think a note about this possible performance pitfall in the docs for array::map is probably warranted. But yeah, I think the number of use cases that won't encounter this optimization problem still vastly outnumbers the ones that do. So I still think stabilization is already a good idea.

🔔 This is now entering its final comment period, as per the review above. 🔔

Even if this method is useless in 99% of cases, it lets people construct arrays of default values (even for non-Copy element types). So for that alone it's worth it.

And the 99% figure is itself hyperbolic.
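
For illustration, the pattern being referred to: mapping over an array of units to build an array of default values, which works even when the element type is not Copy:

#![feature(array_map)]

fn main() {
    // `[String::default(); 4]` would require `String: Copy`, but mapping over
    // a unit array sidesteps that.
    let strings: [String; 4] = [(); 4].map(|_| String::default());
    assert_eq!(strings, ["", "", "", ""]);
}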

I don't know if this is the right place for this, but I'd like to report that array::map uses the stack excessively, and operating on larger arrays results in unexpected stack overflows.

Minimal example: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=2d96e8f66b3432a8d1ecf0ca4331f85a

In the example, at some point more than 16 times more stack is used than the original array size.

As far as I've looked into the actual code, the only thing that stood out to me is the use of core::mem::transmute_copy instead of core::mem::transmute in core::array::IntoIter::new, but I doubt that's the reason for this.
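
For illustration only (this is not the linked playground program, just a sketch of the general pattern being reported): mapping a multi-megabyte array in a debug build may blow the default stack, while a release build is typically fine.

#![feature(array_map)]

fn main() {
    // A 4 MiB array; with the reported >16x stack usage, a debug build may
    // overflow the default main-thread stack here, depending on the platform.
    let big = [0u8; 4 << 20];
    let mapped = big.map(|b| b.wrapping_add(1));
    assert_eq!(mapped[0], 1);
}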

Works in release at least.

In general, Rust debug performance has problems like this all the time, and can hit stack size limits easily. Particularly on embedded for example.

I think Rust can work on this problem, but it's not specifically the fault of array_map.

created #86912 to track that stack usage issue

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

The RFC will be merged soon.