tc39/proposal-array-filtering

Suggestion to add Array#partition as well

MaxGraey opened this issue ยท 36 comments

Examples

Basics:

const a = 1, b = 2, c = 3, d = 4
const [ab, cd] = [a,b,c,d].partition((_, index) => index >= 2)
const [ac, bd] = [a,b,c,d].partition(value => value % 2 === 0)

Immutable (out of place) Quick Sort:

function quickSort([p, ...tail]) {
  if (!tail.length) return [p, ...tail]
  const [l, r] = tail.partition(x => x > p)
  return [...quickSort(l), p, ...quickSort(r)]
}

quickSort([5, 2, 3, -1])

Efficient and correct implementation usually not trivial:

// not trivial
function partition(arr, callback) {
  return arr.reduce((result, value, index, array) => {
    result[callback(value, index, array) ? 1 : 0].push(value);
    return result;
  }, [[], []]);
}

or

// not efficient (twice linear scan)
function partition(arr, callback) {
  return [
    arr.filter((...args) => !callback(...args)),
    arr.filter(callback)
  ];
}

or based this proposal:

function partition(arr, callback) {
  return [
    arr.filterOut(callback),
    arr.filter(callback)
  ];
}

Related

OCaml: List.partition
Haskell: Data.List.partition
Lodash: _.partition
RxJS: RxJS/partition
Kotlin: Array.partition

For the note, underscore _.partition and lodash _.partition are also exist (16,309 downloads weekly for lodash version)

Thanks added to Related

Also RxJS

Hi @MaxGraey, I think partition needs to be its own proposal. It's certainly related to select/reject, but it's different enough that we would have to argue over semantics delaying the whole process.

Reopening to discuss. @MaxGraey, do you know of any libraries/projects/etc that are using partition?

/cc @michaelficarra

edit: Sorry, I didn't see that these were mentioned in earlier comments.

@michaelficarra yup. Are you aware of projects using them, though?

@jridgewell we use partition at my work rather extensively, typically around usage of https://github.com/graphql/dataloader, which can often provide arrays of the type (T | Error)[]. Being able to partition errors separately from non-errors is a typical pattern for us.

The source is not available for this usage, but here are some real-life examples that I've copied:

const [rejectedReviews, approvedReviews] = partition(allReviews, (review) => review.rejectionReasonCode != null);
// Type-safety of this function ensures well-typed `errors` and `items` below
function isError(candidate: unknown): candidate is Error {
  return candidate instanceof Error;
}
const itemsAndErrors = await dataLoader.loadMany(ids);
const [errors, items] = partition(itemsAndErrors, isError);

@ckknight: Thanks!

I added _.partition data from the HTTP Archive and Github Archive, see #4. Usage is at ~1.5% of _.filter usage (which is a little less than _.select).

btw lodash.partition has33k downloads/week

I guess query exactly_.partition in db isn't fully represent all cases like:

const partition = require('lodash.partition');
const { partition } = require('lodash');

and etc

As a radical proposal, what about implementing Array#partition instead of Array#filterOut? Users who only desire the falsy partition can destructure the output, and only consume the second value.

const [, cd] = [a,b,c,d].partition((_, index) => index >= 2)
// is equivalent to...
const cd = [a,b,c,d].filterOut((_, index) => index >= 2)

A sufficiently-clever implementation should ideally be able to recognize this situation, and internally fall back to a more-efficient filter or filterOut algorithm to inhibit any major performance penalties over a standalone filterOut method.

@schmod Good point!
Btw I propose reversed order result as [filterOutResult, filterResult] which mean it will even more simple:

const [ab] = [a,b,c,d].partition((_, index) => index >= 2);
// which equivalent to
const ab = [a,b,c,d].filterOut((_, index) => index >= 2);

As a radical proposal, what about implementing Array#partition instead of Array#filterOut? Users who only desire the falsy partition can destructure the output, and only consume the second value.

I find partition to be particularly clunky. Having to grab the either the first or second array, depending on which you care about, seems much less ergonomic than just doing filter or filterOut to get the result.

Then, is it [falsey, truthy], or [truthy, falsey], because it's not self explanatory. Lodash has a similar (but more generic) form with _.groupBy, which would return a object with keys returned by the cb and arrays as values. That seems more useful than partition.

Then there's also the query data. Use of partition is relatively small at ~2%, while reject 10-25%, and filter at 70-80%.

While partition is still an option for the proposal, I don't think it's the best one.

I think groupBy is a great alternative. A partition implementation that returns a { true, false } record is simply a special case that calls ToBoolean on the result. I think either one would satisfactorily solve the problem this proposal is trying to address.

That's true. partition is a specialized case of groupBy. But filterOut also specialized case of filter. Let's compare.
This:

const lessThan2 = arr.filter(val => !(val >= 2));
// or const [lessThan2] = arr.groupBy(val => Number(val >= 2));

much less expressive than

const lessThan2 = arr.filterOut(val => val >= 2);

same as:

const [lessThan2, greaterOrEqThan2] = arr.groupBy(val => Number(val >= 2));

less expressive than

const [lessThan2, greaterOrEqThan2] = arr.partition(val => val >= 2);

You have to sort of intricately "know" what order the tuple returns the values in, it might be better to return a struct:

{
    in,
    out
}

ex:

const {
    in: greaterThanOrEq2,
    out: lessThan2
} = arr.partition(val => val >= 2);

Although, thinking about it, in and out could be ambiguous...

Thinking about it, in and out could be ambiguous...

Oh, then how about included and excluded?

Thinking about it, in and out could be ambiguous...

it also contradicts the generally accepted conventions already in JS:
Array#entries, Map#entries, Set#entries. All this return array like tuples.

Main advantages of array tuple you could always skip unnecessary part:

const [, cd] = [a,b,c,d].partition((_, index) => index >= 2)
const [ab] = [a,b,c,d].partition((_, index) => index < 2)

but also you could this:

const { 1: cd } = [a,b,c,d].partition((_, index) => index >= 2)
const { 0: ab } = [a,b,c,d].partition((_, index) => index < 2)

with objects:

const { out: cd } = [a,b,c,d].partition((_, index) => index >= 2)
const { in: ab } = [a,b,c,d].partition((_, index) => index < 2)

Main advantages of array tuple you could always skip unnecessary part:

You can do the same with the struct, just don't use the other property from the struct.

const {
    in: greaterOrEqThan2,
    out: lessThan2
} = arr.partition(val => val >= 2);

vs

const { out: lessThan2 } = arr.partition(val => val >= 2);

Array#entries, Map#entries, Set#entries. All this return array like tuples.

They return arrays of key-value pairs, in contrast, here would you want to say that there is an association between what was filtered in and out? I would say "no," because we are intentionally trying to separate them into two totally separate categories, therefore you wouldn't want to pair them.

But actually, returning a tuple seems more like the opposite of Array#flat, and it shows that the array has been partitioned into two separate sections, while still being an array. So I guess it can be seen either way.

Regardless, I personally would support Array#partition over Array#filterOut, it is more reusable and offers a utility that neither Array#filterOut, nor Array#filter provide. I have made a function that explicitly looped over arrays a few times in order to obtain the behavior of the Array#partiton before.

How about getting a TypedArray#partition too?

I think partition could be generalize more and include slide, stride and chunk features. See this talk:
https://www.youtube.com/watch?v=-_lqZJK2vjI

I've several times done stuff like this, so it may be worth doublechecking for that kind of pattern:

const foos = items.filter(item => item.type === "foo")
const bars = items.filter(item => item.type !== "foo")

And of course there's several ways this could be expressed, each of which I've seen personally in the wild:

  • filter(f) + reject(f)
  • filter(f) + filter(i => !f(i))
  • filter(x => x.type === "a") + filter(x => x.type === "b") where the two are mutually exclusive

@isiahmeadows I'm not sure what you're trying to point out, can you clarify?

Your code example is equivalent to the following:

const partitioner = item => item.type === "foo";

const foos = items.filter(partitioner);
const bars = items.filter(item => !partitioner(item));

Which is the current problem that we would like to fix.

filter(f) + reject(f)

I don't get this right away, can you give an example?

where the two are mutually exclusive

As long as they are mutually exclusive, the original proposal should work just fine.

As for your example, would you want a method that takes two comparer functions?

const [
    foos,
    bars
] = items.partition(
    item => item.type === "foo",
    item => item.type === "bar"
);

I feel like that could easily be extended to accept N functions, and return an array of N, storing all values that the function returned a truthy value for, but really seems like it should be a user-implemented library solution.

I don't get this right away, can you give an example?

@isiahmeadows is demonstrating that they needed both the "filtered out" and the "filtered in" at the same time, which partition solves. It's a +1 that partition is useful.

I feel like that could easily be extended to accept N functions

This seems a bit more complicated than a generic groupBy. The ordering of the functions would affect the output, where groupBy could easily handle the cases.

@jridgewell hey, maybe already makes sense to add it to the proposal? Seems it's really required and the community wanna it.

Seems it's really required and the community wanna it.

Absolutely not. It overcomplicates the algorithm, the understanding of purpose, and overall documentation of this hypothetical method.

partition, like the other Array.prototype methods, should be as simple as possible, but no simpler.

Please, let's not overcomplicate things. It will be difficult to know when to teach, when to use, and whether it's efficient enough to bother with. If you want to worry about those things, then use an NPM package. I don't want to see that canonicalized within ECMAScript.

@ckknight for the previous 5 months .partition logic was required for me at least 5-10 times. NPM packages for everything that should be in the standard library is bullshit - I don't wanna depend on a separate non-standard script for a simple action (hi, left-pad hell).

@ckknight I remember how this approach is "useful" on "leftpad" example

@ckknight I remember how this approach is "useful" on "leftpad" example

I am in favor of having a simple, useful partition method, on the official Array.prototype within the ECMAScript standard. I am opposed to overcomplicating the algorithm and signature for it.

@ckknight so why do you think that I mean something else?

@ckknight The algorithm would require about 3 more lines of ES spec text, compared to filter/filterOut, one line to call an abstract operation to get the array species constructor, an extra variable declaration using that constructor, the line with "let k be k + 1" can be removed, as it is now always pushing to an array, and the creation of the structure to return.

Regardless, the current Array#filter discards useful data that developers intend to use. Array#filter feels broken, there's no question about it.

I also needed a groupBy recently: https://github.com/babel/babel/pull/13021/files#diff-3c2e3e5a9cc9bdb04de11117cfaedee2a2da03910480e9186c252a62fff7ce27R31-R42

hey, maybe already makes sense to add it to the proposal?

Yah, I can bring up both at the next meeting.

I also prefer groupBy to partition, even in cases where I just want to split into two buckets, for the reason @jridgewell gives above: groupBy means not having to remember which entry is which. Given that the genesis of this whole proposal is that people get the behavior for filter backwards, it seems better to go with the API without this issue, especially given that it's strictly more powerful.

The spec text currently includes groupBy even though the README doesn't mention it, is it in or out?

groupBy was welcomed by the committee, and is being moved to https://github.com/tc39/proposal-array-grouping.