[Question]: Performance of combine.
LeakStephane opened this issue · 6 comments
Hey, first let me thank you for your library, which is overall wonderful.
Issue:
I ran into trouble regarding the Result.combine method.
I'm basically trying to combine a relatively "huge" array of around 300_000 Result<T, E> elements.
Sadly, doing so takes around 1 minute on my computer.
Question:
Is the combine implementation ready to handle an array of that size?
P.S.: For now I will just not use neverthrow on arrays that are too big.
Oh wow, that's an interesting use case.
So the implementation of Result.combine is relatively straightforward.
There are two issues I see immediately (with respect to large result lists):
- O(n) runtime due to usage of resultList.reduce - this is definitely not the main problem though, as looping over 300,000 elements in a for loop on my machine took about 4 seconds
- appendValueToEndOfList is creating a new array on each iteration
The latter issue is almost certainly the problem.
That being said, as I've mentioned here, I don't have time to contribute to the codebase.
That said, I would be happy to accept a fix for this - and I'll devote time to reviewing the PR.
Also, ...list is expanding the previous list (in addition to the problem of creating a brand new array). This is very likely another cause of the performance slowdown; the pattern looks roughly like the sketch below.
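To illustrate, here is a simplified sketch of that pattern (the name combineResultListSlow is made up for illustration; this is not the actual source): every iteration copies the entire accumulated list, so combining n results ends up doing O(n^2) work.

import { err, ok, Result } from 'neverthrow'

// Simplified sketch of the reduce + copy-on-append pattern described above
// (illustrative only, not neverthrow's actual implementation).
const combineResultListSlow = <T, E>(
  resultList: readonly Result<T, E>[],
): Result<readonly T[], E> =>
  resultList.reduce(
    (acc, result) =>
      acc.isOk()
        ? result.isErr()
          ? err(result.error)
          : // the spread copies the whole accumulated list on every iteration
            acc.map((list) => [...list, result.value])
        : acc,
    ok([]) as Result<T[], E>,
  )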
On another note: given such a large list, would you not want some sort of function that partitions your results into Ok values and Err values?
e.g.
const [okList, errList] = Result.partition(veryLargeList)
Hey, thanks for your detailed answer!
That being said, as I've mentioned in #531, I don't have time to contribute to the codebase.
Yeah, I saw the message. I'm glad that you are transparent about the time you can dedicate to neverthrow.
Also, good luck in your new startup adventure; I know from experience that it's time-consuming.
That said, I would be happy to accept a fix for this - and I'll devote time to reviewing the PR.
I will definitely have a look, but no promises.
On another note: given such a large list, would you not want some sort of function that partitions your results into Ok values and Err values?
Yeah, it could be nice and would be useful in my case.
(This is a personal opinion.)
That being said, in general I'm not in favor of expanding the interface: it creates confusion, and it makes the library harder to learn over time.
I need to check, but it might be doable to do some kind of partition without relying on a method implemented by neverthrow - something like the sketch below.
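For example, something along these lines should work entirely in user land (a hypothetical helper named partitionResults, not part of neverthrow's API):

import { Result } from 'neverthrow'

// Hypothetical user-land helper: splits a list of Results into the Ok values
// and the Err values using only the public isOk()/isErr() type guards.
export const partitionResults = <T, E>(
  resultList: readonly Result<T, E>[],
): [T[], E[]] => {
  const okValues: T[] = []
  const errValues: E[] = []
  for (const result of resultList) {
    if (result.isOk()) {
      okValues.push(result.value)
    } else {
      errValues.push(result.error)
    }
  }
  return [okValues, errValues]
}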
Overall, nice pinpointing of the issue!
Another issue with the current implementation, though not directly connected to this one, is that even when isErr() returns true, the reduce will keep iterating over the entire input list. So it doesn't really short-circuit; it just keeps returning err(result.error) until it has gone through all elements.
I played around with the implementation, and the version below is orders of magnitude faster on my machine.
This implementation doesn't create new arrays in every iteration but mutates one array a bunch of times.
I am not too familiar with this package yet, so I am not sure if I am misusing something here, though.
import { err, ok, Result } from 'neverthrow'

export const combineResultListFast = <T, E>(
  resultList: readonly Result<T, E>[],
): Result<readonly T[], E> => {
  let acc = ok([]) as Result<T[], E>
  for (const result of resultList) {
    if (result.isErr()) {
      acc = err(result.error)
      break // short circuiting here
    } else {
      // map is used only for its side effect here: push mutates the one
      // accumulator array in place instead of copying it
      acc.map((list) => list.push(result.value))
    }
  }
  return acc
}
I benchmarked it with a couple of different input sizes, and this is the result of the microbenchmark on my machine:
| numElements | timeOld (ms) | timeNew (ms) | areResultsSame | improvement |
|-------------|--------------|--------------|----------------|-------------|
| 100         | 0.083        | 0.092        | true           | 0x          |
| 1000        | 2.160        | 0.132        | true           | 16x         |
| 10000       | 53.111       | 0.569        | true           | 93x         |
| 100000      | 9115.303     | 2.229        | true           | 4089x       |
| 500000      | 310282.094   | 11.678       | true           | 26569x      |
You can check out the code as well as the benchmark code here:
function implementation
benchmark
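For reference, a minimal sketch of how such a microbenchmark could be put together (a hypothetical harness, not the benchmark code linked above; it assumes combineResultListFast from the snippet above is in scope):

import { performance } from 'node:perf_hooks'
import { ok, Result } from 'neverthrow'

// Hypothetical benchmark harness: builds an all-Ok input of the given size and
// times the current Result.combine against combineResultListFast from above.
const bench = (numElements: number) => {
  const input = Array.from({ length: numElements }, (_, i) => ok<number, string>(i))

  const startOld = performance.now()
  const oldResult = Result.combine(input)
  const timeOld = performance.now() - startOld

  const startNew = performance.now()
  const newResult = combineResultListFast(input)
  const timeNew = performance.now() - startNew

  return {
    numElements,
    timeOld,
    timeNew,
    // rough structural comparison of the two results
    areResultsSame: JSON.stringify(oldResult) === JSON.stringify(newResult),
    improvement: `${Math.floor(timeOld / timeNew)}x`,
  }
}

console.table([100, 1_000, 10_000, 100_000, 500_000].map(bench))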
Additionally, it might also make sense to refactor the combineResultListWithAllErrors() function to not use reduce + spread, as reduce and spread in combination are much slower for large data sets than regular loops and array mutation. See this jsperf benchmark: https://jsperf.app/noyulo
import { err, ok, Result } from 'neverthrow'

export const combineResultListWithAllErrors = <T, E>(
  resultList: readonly Result<T, E>[],
): Result<readonly T[], E[]> => {
  let acc = ok([]) as Result<T[], E[]>
  for (const result of resultList) {
    if (result.isErr() && acc.isErr()) {
      // already collecting errors: append this one in place
      acc.error.push(result.error)
    } else if (result.isErr() && acc.isOk()) {
      // first error encountered: switch the accumulator to the error branch
      acc = err([result.error])
    } else if (result.isOk() && acc.isErr()) {
      // an error was already seen, so Ok values are ignored
    } else if (result.isOk() && acc.isOk()) {
      // no errors so far: append the value in place
      acc.value.push(result.value)
    }
  }
  return acc
}