[Question]: Performance of combine.
LeakStephane opened this issue · 6 comments
Hey, first let me thank you for your library, which is overall wonderful.
Issue:
I ran into trouble regarding the Result.combine method.
I'm basically trying to combine a relatively "huge" array of around 300_000 Result<T, E> elements.
Sadly, doing so takes around 1 minute on my computer.
Question:
Is the combine implementation ready to handle an array of that size?
P.S.: For now I will just not use neverthrow on arrays that are too big.
Oh wow, that's an interesting use case.
So the implementation of Result.combine is relatively straightforward.
There are two issues I see immediately (with respect to large result lists):
- O(n) runtime due to usage of resultList.reduce - this is definitely not the main problem though, as looping over 300,000 elements in a for loop on my machine took about 4 seconds
- appendValueToEndOfList is creating a new array on each iteration
The latter issue is almost certainly the problem.
That being said, as I've mentioned here, I don't have time to contribute to the codebase.
That said, I would be happy to accept a fix for this - and I'll devote time to reviewing the PR.
Also, ...list is expanding the previous list (in addition to the problem of creating a brand new array). This is very likely another cause of the performance slowdown; the pattern looks roughly like the sketch below.
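To illustrate, here is a simplified sketch of that pattern (the name combineResultListSlow is made up for illustration; this is not the actual source): every iteration copies the entire accumulated list, so combining n results ends up doing O(n^2) work.

import { err, ok, Result } from 'neverthrow'

// Simplified sketch of the reduce + copy-on-append pattern described above
// (illustrative only, not neverthrow's actual implementation).
const combineResultListSlow = <T, E>(
  resultList: readonly Result<T, E>[],
): Result<readonly T[], E> =>
  resultList.reduce(
    (acc, result) =>
      acc.isOk()
        ? result.isErr()
          ? err(result.error)
          : // the spread copies the whole accumulated list on every iteration
            acc.map((list) => [...list, result.value])
        : acc,
    ok([]) as Result<T[], E>,
  )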
On another note: given such a large list, would you not want some sort of function that partitions your results into Ok values and Err values?
e.g.
const [okList, errList] = Result.partition(veryLargeList)
Hey, thanks for your detailed answer!
That being said, as I've mentioned in #531, I don't have time to contribute to the codebase.
Yeah, I saw the message. I'm glad that you are transparent about the time you can dedicate to neverthrow.
Also, good luck in your new startup adventure; I know from experience that it's time-consuming.
That said, I would be happy to accept a fix for this - and I'll devote time to reviewing the PR.
I will definitely have a look, but no promises.
On another note: given such a large list, would you not want some sort of function that partitions your results into Ok values and Err values?
Yeah, it could be nice and would be useful in my case.
(This is a personal opinion.)
That being said, in general I'm not in favor of expanding the interface: it creates confusion, and it makes the library harder to learn over time.
I need to check, but it might be doable to do some kind of partition without relying on a method implemented by neverthrow - something like the sketch below.
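For example, something along these lines should work entirely in user land (a hypothetical helper named partitionResults, not part of neverthrow's API):

import { Result } from 'neverthrow'

// Hypothetical user-land helper: splits a list of Results into the Ok values
// and the Err values using only the public isOk()/isErr() type guards.
export const partitionResults = <T, E>(
  resultList: readonly Result<T, E>[],
): [T[], E[]] => {
  const okValues: T[] = []
  const errValues: E[] = []
  for (const result of resultList) {
    if (result.isOk()) {
      okValues.push(result.value)
    } else {
      errValues.push(result.error)
    }
  }
  return [okValues, errValues]
}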
Overall, nice pinpointing of the issue!
Another issue with the current implementation, though not directly connected to this one, is that even when isErr() returns true, the reduce will keep iterating over the entire input list. So it doesn't really short-circuit; it just keeps returning err(result.error) until it has gone through all elements.
I played around with the implementation, and the version below is orders of magnitude faster on my machine.
This implementation doesn't create new arrays in every iteration but mutates one array a bunch of times.
I am not too familiar with this package yet, so I am not sure if I am misusing something here, though.
import { err, ok, Result } from 'neverthrow'

export const combineResultListFast = <T, E>(
  resultList: readonly Result<T, E>[],
): Result<readonly T[], E> => {
  let acc = ok([]) as Result<T[], E>
  for (const result of resultList) {
    if (result.isErr()) {
      acc = err(result.error)
      break // short circuiting here
    } else {
      // map is used only for its side effect here: push mutates the one
      // accumulator array in place instead of copying it
      acc.map((list) => list.push(result.value))
    }
  }
  return acc
}
I benchmarked it with a couple of different input sizes, and this is the result of the microbenchmark on my machine:
| numElements | timeOld (ms) | timeNew (ms) | areResultsSame | improvement |
|-------------|--------------|--------------|----------------|-------------|
| 100         | 0.083        | 0.092        | true           | 0x          |
| 1000        | 2.160        | 0.132        | true           | 16x         |
| 10000       | 53.111       | 0.569        | true           | 93x         |
| 100000      | 9115.303     | 2.229        | true           | 4089x       |
| 500000      | 310282.094   | 11.678       | true           | 26569x      |
You can check out the code as well as the benchmark code here:
function implementation
benchmark
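For reference, a minimal sketch of how such a microbenchmark could be put together (a hypothetical harness, not the benchmark code linked above; it assumes combineResultListFast from the snippet above is in scope):

import { performance } from 'node:perf_hooks'
import { ok, Result } from 'neverthrow'

// Hypothetical benchmark harness: builds an all-Ok input of the given size and
// times the current Result.combine against combineResultListFast from above.
const bench = (numElements: number) => {
  const input = Array.from({ length: numElements }, (_, i) => ok<number, string>(i))

  const startOld = performance.now()
  const oldResult = Result.combine(input)
  const timeOld = performance.now() - startOld

  const startNew = performance.now()
  const newResult = combineResultListFast(input)
  const timeNew = performance.now() - startNew

  return {
    numElements,
    timeOld,
    timeNew,
    // rough structural comparison of the two results
    areResultsSame: JSON.stringify(oldResult) === JSON.stringify(newResult),
    improvement: `${Math.floor(timeOld / timeNew)}x`,
  }
}

console.table([100, 1_000, 10_000, 100_000, 500_000].map(bench))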
Additionally, it might also make sense to refactor the combineResultListWithAllErrors() function to not use reduce + spread, as reduce and spread in combination are much slower for large data sets than regular loops and array mutation. See this jsperf benchmark: https://jsperf.app/noyulo
import { err, ok, Result } from 'neverthrow'

export const combineResultListWithAllErrors = <T, E>(
  resultList: readonly Result<T, E>[],
): Result<readonly T[], E[]> => {
  let acc = ok([]) as Result<T[], E[]>
  for (const result of resultList) {
    if (result.isErr() && acc.isErr()) {
      // already collecting errors: append this one in place
      acc.error.push(result.error)
    } else if (result.isErr() && acc.isOk()) {
      // first error encountered: switch the accumulator to the error branch
      acc = err([result.error])
    } else if (result.isOk() && acc.isErr()) {
      // an error was already seen, so Ok values are ignored
    } else if (result.isOk() && acc.isOk()) {
      // no errors so far: append the value in place
      acc.value.push(result.value)
    }
  }
  return acc
}