Point of ArrayBuilder
mpilquist opened this issue · 12 comments
I’m not sure what the point of ArrayBuilder is.
I believe the point is building an array for which you don't know the final size. Presumably ArrayBuilder
will give you better performance than first building something like a mutable.Buffer
and then converting to an array (would be nice to confirm this assumption).
Aside: might be worth considering adding benchmarks for the various builders that use sizeHint
.
I believe the point is building an array for which you don't know the final size.
Right, looks like Array
appears faster here because we know the size ahead of time. Seems like ArrayBuilder
boxes its contents? There's an order-of-magnitude slowdown relative to Array
, and boxing is the usual culprit in my experience.
Hm, ArrayBuilder
appears to avoid boxing based on a quick look at the source, but it looks like you have to call the appropriate constructor directly instead of calling make
? Here's what I suspect, though I could be totally misreading the source:
scala> val bldr = new collection.mutable.ArrayBuilder.ofInt()
bldr: collection.mutable.ArrayBuilder.ofInt = ArrayBuilder.ofInt
scala> bldr += 1 // This doesn't box
res6: bldr.type = ArrayBuilder.ofInt
scala> val bldr2 = collection.mutable.ArrayBuilder.make[Int]
bldr2: scala.collection.mutable.ArrayBuilder[Int] = ArrayBuilder.ofInt
scala> bldr2 += 1 // This boxes
res7: bldr2.type = ArrayBuilder.ofInt
This highlights exactly one of my biggest issues with the standard collections: no bias. Users aren't steered toward or away from any particular collection or operator. Why does make
exist if we're punished for reasonable use of it (vs ofInt
)?
Oh I completely agree. It shouldn't be so easy to write really badly performing code. I'd love a collection library that was designed to make efficient use easy / inefficient use hard. I'd happily take that tradeoff over API consistency.
Amen brother.
@mpilquist I switched to ofInt
, but am still seeing the same 5x slowdown compared to an Array
of known size.
Hm I think that might bust the boxed/unboxed claim then. For comparison purposes, can you try using .sizeHint(arraySize)
? This should hopefully avoid any successive copies as the array grows larger during +=
calls.
WIth sizeHint
, where you hint the exact correct size of the final result:
[info] Benchmark Mode Cnt Score Error Units
[info] VectorBench.abuilder1000 avgt 5 0.601 ± 0.018 us/op
[info] VectorBench.abuilder10000 avgt 5 6.090 ± 0.100 us/op
[info] VectorBench.abuilder100000 avgt 5 64.772 ± 16.078 us/op
[info] VectorBench.array1000 avgt 5 0.591 ± 0.008 us/op
[info] VectorBench.array10000 avgt 5 5.397 ± 0.037 us/op
[info] VectorBench.array100000 avgt 5 53.866 ± 0.884 us/op
It probably pre-allocates a load of space in this case eh.
Awesome, so 5x penalty in the 100k case for not knowing the size upfront and no boxing/unboxing occurring. The 5x penalty is higher than I would have expected. Looks like ArrayBuilder
starts at size 16 and then successively doubles, so you end up with 17 array copies for the 100k test.
More data. Here's when you hint 50% smaller than the final size:
[info] Benchmark Mode Cnt Score Error Units
[info] VectorBench.abuilder1000 avgt 5 2.018 ± 0.029 us/op
[info] VectorBench.abuilder10000 avgt 5 23.916 ± 0.193 us/op
[info] VectorBench.abuilder100000 avgt 5 199.556 ± 1.771 us/op
[info] VectorBench.array1000 avgt 5 0.589 ± 0.011 us/op
[info] VectorBench.array10000 avgt 5 5.396 ± 0.035 us/op
[info] VectorBench.array100000 avgt 5 53.998 ± 0.889 us/op
which is only barely faster than not hinting at all.
And when you over-estimate by a factor of 2:
[info] Benchmark Mode Cnt Score Error Units
[info] VectorBench.abuilder1000 avgt 5 1.145 ± 0.032 us/op
[info] VectorBench.abuilder10000 avgt 5 11.406 ± 0.152 us/op
[info] VectorBench.abuilder100000 avgt 5 121.283 ± 6.380 us/op
[info] VectorBench.array1000 avgt 5 0.589 ± 0.005 us/op
[info] VectorBench.array10000 avgt 5 5.389 ± 0.056 us/op
[info] VectorBench.array100000 avgt 5 53.987 ± 0.357 us/op
You still get punished, but only by a factor of 2.
More news: .sizeHint
doesn't help when you're working with Arrays of classes.