Scala Benchmarks
An independent set of benchmarks for testing common Scala idioms.
Table of Contents
Functional Programming
Folding
We often want to collapse a collection into some summary of its elements. This is known as a fold, a reduce, or a catamorphism:
List(1,2,3).foldLeft(0)(_ + _) // 6
How fast is this operation in the face of the JVM’s while
and mutable
variables? For instance the familiar, manual, and error-prone:
var n: Int = 0
var i: Int = coll.length - 1
while (i >= 0) {
n += coll(i)
i -= 1
}
FoldBench
compares List
, scalaz.IList
, Vector
, Array
, Stream
, and
Iterator
for their speeds in various fold operations over Ints.
FoldClassBench
tries these same operations over a simple wrapping class to see
how boxing/references affect things.
Int
Results:
All times are in microseconds. Entries marked with an asterisk are sped up by optimization flags. Entries marked with two are slowed down by them.
Benchmark | List | IList | Vector | Array | Stream | EStream | Iterator |
---|---|---|---|---|---|---|---|
foldLeft | 44.1** | 31.3 | 63.5 | 34.0* | 56.9 | 180.3** | 55.4 |
foldRight | 69.2 | 81.9 | 137.9* | 36.3* | Stack Overflow | Stack Overflow | 147.6 |
Tail Recursion | 45.9 | 24.1 | 69.8 | ||||
sum | 76.9 | 71.0 | 79.0 | 74.7 | |||
while | 44.0 | 38.4 | 3.0 | 52.9 | 45.4 |
Pair
Class Results:
All times are in microseconds.
Benchmark | List | IList | Vector | Array | Stream | Iterator |
---|---|---|---|---|---|---|
foldLeft | 39.5 | 37.5 | 70.2 | 39.9 | 68.2 | 65.8 |
foldRight | 83.6 | 98.1 | 242.1 | 38.8 | Stack Overflow | 157.3 |
Tail Recursion | 39.2 | 37.9 | 118.6** | |||
while | 39.3 | 57.8 | 36.2 | 70.1 | 39.2 |
Observations:
foldLeft
is always better than bothfoldRight
and manual tail recursion for catamorphisms (reduction to a single value).sum
should be avoided.Iterator
benefits fromwhile
, but not enough to beatList
.- Collections with random access (especially
Array
) benefit fromwhile
loops. - Array has no advantage over List when holding non-primitive types!
Recommendation:
List.foldLeft
is concise and performant for both primitive and boxed types. If you were already dealing with anArray[Int]
or likewise, then awhile
loop will be faster.
Chained Higher-Order Functions
It’s common to string together multiple operations over a collection, say:
List(1,2,3,4).map(foo).filter(pred).map(bar)
which is certainly shorter and cleaner in its intent than manually manipulating
a mutable collection in a while
loop. Are higher-order operations like these
still fast? People used to Haskell’s list fusion might point out that these
operations typically don’t fuse in Scala, meaning that each chained operation
fully iterates over the entire collection and allocates a new copy. Stream
and
Iterator
are supposed to be the exceptions, however.
Stream
in particular is what people wanting Haskell’s lazy lists may reach for
first, on the claim that the elements memoize, chained operations fuse, and they
support infinite streams of values. Let’s see how everything performs.
StreamBench
performs the following operations on List
, scalaz.IList
,
Vector
, Array
, Stream
, scalaz.EphemeralStream
and Iterator
. We test:
- Head: map-filter-map-head. Which collections “short-circuit”, only fully processing the head and nothing else?
- Max: map-filter-map-max. How quickly can each collection fully process itself?
Does fusion occur (esp. with
Stream
)? - Reverse: reverse-head. Can any of the collections “cheat” and grab the last element quickly?
- Sort: map-filter-map-sorted-head. Does
Stream
still leverage laziness with a “mangling” operation like sort?
Results:
All times are in microseconds.
Benchmark | List | IList | Vector | Array | Stream | EStream | Iterator |
---|---|---|---|---|---|---|---|
Head | 182.3 | 273.2 | 133.2 | 206.3 | 0.065 | 0.17 | 0.023 |
Max | 198.9 | 401.7 | 263.5 | 192.7 | 863.7 | 1714.4 | 139.7 |
Reverse | 37.8 | 49.2 | 146.7 | 45.6 | 371.6 | 448.5 | |
Sort | 327.5 | 607.6 | 277.8 | 289.4 | 1482.8 |
Observations:
Stream
won’t do work it doesn’t have to, as advertised (re: Head).Stream
is very slow to fully evaluate, implying no operation fusion. Nothing clever happens with sorting.Iterator
overall is the fastest collection to chain higher-order functions.List
has the fastestreverse
.
Recommendation:
If you want to chain higher-order operations in Scala, use an
Iterator
. If you have something like aList
instead, create anIterator
first with.iterator
before you chain.
Concatenation
Sometimes we need to merge two instances of a container together, end-to-end.
This is embodied by the classic operator ++
, available for all the major
collection types.
We know that the collection types are implemented differently. Are some better
than others when it comes to ++
? For instance, we might imagine that the
singly-linked List
type would be quite bad at this. The lazy Stream
types
should be instantaneous.
ConcatBench
tests List
, scalaz.IList
, Array
, Vector
, Stream
, and
scalaz.EphemeralStream
for their performance with the ++
operator. Two
results are offered for Array
: one with Int
and one for a simple Pair
class, to see if primitive Arrays can somehow be optimized here by the JVM, as
they usually are. Otherwise, the results are all for collections of Int
.
All times are in microseconds.
Item Count | List | IList | Vector | Array[Int] | Array[Pair] | Stream | EStream |
---|---|---|---|---|---|---|---|
1,000 | 14 | 10 | 17 | 0.6 | 0.7 | 0.02 | 0.02 |
10,000 | 117 | 78 | 147 | 7 | 7 | 0.02 | 0.02 |
100,000 | 931 | 993 | 1209 | 75 | 77 | 0.02 | 0.02 |
1,000,000 | 8506 | 10101 | 10958 | 1777 | 1314 | 0.02 | 0.02 |
Observations:
- The
Stream
types were instantaneous, as expected. Array
is quick! Somehow quicker for classes, though.- The drastic slowdown for
Array
at the millions-of-elements scale is strange. IList
beatsList
until millions-of-elements scale.Vector
has no advantage here, despite rumours to the contrary.
Recommendation:
If your algorithm requires concatenation of large collections, use
Array
. If you’re worried about passing a mutable collection around your API, considerscalaz.ImmutableArray
, a simple wrapper that prevents careless misuse.
Mutable Data
List, IList and Array
Above we saw that List
performs strongly against Array
when it comes to
chaining multiple higher-order functions together. What happens when we just
need to make a single transformation pass over our collection - in other words,
a .map
? Array
with a while
loop is supposed to be the fastest iterating
operation on the JVM. Can List
and IList
still stand up to it?
MapBench
compares these operations over increasing larger collection sizes of
both Int
and a simple wrapper class.
Results:
All times are in microseconds.
Benchmark | List.map | IList.map | Array + while |
---|---|---|---|
100 Ints | 0.77 | 1.1 | 0.05 |
1000 Ints | 7.8 | 10.9 | 0.45 |
10000 Ints | 71.6 | 99.9 | 3.7 |
100 Classes | 0.83 | 1.3 | 0.4 |
1000 Classes | 8.6 | 12.9 | 4.3 |
10000 Classes | 81.3 | 111.2 | 43.1 |
Observations:
- For
List
, there isn’t too much difference between Ints and classes. Array
is fast to do a single-pass iteration.
Recommendation:
If your code involves
Array
, primitives, and simple single-pass transformations, thenwhile
loops will be fast for you. Otherwise, your code will be cleaner and comparitively performant if you stick to immutable collections and chained higher-order functions.
Builder Classes
You want to build up a new collection, perhaps iterating over an existing one,
perhaps from some live, dynamic process. For whatever reason .map
and
.foldLeft
are not an option. Which collection is best for this? VectorBench
tests how fast each of List
, scalaz.IList
, ListBuffer
, Vector
,
VectorBuilder
, Array
, ArrayBuilder
, and IndexedSeq
can create themselves
and accumulate values. For List
, this is done with tail recursion. For
IndexedSeq
, this is done via a naive for-comprehension. For all others, this
is done with while
loops. The Buffer
and Builder
classes perform a
.result
call at the end of iterating to take their non-builder forms (i.e.
VectorBuilder => Vector
). ArrayBuilder
is given an overshot size hint (with
.sizeHint
) in order to realistically minimize inner Array
copying.
Results:
All times are in microseconds.
Benchmark | List | IList | ListBuffer | Vector | VectorBuilder | Array | ArrayBuilder | IndexedSeq |
---|---|---|---|---|---|---|---|---|
1000 Ints | 5.7 | 5.5 | 5.5 | 20.8 | 6.6 | 0.6 | 1.1 | 5.9 |
10000 Ints | 60.2 | 57.1 | 57.9 | 206.1 | 39.0 | 5.3 | 11.4 | 61.4 |
100000 Ints | 545.1 | 529.1 | 551.6 | 2091.2 | 384.3 | 53.3 | 121.3 | 615.3 |
1000 Classes | 6.2 | 6.2 | 7.2 | 21.5 | 6.3 | 3.8 | 4.9 | 6.4 |
10000 Classes | 64.4 | 62.4 | 68.5 | 214.3 | 44.7 | 41.4 | 53.1 | 65.4 |
100000 Classes | 592.0 | 600.3 | 611.6 | 2164.7 | 429.4 | 357.0 | 523.5 | 653.3 |
Observations:
- For primitives,
Array
is king. - Avoid appending to immutable Vectors.
- Avoid repeated use of ListBuffer.prepend! Your runtime will slow by an order of magnitude vs
+=:
. - For classes, at small scales (~1000 elements) there is mostly no difference between the various approaches.
ArrayBuilder
can be useful if you’re able to ballpark what the final result size will be.VectorBuilder
fulfills the promise of Builders, but can only append to the right. You’d have to deal with the fact that your elements are reversed.
Recommendation:
The best choice here depends on what your next step is.
If you plan to perform
while
-based numeric calculations over primitives only, stick toArray
. If usingArrayBuilder
with primitives, avoid the.make
method. Use something like.ofInt
instead. Also make sure that you use.sizeHint
to avoid redundant innerArray
copying as your collection grows. Failing to do so can introduce a 5x slowdown.Otherwise, consider whether your algorithm can’t be reexpressed entirely in terms of
Iterator
. This will always give the best performance for subsequent chained, higher-order functions.If the algorithm can’t be expressed in terms of
Iterator
from the get-go, try building your collection withVectorBuilder
, call.iterator
once filled, then continue.
Mutable Set and Java’s ConcurrentHashMap
You’d like to build up a unique set of values and for some reason calling
.toSet
on your original collection isn’t enough. Perhaps you don’t have an
original collection. Scala’s collections have been criticized for their
performance, with one famous complaint saying how their team had to fallback to
using Java collection types entirely because the Scala ones couldn’t compare
(that was for Scala 2.8, mind you).
Is this true? UniquesBench
compares both of Scala’s mutable and immutable
Set
types with Java’s ConcurrentHashMap
to see which can accumulate unique
values faster.
Results:
All values are in microseconds.
Benchmark | mutable.Set | immutable.Set | Java ConcurrentHashMap |
---|---|---|---|
100 values | 4.6 | 7.7 | 6.1 |
1000 values | 62.2 | 107.4 | 71.3 |
10000 values | 811.1* | 1290.4 | 777.1 |
Note: About half the time the 10000-value benchmark for mutable.Set
optimizes down to ~600us instead of the ~800us shown in the chart.
Observations:
mutable.Set
is fastest at least for small amounts of data, and might be fastest at scale.immutable.Set
is slower and has worse growth, as expected.
Recommendation:
First consider whether your algorithm can’t be rewritten in terms of the usual FP idioms, followed by a
.toSet
call to make the collection unique.If that isn’t possible, then trust in the performance of native Scala collections and use
mutable.Set
.
Pattern Matching
Deconstructing Containers
It’s common to decontruct containers like this in recursive algorithms:
def safeHead[A](s: Seq[A]): Option[A] = s match {
case Seq() => None
case h +: _ => Some(h)
}
But List
and Stream
have special “cons” operators, namely ::
and #::
respectively. The List
version of the above looks like:
def safeHead[A](l: List[A]): Option[A] = l match {
case Nil => None
case h :: _ => Some(h)
}
How do these operators compare? Also, is it any slower to do it this way than a more Java-like:
def safeHead[A](l: List[A]): Option[A] =
if (l.isEmpty) None else l.head
The MatchContainersBench
benchmarks use a tail-recursive algorithm to find the
last element of each of List
, scalaz.IList
, Vector
, Array
, Seq
, and
Stream
.
Results:
All times are in microseconds.
Benchmark | List | IList | Vector | Seq | Array | Stream |
---|---|---|---|---|---|---|
:: Matching | 42.8 | 23.6 | 168.4 | |||
+: Matching | 79.0 | 1647.5 | 707.4 | 170.2 | ||
if Statements | 39.9 | 816.9 | 39.4 | 16020.6 | 55.8 |
Observations:
- Canonical
List
andIList
matching is fast. Seq
matching with+:
, its canonical operator, is ironically slow.- Pattern matching with
+:
should be avoided in general. if
is generally faster than pattern matching, but the code isn’t as nice.- Avoid recursion with
Vector
andArray
! Array.tail
is pure evil. Each call incursArrayOps
wrapping and seems to reallocate the entireArray
.Vector.tail
incurs a similar slowdown, but not as drasticly.
Recommendation:
Recursion involving containers should be done with
List
and pattern matching for the best balance of speed and simplicity. If you can takescalaz
as a dependency, itsIList
will be even faster.
Guard Patterns
It can sometimes be cleaner to check multiple Boolean
conditions using a match
:
def foo(i: Int): Whatever = i match {
case _ if bar(i) => ???
case _ if baz(i) => ???
case _ if zoo(i) => ???
case _ => someDefault
}
where we don’t really care about the pattern match, just the guard. This is in
constrast to if
branches:
def foo(i: Int): Whatever = {
if (bar(i)) ???
else if (baz(i)) ???
else if (zoo(i)) ???
else someDefault
}
which of course would often be made more verbose by many {}
pairs. Are we
punished for the empty pattern matches? MatchBench
tests this, with various
numbers of branches.
Results:
All times are in nanoseconds.
Benchmark | Guards | Ifs |
---|---|---|
1 Condition | 3.3 | 3.3 |
2 Conditions | 3.6 | 3.6 |
3 Conditions | 3.9 | 3.9 |
Identical! Feel free to use whichever you think is cleaner.