Scala Benchmarks

An independent set of benchmarks for testing common Scala idioms.

Functional Programming
Mutable Data
Pattern Matching
- Deconstructing Containers
- Guard Patterns

Functional Programming

Folding

We often want to collapse a collection into some summary of its elements. This is known as a fold, a reduce, or a catamorphism:

List(1,2,3).foldLeft(0)(_ + _)  // 6

How fast is this operation in the face of the JVM’s while and mutable variables? For instance the familiar, manual, and error-prone:

var n: Int = 0
var i: Int = coll.length - 1

while (i >= 0) {
  n += coll(i)
  i -= 1
}

FoldBench compares List, scalaz.IList, Vector, Array, Stream, and Iterator for their speeds in various fold operations over Ints. FoldClassBench tries these same operations over a simple wrapping class to see how boxing/references affect things.

Int Results:

All times are in microseconds. Entries marked with an asterisk are sped up by optimization flags. Entries marked with two are slowed down by them.

Benchmark	List	IList	Vector	Array	Stream	EStream	Iterator
`foldLeft`	44.1**	31.3	63.5	34.0*	56.9	180.3**	55.4
`foldRight`	69.2	81.9	137.9*	36.3*	Stack Overflow	Stack Overflow	147.6
Tail Recursion	45.9	24.1			69.8
`sum`	76.9		71.0	79.0	74.7
`while`	44.0		38.4	3.0	52.9		45.4

Pair Class Results:

All times are in microseconds.

Benchmark	List	IList	Vector	Array	Stream	Iterator
`foldLeft`	39.5	37.5	70.2	39.9	68.2	65.8
`foldRight`	83.6	98.1	242.1	38.8	Stack Overflow	157.3
Tail Recursion	39.2	37.9			118.6**
`while`	39.3		57.8	36.2	70.1	39.2

Observations:

foldLeft is always better than both foldRight and manual tail recursion for catamorphisms (reduction to a single value).
sum should be avoided.
Iterator benefits from while, but not enough to beat List.
Collections with random access (especially Array) benefit from while loops.
Array has no advantage over List when holding non-primitive types!

Recommendation:

List.foldLeft is concise and performant for both primitive and boxed types. If you were already dealing with an Array[Int] or likewise, then a while loop will be faster.

Chained Higher-Order Functions

It’s common to string together multiple operations over a collection, say:

List(1,2,3,4).map(foo).filter(pred).map(bar)

which is certainly shorter and cleaner in its intent than manually manipulating a mutable collection in a while loop. Are higher-order operations like these still fast? People used to Haskell’s list fusion might point out that these operations typically don’t fuse in Scala, meaning that each chained operation fully iterates over the entire collection and allocates a new copy. Stream and Iterator are supposed to be the exceptions, however.

Stream in particular is what people wanting Haskell’s lazy lists may reach for first, on the claim that the elements memoize, chained operations fuse, and they support infinite streams of values. Let’s see how everything performs.

StreamBench performs the following operations on List, scalaz.IList, Vector, Array, Stream, scalaz.EphemeralStream and Iterator. We test:

Head: map-filter-map-head. Which collections “short-circuit”, only fully processing the head and nothing else?
Max: map-filter-map-max. How quickly can each collection fully process itself? Does fusion occur (esp. with Stream)?
Reverse: reverse-head. Can any of the collections “cheat” and grab the last element quickly?
Sort: map-filter-map-sorted-head. Does Stream still leverage laziness with a “mangling” operation like sort?

Results:

All times are in microseconds.

Benchmark	List	IList	Vector	Array	Stream	EStream	Iterator
Head	182.3	273.2	133.2	206.3	0.065	0.17	0.023
Max	198.9	401.7	263.5	192.7	863.7	1714.4	139.7
Reverse	37.8	49.2	146.7	45.6	371.6	448.5
Sort	327.5	607.6	277.8	289.4	1482.8

Observations:

Stream won’t do work it doesn’t have to, as advertised (re: Head).
Stream is very slow to fully evaluate, implying no operation fusion. Nothing clever happens with sorting.
Iterator overall is the fastest collection to chain higher-order functions.
List has the fastest reverse.

Recommendation:

If you want to chain higher-order operations in Scala, use an Iterator. If you have something like a List instead, create an Iterator first with .iterator before you chain.

Concatenation

Sometimes we need to merge two instances of a container together, end-to-end. This is embodied by the classic operator ++, available for all the major collection types.

We know that the collection types are implemented differently. Are some better than others when it comes to ++? For instance, we might imagine that the singly-linked List type would be quite bad at this. The lazy Stream types should be instantaneous.

ConcatBench tests List, scalaz.IList, Array, Vector, Stream, and scalaz.EphemeralStream for their performance with the ++ operator. Two results are offered for Array: one with Int and one for a simple Pair class, to see if primitive Arrays can somehow be optimized here by the JVM, as they usually are. Otherwise, the results are all for collections of Int.

All times are in microseconds.

Item Count	`List`	`IList`	`Vector`	`Array[Int]`	`Array[Pair]`	`Stream`	`EStream`
1,000	14	10	17	0.6	0.7	0.02	0.02
10,000	117	78	147	7	7	0.02	0.02
100,000	931	993	1209	75	77	0.02	0.02
1,000,000	8506	10101	10958	1777	1314	0.02	0.02

Observations:

The Stream types were instantaneous, as expected.
Array is quick! Somehow quicker for classes, though.
The drastic slowdown for Array at the millions-of-elements scale is strange.
IList beats List until millions-of-elements scale.
Vector has no advantage here, despite rumours to the contrary.

Recommendation:

If your algorithm requires concatenation of large collections, use Array. If you’re worried about passing a mutable collection around your API, consider scalaz.ImmutableArray, a simple wrapper that prevents careless misuse.

Mutable Data

List, IList and Array

Above we saw that List performs strongly against Array when it comes to chaining multiple higher-order functions together. What happens when we just need to make a single transformation pass over our collection - in other words, a .map? Array with a while loop is supposed to be the fastest iterating operation on the JVM. Can List and IList still stand up to it?

MapBench compares these operations over increasing larger collection sizes of both Int and a simple wrapper class.

Results:

All times are in microseconds.

Benchmark	`List.map`	`IList.map`	`Array` + `while`
100 Ints	0.77	1.1	0.05
1000 Ints	7.8	10.9	0.45
10000 Ints	71.6	99.9	3.7
100 Classes	0.83	1.3	0.4
1000 Classes	8.6	12.9	4.3
10000 Classes	81.3	111.2	43.1

Observations:

For List, there isn’t too much difference between Ints and classes.
Array is fast to do a single-pass iteration.

Recommendation:

If your code involves Array, primitives, and simple single-pass transformations, then while loops will be fast for you. Otherwise, your code will be cleaner and comparitively performant if you stick to immutable collections and chained higher-order functions.

Builder Classes

You want to build up a new collection, perhaps iterating over an existing one, perhaps from some live, dynamic process. For whatever reason .map and .foldLeft are not an option. Which collection is best for this? VectorBench tests how fast each of List, scalaz.IList, ListBuffer, Vector, VectorBuilder, Array, ArrayBuilder, and IndexedSeq can create themselves and accumulate values. For List, this is done with tail recursion. For IndexedSeq, this is done via a naive for-comprehension. For all others, this is done with while loops. The Buffer and Builder classes perform a .result call at the end of iterating to take their non-builder forms (i.e. VectorBuilder => Vector). ArrayBuilder is given an overshot size hint (with .sizeHint) in order to realistically minimize inner Array copying.

Results:

All times are in microseconds.

Benchmark	`List`	`IList`	`ListBuffer`	`Vector`	`VectorBuilder`	`Array`	`ArrayBuilder`	`IndexedSeq`
1000 Ints	5.7	5.5	5.5	20.8	6.6	0.6	1.1	5.9
10000 Ints	60.2	57.1	57.9	206.1	39.0	5.3	11.4	61.4
100000 Ints	545.1	529.1	551.6	2091.2	384.3	53.3	121.3	615.3
1000 Classes	6.2	6.2	7.2	21.5	6.3	3.8	4.9	6.4
10000 Classes	64.4	62.4	68.5	214.3	44.7	41.4	53.1	65.4
100000 Classes	592.0	600.3	611.6	2164.7	429.4	357.0	523.5	653.3

Observations:

For primitives, Array is king.
Avoid appending to immutable Vectors.
Avoid repeated use of ListBuffer.prepend! Your runtime will slow by an order of magnitude vs +=:.
For classes, at small scales (~1000 elements) there is mostly no difference between the various approaches.
ArrayBuilder can be useful if you’re able to ballpark what the final result size will be.
VectorBuilder fulfills the promise of Builders, but can only append to the right. You’d have to deal with the fact that your elements are reversed.

Recommendation:

The best choice here depends on what your next step is.

If you plan to perform while -based numeric calculations over primitives only, stick to Array. If using ArrayBuilder with primitives, avoid the .make method. Use something like .ofInt instead. Also make sure that you use .sizeHint to avoid redundant inner Array copying as your collection grows. Failing to do so can introduce a 5x slowdown.

Otherwise, consider whether your algorithm can’t be reexpressed entirely in terms of Iterator. This will always give the best performance for subsequent chained, higher-order functions.

If the algorithm can’t be expressed in terms of Iterator from the get-go, try building your collection with VectorBuilder, call .iterator once filled, then continue.

Mutable Set and Java’s ConcurrentHashMap

You’d like to build up a unique set of values and for some reason calling .toSet on your original collection isn’t enough. Perhaps you don’t have an original collection. Scala’s collections have been criticized for their performance, with one famous complaint saying how their team had to fallback to using Java collection types entirely because the Scala ones couldn’t compare (that was for Scala 2.8, mind you).

Is this true? UniquesBench compares both of Scala’s mutable and immutable Set types with Java’s ConcurrentHashMap to see which can accumulate unique values faster.

Results:

All values are in microseconds.

Benchmark	`mutable.Set`	`immutable.Set`	Java `ConcurrentHashMap`
100 values	4.6	7.7	6.1
1000 values	62.2	107.4	71.3
10000 values	811.1*	1290.4	777.1

Note: About half the time the 10000-value benchmark for mutable.Set optimizes down to ~600us instead of the ~800us shown in the chart.

Observations:

mutable.Set is fastest at least for small amounts of data, and might be fastest at scale.
immutable.Set is slower and has worse growth, as expected.

Recommendation:

First consider whether your algorithm can’t be rewritten in terms of the usual FP idioms, followed by a .toSet call to make the collection unique.

If that isn’t possible, then trust in the performance of native Scala collections and use mutable.Set.

Pattern Matching

Deconstructing Containers

It’s common to decontruct containers like this in recursive algorithms:

def safeHead[A](s: Seq[A]): Option[A] = s match {
  case Seq() => None
  case h +: _ => Some(h)
}

But List and Stream have special “cons” operators, namely :: and #:: respectively. The List version of the above looks like:

def safeHead[A](l: List[A]): Option[A] = l match {
  case Nil => None
  case h :: _ => Some(h)
}

How do these operators compare? Also, is it any slower to do it this way than a more Java-like:

def safeHead[A](l: List[A]): Option[A] =
  if (l.isEmpty) None else l.head

The MatchContainersBench benchmarks use a tail-recursive algorithm to find the last element of each of List, scalaz.IList, Vector, Array, Seq, and Stream.

Results:

All times are in microseconds.

Benchmark	List	IList	Vector	Seq	Array	Stream
`::` Matching	42.8	23.6				168.4
`+:` Matching	79.0		1647.5	707.4		170.2
`if` Statements	39.9		816.9	39.4	16020.6	55.8

Observations:

Canonical List and IList matching is fast.
Seq matching with +:, its canonical operator, is ironically slow.
Pattern matching with +: should be avoided in general.
if is generally faster than pattern matching, but the code isn’t as nice.
Avoid recursion with Vector and Array!
Array.tail is pure evil. Each call incurs ArrayOps wrapping and seems to reallocate the entire Array. Vector.tail incurs a similar slowdown, but not as drasticly.

Recommendation:

Recursion involving containers should be done with List and pattern matching for the best balance of speed and simplicity. If you can take scalaz as a dependency, its IList will be even faster.

Guard Patterns

It can sometimes be cleaner to check multiple Boolean conditions using a match:

def foo(i: Int): Whatever = i match {
  case _ if bar(i) => ???
  case _ if baz(i) => ???
  case _ if zoo(i) => ???
  case _ => someDefault
}

where we don’t really care about the pattern match, just the guard. This is in constrast to if branches:

def foo(i: Int): Whatever = {
  if (bar(i)) ???
  else if (baz(i)) ???
  else if (zoo(i)) ???
  else someDefault
}

which of course would often be made more verbose by many {} pairs. Are we punished for the empty pattern matches? MatchBench tests this, with various numbers of branches.

Results:

All times are in nanoseconds.

Benchmark	Guards	Ifs
1 Condition	3.3	3.3
2 Conditions	3.6	3.6
3 Conditions	3.9	3.9

Identical! Feel free to use whichever you think is cleaner.

teazaid/scala-benchmarks

Scala Benchmarks

Table of Contents

Functional Programming

Folding

Chained Higher-Order Functions

Concatenation

Mutable Data

List, IList and Array

Builder Classes

Mutable Set and Java’s ConcurrentHashMap

Pattern Matching

Deconstructing Containers

Guard Patterns