nicolasstucki/multisets

immutable.Bag.++ takes time linear in the size of both collections.

Opened this issue · 0 comments

While reading your code, I noticed one thing which might explain bad performance I'm seeing in some scenarios.

immutable.Bag.++ creates a completely new bag, which is very expensive. Instead, I expect it should reuse the ++ method of HashTries, which will share unchanged nodes of the trie. The implementation is in scala.collection.BagLike.++, and not overriden in scala.collection.immutable.BagLike.++. Indeed, the implementation of immutable.Bag.++ appeared to give correct results even in presence of issue #3.

In particular, this means that the cost of ++ is linear in the size of both arguments: adding 1 element to a bag with N elements takes Θ(N) time. Thus, building a bag of N elements by appending one element at a time takes time Θ(1 + 2 + 3 + ... + N) = Θ(N^2).