nicolasstucki/multisets

immutable Bags are not really immutable.....

Closed this issue · 6 comments

Hi,

i am using your library, i found out that you immutable bags are actually mutable, i did the following

val bag = scala.collection.immutable.Bag("a")
println(bag) // output is Bag(a)
println(bag + "b") // output is Bag(a, b)
println(bag) // output is Bag(a, b).... it should be Bag(a)....

It would be great if you fix the problem, i really need to use a multiset library

Thanks,
Tiago

It seems to me that the problem is in the HashBag implementation - essentially, the immutable hash bag is a plain old hash map under the hood, instead of a hash trie, which is a typical way to implement an immutable hashed collection:

https://github.com/nicolasstucki/multisets/blob/master/src/main/scala/collection/immutable/HashBag.scala#L11

@nicolasstucki , thoughts?

That is correct, the HashBag was is using a mutable hash table under the hood. If i remember correctly it was for performance of the builder, but it is true that a hash trie would be a better way to maintain the bag after that. I will try to take a look at it this week and try to fix it.

Apparently the issue is still there with Scala 2.11 and multisets 0.2. I need a fast multiset library, so maybe I'll take a look this week.
One way to fix this bug would be to copy the content of the mutable builder into an immutable trie. Is that your plan?
ListBuilders are simply turned into Lists via some (debated) magic to avoid this final copy, but hopefully that's overkill.

scala> import scala.collection.immutable.Bag
import scala.collection.immutable.Bag

scala> implicit val config = Bag.configuration.compact[String]
config: scala.collection.immutable.HashedBagConfiguration[String] = scala.collection.immutable.HashedBagConfiguration$HashedMultiplicityBagConfiguration@17b50093

scala> val bag = scala.collection.immutable.Bag("a")
bag: scala.collection.immutable.Bag[String] = Bag(a)

scala> println(bag)
Bag(a)

scala> println(bag + "b")
Bag(b; a)

scala> println(bag)
Bag(b; a)

The reason why I used mutable a mutable map inside of it was in part for performance and in part because I wasn't able to make the custom hashing work on the immutable hash tries.

The good news is that yesterday I found the reason behind that problem was the difference in definition of the .hashCode and .## on primitive values. I've already implemented a new version taking this into account an so far it seams to be working. I might be able to publish the fix for the end of the week.

Cool!

Released version 0.3 with a new implementation that fixes this issue.