nicolasstucki/multisets

immutable.Bag doesn't extend `GenericTraversableTemplate[Bag[T]]`

Opened this issue · 7 comments

This means that the return type of e.g. flatten is just Iterable[T]. This should be hopefully easy to fix.
I haven't checked the other Bag variants yet, though I conjecture they need updating as well.

I'm not sure if the semantic of flatten would be correct for any custom equivalence. I'll have to analyse it depth. Could you tell me which is your specific use case.

I'm using a standard equality (compact), and relying the equivalence b.flatMap(f).flatten == b.map(f). (More in particular, I'm saving the results of b.flatMap(f), to update it incrementally if b undergoes a change). I haven't fully understood semantics of other cases, I'm still exploring your library.

Your comment about semantics seems interesting, but since flatten needn't access implementation details, this seems likely to affect other client code. And indeed, if I consider bags of different sizes equivalent in my bb: Bag[Bag[T]], the size of bb.flatten will be unpredictable with some implementations of flatten.

I currently uglily reimplemented flatten like in the standard library, which is probably slower than needed.

  implicit def config[T] = Bag.configuration.compact[T]
  implicit class FlattenOps[T](b: Bag[Bag[T]]) {
    def flattenB: Bag[T] = {
      val builder = Bag.newBuilder[T]
      for (baglet <- b)
        builder ++= baglet
      builder.result
    }
  }

You are probably right, extending it will work correctly.

I'm not sure I'm suggesting that. However, for standard equalities it'll work fine, and that's the usecase I'd like. For the rest, I'm not sure what correctly means.

Also, when I hacked together quickly a bag implementation, I actually had a potentially faster version, which multiplies multiplicities instead of readding elements:

case class Bag[A](contents: immutable.Map[A, Int] = immutable.HashMap())
//XXX: Not reusing collection infrastructure :-(
{
  //...
  def flatten[B](implicit p: A <:< Bag[B]): Bag[B] = {
    new Bag(for {
      (outerEl, count) <- contents
      innerBag = p(outerEl)
      (el, count2) <- innerBag.contents
    } yield (el, count * count2))
  }
}

Could such an implementation be supported in your library?

That may work for the compact representation but will fail on the others. But the implementation of flatten might be just a union over the bags. Which would internally do something similar to the code you wrote when using the compact representation.

Thanks, I was indeed just supporting a compact representation.

I've tried out your suggestion. However, as currently implemented, that appears to take quadratic time, because union takes time linear in the size of both collections (see new issue #7), and the accumulator keeps growing. (In fact, I've not waited for my benchmark to terminate).

The complexity is thus equivalent to the one of the code below, which has more visibly quadratic cost:

var acc = Array.empty
for (bag <- bags) {
  val newAcc = Array.empty
  newAcc += acc   //cost: O( | acc | ), which keeps increasing
  newAcc += bag   //cost: O( | bag | )
  acc = newAcc
}

I forgot to write the implementation of flattenB I tried out:

    def flattenB: Bag[T] =
      b.fold[Bag[T]](Bag.empty)(_ union _)

Apparently, that fold resolves to foldLeft, though foldRight wouldn't make a difference.