altoo-ag/akka-kryo-serialization

Extract non-Akka code into a library?

ghostdogpr opened this issue ยท 17 comments

Chill being unmaintained and stuck with Kryo 4, I am looking into alternatives and ran into this library.
However I am not using Akka and am only interested in the "Scala" wrapper around Kryo.
What do you think about extracting the common part which is not depending on Akka into a separate library?
Another benefit of this would be to reduce the duplication between the Akka and the Pekko repositories as they could both depend on the common library.

Hi @ghostdogpr
We already thought about that - especially when forking off pekko. So far we did not have enough incentive to extract the common code into a scala-kryo-serialization. But we do think it would be a good thing...
Because this change would probably be breaking we'd want to do that before we release a pekko-kryo-serialization 1.0.0.
Would you be willing to help with testing?

Sure, I could use a snapshot or RC version and run it on our test suite at work.

I'm also maintaining Shardcake that depends on Chill and would be happy to switch over to this library if possible.

I'm currently working on extracting the Scala serialization part into separately usable library.

The Akka serializer has two modes of operation. Either serialization puts the class the serialized data for later deserialization without a given class, or serialization omits the class and later deserialization requires the class to be given.
This is due to Akka already having that information already on the message.

This makes the API somewhat hard to understand (option input that is tied to a config flag) but hidden in Akka.

@ghostdogpr How would you expect the API? Would you prefer a simpler API that always puts the class into the serialized data?

@nvollmar we need something like this:

def encode(message: Any): Array[Byte]
def decode[A](bytes: Array[Byte]): A

Actual code with Chill (Task is from ZIO, you can ignore it):

def encode(message: Any): Task[Array[Byte]] = ZIO.attempt(kryoPool.toBytesWithClass(message))
def decode[A](bytes: Array[Byte]): Task[A]  = ZIO.attempt(kryoPool.fromBytes(bytes).asInstanceOf[A])

Thank for the feedback. I meanwhile started drafting the api as:

  def serialize(obj: AnyRef): Try[Array[Byte]]
  def deserialize[T: ClassTag](bytes: Array[Byte]): Try[T]

It leans a bit towards how the Akka API is defined. Would that be suitable for you?

Is the ClassTag necessary? We didn't have it in our API.

Looking into your code, I see you use readObject which requires a class, a version with writeClassAndObject/readClassAndObject would be more what I need (but I can imagine both of them to be useful).

The Akka serialization uses both readObject and readClassAndObject depending on the config.
For the Scala serialization I removed that and always use readClassAndObject.

I added the classTag because the underlying API that will serve both cases expects a manifest. I should be able to eliminate that and reduce the API to:

def serialize(obj: AnyRef): Try[Array[Byte]]
def deserialize[T](bytes: Array[Byte]): Try[T]

Looks good!

@ghostdogpr A non-Akka library is in works https://github.com/altoo-ag/scala-kryo-serialization, we will prepare a first release soon and use it as base for pekko-kryo-serialization later

I just tried the first RC and it works great in the general case! ๐ŸŽ‰

I just have a couple remarks/questions:

  • It would be more convenient to accept Any instead of AnyRef. The underlying API takes a Java Object which you can pass Any (AnyVal are automatically converted).
  • Is there a way to customize the underlying Kryo instance? We are customizing the ReferenceResolver by calling setReferenceResolver. Maybe extending DefaultKryoInitializer? A way to pass that initializer by code rather than by config would be nice.

EDIT: I was able to use my custom ReferenceResolver, seems all good ๐Ÿ‘

  • I've chosen AnyRef as it is the Java Object in Scala. But we can have a look at that.
  • DefaultKryoInitializer being configured by config comes due to Akka/Pekko creating the serializer by config. For this project we could also remove the config option and pass it in the constructor instead.
  • I've chosen AnyRef as it is the Java Object in Scala. But we can have a look at that.

I checked Chill's code and basically it accepts Any and passes it to the Java function without casting. If it's AnyVal I think Scala transforms it on its own.

  • DefaultKryoInitializer being configured by config comes due to Akka/Pekko creating the serializer by config. For this project we could also remove the config option and pass it in the constructor instead.

It would be nice for people not using Akka/Pekko but is definitely not a blocker.

DefaultKryoInitializer being configured by config comes due to Akka/Pekko creating the serializer by config. For this project we could also remove the config option and pass it in the constructor instead.

It would be nice for people not using Akka/Pekko but is definitely not a blocker.

@ghostdogpr
Why do you overwrite the ReferenceResolver and is this a common usecase?

This is the current behavior:

  • We do want to provide a convenient default that should work out of the box -> kryo-reference-map = true -> MapReferenceResolver`. I think the default is fine.
  • And for special tuned cases where one is aware of the consequence kryo-reference-map = false -> ListReferenceResolver
  • If one needs further customization it should be doable when providing a subclass of DefaultKryoInitializer -> works :) But this is more work and should not be necessary for common cases...

If you only overwrite maximumCapacity of MapReferenceResolver then we are better off providing this value in the config.

@ghostdogpr Why do you overwrite the ReferenceResolver and is this a common usecase?

We found out that performance was bad for large objects because of the reference tracking, so we tried to disable it but ran into stack overflow for some recursive types (Exception for example, but also nested case classes in Scala that have a reference to their parent). So we ended up making our own that overrides useReferences to exclude a bunch of "known" types. That way we get the speed and no surprises.

I was able to do it with your library like this:

class CustomKryoInitializer extends DefaultKryoInitializer {
  override def preInit(kryo: ScalaKryo): Unit = {
    kryo.setReferenceResolver(myCustomResolver)
    super.preInit(kryo)
  }

And then referencing this in the config with the kryo-initializer key.

Thanks for testing and the feedback. I've made PR to switch the Scala API to any.

I'll close this issue, if you have something else please open a new issue in the scala-kryo-serialization project :)

Thanks a lot! ๐Ÿ™

@ghostdogpr
with altoo-ag/scala-kryo-serialization#3 you can use

class CustomKryoInitializer extends DefaultKryoInitializer {
  override def def createReferenceResolver(settings: KryoSerializationSettings): ReferenceResolver = myCustomResolver
}