Pekko Serialization Helper

Maven Central GitHub Actions License: MIT

A serialization toolbox for Pekko messages, events and persistent state that helps achieve compile-time guarantees of serializability. This tool can help with the following Pekko serialization caveats:

  1. Missing serialization binding
  2. Incompatibility of persistent data
  3. Jackson Pekko Serializer
  4. Missing Codec registration

Install

Add the following line to plugins.sbt (take Version from the above Maven badge or GitHub Releases):

addSbtPlugin("org.virtuslab.psh" % "sbt-pekko-serialization-helper" % Version)

and enable the sbt plugin in the target project:

lazy val app = (project in file("app"))
  .enablePlugins(PekkoSerializationHelperPlugin)

Missing serialization binding

To serialize a message, persistent state or event in Pekko, a Scala trait needs to be defined:

package org
trait MySer

Also, a serializer needs to be bound to this trait in a configuration file:

pekko.actor {
  serializers {
    jackson-json = "org.apache.pekko.serialization.jackson.JacksonJsonSerializer"
  }
  serialization-bindings {
    "org.MySer" = jackson-json
  }
}

The problem occurs if a class is not extended with the base trait bound to the serializer:

trait MySer
case class MyMessage() // extends MySer
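
Why the missing `extends` matters can be illustrated with a simplified model of a binding lookup: a binding for a trait only applies to classes that are assignable to that trait. The sketch below uses plain Scala; `BoundMessage`, `UnboundMessage` and `matchesBinding` are hypothetical names introduced for illustration and are not part of Pekko or of this project.

```scala
// Simplified model of a serialization-binding lookup (assumption: a binding
// applies only to classes that are assignable to the bound trait).
trait MySer

final case class BoundMessage() extends MySer // covered by the "org.MySer" binding
final case class UnboundMessage()             // NOT covered: does not extend MySer

object BindingCheck {
  // Hypothetical helper, not a Pekko API: true when `clazz` would match a binding for MySer.
  def matchesBinding(clazz: Class[_]): Boolean =
    classOf[MySer].isAssignableFrom(clazz)

  def main(args: Array[String]): Unit = {
    println(matchesBinding(classOf[BoundMessage]))   // true
    println(matchesBinding(classOf[UnboundMessage])) // false
  }
}
```

A class for which no binding matches is exactly the case that ends in a Java serialization fallback or a serialization failure.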

pekko-serialization-helper to the rescue! It detects messages, events and persistent states, checks whether they extend the given base trait, and reports an error when they don't. This ensures that the specified serializer is used by Pekko and protects against an unintended fallback to Java serialization or outright serialization failure.

To use it, annotate the base trait with @org.virtuslab.psh.SerializabilityTrait:

@SerializabilityTrait
trait MySerializable

It allows catching errors like these:

import org.apache.pekko.actor.typed.Behavior

object BehaviorTest {
  sealed trait Command //extends MySerializable
  def method(msg: Command): Behavior[Command] = ???
}

This results in a compile error, preventing runtime-unsafe code from being executed:

test0.scala:7: error: org.random.project.BehaviorTest.Command is used as Pekko message
but does not extend a trait annotated with org.virtuslab.psh.annotation.SerializabilityTrait.
Passing an object of a class that does NOT extend a trait annotated with SerializabilityTrait as a message may cause Pekko to
fall back to Java serialization during runtime.


  def method(msg: Command): Behavior[Command] = ???
                            ^
test0.scala:6: error: Make sure this type is itself annotated, or extends a type annotated
with  @org.virtuslab.psh.annotation.SerializabilityTrait.
  sealed trait Command extends MySerializable
               ^

The compiler plugin only checks the classes in the sbt modules where PekkoSerializationHelperPlugin is explicitly enabled. It may happen that the base trait (like MySerializable in the example) lives in an sbt module like core where the plugin should not be enabled (e.g. for compilation performance reasons). However, MySerializable needs to be annotated with org.virtuslab.psh.SerializabilityTrait. In order to have access to the SerializabilityTrait annotation without enabling the entire suite of compiler plugins, add PekkoSerializationHelperPlugin.annotation to libraryDependencies:

import org.virtuslab.psh.PekkoSerializationHelperPlugin

lazy val core = (project in file("core"))
  .settings(libraryDependencies += PekkoSerializationHelperPlugin.annotation)
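
Putting both setups together, a two-module build could look roughly like the sketch below. It only combines the snippets already shown in this README; the module names `core` and `app` and the `dependsOn` wiring are assumptions made for illustration.

```scala
// build.sbt (sketch; module names "core" and "app" are illustrative)
import org.virtuslab.psh.PekkoSerializationHelperPlugin

// Holds the annotated base trait (e.g. MySerializable).
// Only the annotation library is added; no compiler plugins run here.
lazy val core = (project in file("core"))
  .settings(libraryDependencies += PekkoSerializationHelperPlugin.annotation)

// Holds messages, events and states; the compiler plugins run here.
lazy val app = (project in file("app"))
  .enablePlugins(PekkoSerializationHelperPlugin)
  .dependsOn(core)
```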

Incompatibility of persistent data

A typical problem with persistence is that already persisted data is not compatible with the schemas defined in a new version of the application.

To solve this, a mix of a compiler plugin and an sbt task can be used to dump the schema of pekko-persistence to a file. The dump can then be used to detect accidental changes of events (journal) and states (snapshots) with a simple diff.

To dump persistence schema for each sbt module where PekkoSerializationHelperPlugin is enabled, run:

sbt ashDumpPersistenceSchema

The default file is target/<sbt-module-name>-dump-persistence-schema-<version>.yaml (in target/ of the top-level module!), but it can be changed using sbt keys:

ashDumpPersistenceSchemaOutputFilename := "file.yaml" // Changes filename
ashDumpPersistenceSchemaOutputDirectoryPath := "~" // Changes directory

Example dump

- name: org.random.project.Data
  typeSymbol: trait
- name: org.random.project.Data.ClassTest
  typeSymbol: class
  fields:
  - name: a
    typeName: java.lang.String
  - name: b
    typeName: scala.Int
  - name: c
    typeName: scala.Double
  parents:
  - org.random.project.Data
- name: org.random.project.Data.ClassWithAdditionData
  typeSymbol: class
  fields:
  - name: ad
    typeName: org.random.project.Data.AdditionalData
  parents:
  - org.random.project.Data

A diff command can be used to check the difference between the version of a schema from develop/main branch and the version from the current commit.
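
To make the idea concrete, here is a deliberately naive sketch of such a comparison in Scala. It treats each dump as a sequence of lines and reports lines present on only one side; a real diff tool is position-aware and is the better choice in practice. `SchemaDiff` and the sample lines are hypothetical and only mimic the dump format shown above.

```scala
object SchemaDiff {
  // Returns (lines only in the old dump, lines only in the new dump).
  // Naive: ignores line positions and repeated lines.
  def diff(oldDump: Seq[String], newDump: Seq[String]): (Seq[String], Seq[String]) = {
    val removed = oldDump.filterNot(line => newDump.contains(line))
    val added = newDump.filterNot(line => oldDump.contains(line))
    (removed, added)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical excerpts: field `b` changed its type between the two versions.
    val mainBranch = Seq(
      "- name: org.random.project.Data.ClassTest",
      "  - name: b",
      "    typeName: scala.Int")
    val currentCommit = Seq(
      "- name: org.random.project.Data.ClassTest",
      "  - name: b",
      "    typeName: scala.Long")

    val (removed, added) = diff(mainBranch, currentCommit)
    removed.foreach(line => println(s"- $line"))
    added.foreach(line => println(s"+ $line"))
  }
}
```

Any non-empty result signals that the persisted schema changed and the change should be reviewed for compatibility.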

Jackson Pekko Serializer

Using the Jackson serializer for pekko-persistence is another pitfall. This project provides an alternative: a serializer based on Circe.

Dangerous code for Jackson:

case class Message(animal: Animal) extends MySer

sealed trait Animal

final case class Lion(name: String) extends Animal
final case class Tiger(name: String) extends Animal

To make this code work, a lot of Jackson annotations should be added:

import com.fasterxml.jackson.annotation.{JsonSubTypes, JsonTypeInfo}

case class Message(animal: Animal) extends MySer

@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "type")
@JsonSubTypes(
  Array(
    new JsonSubTypes.Type(value = classOf[Lion], name = "lion"),
    new JsonSubTypes.Type(value = classOf[Tiger], name = "tiger")))
sealed trait Animal

final case class Lion(name: String) extends Animal
final case class Tiger(name: String) extends Animal

Also if an object is defined:

case object Tick

No exception is thrown during serialization, but on deserialization Jackson creates another instance of Tick instead of restoring the Tick singleton:

actorRef ! Tick

// Inside the actor:
def receive = {
  case Tick => // this won't get matched
} // message will be unhandled

A Circe-based Pekko serializer can be used instead. It uses Circe codecs, derived using Shapeless, that are generated at compile time (so the serializer won't crash at runtime, as reflection-based serializers may). For a comparison of Circe with other serializers, read Appendix A.

Note that it is not obligatory to use this serializer for the other features (serializability checker, persistence schema dump) to work. They work equally well when, e.g., the Jackson serializer is selected.

Usage

Add the following to project dependencies:

import org.virtuslab.psh.PekkoSerializationHelperPlugin

lazy val app = (project in file("app"))
  // ...
  .settings(libraryDependencies += PekkoSerializationHelperPlugin.circePekkoSerializer)

Note that enabling PekkoSerializationHelperPlugin does not add the Circe serializer to the classpath automatically, as this sbt plugin can be used to improve the safety of other serializers as well.

Create a custom serializer by extending CircePekkoSerializer base class:

import org.apache.pekko.actor.ExtendedActorSystem
import org.virtuslab.psh.circe.CircePekkoSerializer
import org.virtuslab.psh.circe.Register

class ExampleSerializer(actorSystem: ExtendedActorSystem)
    extends CircePekkoSerializer[MySerializable](actorSystem) {

  override def identifier: Int = 41

  override lazy val codecs = Seq(Register[CommandOne], Register[CommandTwo])

  override lazy val manifestMigrations = Nil

  override lazy val packagePrefix = "org.project"
}

CircePekkoSerializer can be configured to use Gzip compression when serializing payloads larger than a configured size (by default, no compression is applied).

See default reference.conf file with comments for more details about CircePekkoSerializer configuration.
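
The general technique of size-based compression can be sketched in plain Scala as follows. This is not the CircePekkoSerializer implementation; `ThresholdCompression`, `compressLargerThan` and the 1024-byte value are hypothetical, and the real setting name and default should be taken from reference.conf.

```scala
import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets
import java.util.zip.GZIPOutputStream

object ThresholdCompression {
  // Hypothetical threshold in bytes (assumption; see reference.conf for the real setting).
  val compressLargerThan: Int = 1024

  // Gzip-compresses the given bytes fully in memory.
  def gzip(bytes: Array[Byte]): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new GZIPOutputStream(buffer)
    try out.write(bytes) finally out.close()
    buffer.toByteArray
  }

  // Compresses only payloads larger than the threshold; smaller ones are returned unchanged.
  def maybeCompress(payload: Array[Byte]): Array[Byte] =
    if (payload.length > compressLargerThan) gzip(payload) else payload

  def main(args: Array[String]): Unit = {
    val small = """{"name":"Lion"}""".getBytes(StandardCharsets.UTF_8)
    val large = ("x" * 10000).getBytes(StandardCharsets.UTF_8)
    println(maybeCompress(small).length == small.length) // true: left as-is
    println(maybeCompress(large).length < large.length)  // true: repetitive payload shrinks
  }
}
```

Skipping compression for small payloads avoids paying the Gzip header and CPU overhead where it would not reduce the size.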

For more guidelines on how to use the serializer, read Pekko documentation about serialization, CircePekkoSerializer Scaladoc and look at the examples.

Missing Codec registration

If a codec is not registered, a runtime exception will occur.

import org.virtuslab.psh.circe.CircePekkoSerializer
import org.virtuslab.psh.circe.Register

class ExampleSerializer(actorSystem: ExtendedActorSystem)
  extends CircePekkoSerializer[MySerializable](actorSystem) {
  // ...
  override lazy val codecs = Seq(Register[CommandOne]) // WHOOPS someone forgot to register CommandTwo...
}

java.lang.RuntimeException: Serialization of [CommandTwo] failed. Call Register[A]
for this class or its supertype and append result to `def codecs`.

To solve that, an annotation @org.virtuslab.psh.Serializer can be used.

During compilation, the plugin gathers all direct descendants of the class marked with @org.virtuslab.psh.SerializabilityTrait and checks whether the bodies of classes annotated with @org.virtuslab.psh.Serializer reference all of these direct descendants in any way.
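
Conceptually, the check boils down to a set difference between the direct descendants and the types referenced in the serializer body. The sketch below models only that idea with plain strings; it is not how the compiler plugin is implemented, and `RegistrationCheck` and `missingCodecs` are hypothetical names.

```scala
object RegistrationCheck {
  // Names of direct descendants of the annotated trait that are never referenced
  // in the serializer body. An empty result means every codec is registered.
  def missingCodecs(
      directDescendants: Set[String],
      referencedInSerializer: Set[String]): Set[String] =
    directDescendants.diff(referencedInSerializer)

  def main(args: Array[String]): Unit = {
    val descendants = Set("CommandOne", "CommandTwo")
    val referenced = Set("CommandOne") // Register[CommandTwo] was forgotten
    println(missingCodecs(descendants, referenced)) // Set(CommandTwo)
  }
}
```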

In practice, this is used for checking a class extending CircePekkoSerializer, like this:

import org.virtuslab.psh.circe.CircePekkoSerializer
import org.virtuslab.psh.circe.Register

@Serializer(
  classOf[MySerializable],
  typeRegexPattern = Register.REGISTRATION_REGEX)
class ExampleSerializer(actorSystem: ExtendedActorSystem)
  extends CircePekkoSerializer[MySerializable](actorSystem) {
    // ...
    override lazy val codecs = Seq(Register[CommandOne]) // WHOOPS someone forgot to register CommandTwo...
    // ... but Codec Registration Checker will throw a compilation error here:
    // `No codec for `CommandTwo` is registered in a class annotated with @org.virtuslab.psh.annotation.Serializer`
}

Note that as with Serializability Checker and Dump Persistence Schema, this compiler plugin only runs in the sbt modules where PekkoSerializationHelperPlugin is explicitly enabled.

For more information, read @Serializer scaladoc.

Additional configuration for compiler plugins

All compiler plugins and their verbose modes can be enabled/disabled using two sbt keys:

ashCompilerPluginEnable := false // default is true
ashCompilerPluginVerbose := true // default is false

This can be done for all compiler plugins, like above, or just one:

ashCodecRegistrationCheckerCompilerPlugin / ashCompilerPluginEnable := false
ashDumpPersistenceSchemaCompilerPlugin / ashCompilerPluginVerbose := true

Additionally, Compile and Test scope can be specified:

Compile / ashDumpPersistenceSchemaCompilerPlugin / ashCompilerPluginVerbose := true
Test / ashCompilerPluginEnable := false

For the full list of sbt keys, check org.virtuslab.psh.PekkoSerializationHelperKeys.

Example applications

The simplest example is the pekko-cluster application which uses Pekko Serialization Helper: pekko-cluster-app.

The second example is the pekko-persistence application which shows usage of the Dump Persistence Schema Compiler Plugin: pekko-persistence-app.

Step-by-step guide

See full step-by-step guide on Pekko Serialization Helper usage.

Contributing Guide

If you want to contribute to this project, see Contributing Guide.

Appendix A: Comparison of available Pekko Serializers

Serializers compared: Jackson, Circe, Protobuf v3, Avro, Borer, Kryo.

Data formats
  • Jackson: JSON or CBOR
  • Circe: JSON
  • Protobuf v3: JSON or custom binary
  • Avro: JSON or custom binary
  • Borer: JSON or CBOR
  • Kryo: custom binary

Scala support
  • Jackson: very poor, even with jackson-module-scala:
      • poor support for Scala objects; without configuration (without adding ScalaObjectDeserializerModule usage) it creates new instances of singleton types (Foo$), breaking pattern matching
      • lacks support for basic Scala types like Unit
      • without explicit annotation, doesn't work with generics extending AnyVal
  • Circe: perfect out of the box
  • Protobuf v3: perfect with ScalaPB
  • Avro: perfect with Avro4s
  • Borer: perfect out of the box
  • Kryo: perfect out of the box

Pekko support
  • Jackson: pekko-serialization-jackson
  • Circe: serializer provided by this project
  • Protobuf v3: used by pekko-remote internally
  • Avro: requires custom serializer
  • Borer: requires custom serializer
  • Kryo: pekko-kryo

Compile-time mechanics
  • Jackson: nothing happens in compile time; everything is based on runtime reflection
  • Circe: derives codecs via Shapeless
  • Protobuf v3: with ScalaPB, generates Scala classes based on *.proto files
  • Avro: with Avro4s, derives Avro schemas using Magnolia
  • Borer: derives codecs without Magnolia
  • Kryo: with pekko-kryo, optionally derives codecs in compile time, but otherwise uses reflection in runtime

Runtime safety
  • Jackson: none, uses reflection
  • Circe: encoders and decoders are created during compilation
  • Protobuf v3: *.proto files are validated before compilation
  • Avro: Avro schema is created during compilation
  • Borer: encoders and decoders are created during compilation
  • Kryo: depends on whether codecs were derived in compile time (then standard for Scala code), or not (then none)

Boilerplate
  • Jackson: a lot:
      • ADTs require an amount of annotations equal to or exceeding the actual type definitions
      • requires explicit serializers and deserializers in certain cases (e.g. enums)
  • Circe: every top-level sealed trait must be registered manually, but see Codec Registration Checker
  • Protobuf v3: in case of custom types, a second layer of models is needed
  • Avro: sometimes requires annotations
  • Borer: every top-level sealed trait must be registered manually; every transitively included class must have an explicitly defined codec
  • Kryo: every top-level sealed trait must be registered manually

Schema evolution
  • Jackson: removing field; adding optional field; with JacksonMigration: adding mandatory field, renaming field, renaming class, support of forward versioning for rolling updates
  • Circe: adding optional field; removing optional field; adding required field with default value; removing required field; renaming field; reordering fields; transforming data before deserialization
  • Protobuf v3: adding optional field; removing optional field; adding required field with default value; removing required field; renaming field; reordering fields; changing between compatible types
  • Avro: reordering fields; renaming fields; adding optional field; adding required field with default value; removing field with default value
  • Borer: renaming fields; transforming data before deserialization
  • Kryo: adding field; removing field; renaming field; renaming class

Appendix B: what happens with serialization of Messages / Events / States that do not extend a base trait bound to a serializer

At runtime, if a given Message/Event/State class does not extend a base trait bound to a serializer, the following problems will occur (depending on the Pekko version and settings):

Note - the default setting for Pekko 2.5 is pekko.actor.allow-java-serialization=on, whereas pekko.actor.allow-java-serialization=off is the default setting for Pekko 2.6

Pekko 2.5, pekko.actor.allow-java-serialization=off: serialization will fail with logs like below:

[WARN] [...] Outgoing message attempted to use Java Serialization even though `pekko.actor.allow-java-serialization = off` was set! Message type was: [class sample.cluster.transformation.BackendRegistration$]

[ERROR] [...] Failed to serialize remote message [class pekko.actor.ActorSelectionMessage] using serializer [class pekko.remote.serialization.MessageContainerSerializer]. Transient association error (association remains live)

pekko.remote.MessageSerializer$SerializationException: Failed to serialize remote message
[class pekko.actor.ActorSelectionMessage] using serializer [class pekko.remote.serialization.MessageContainerSerializer].
at org.apache.pekko.remote.MessageSerializer$.serialize(MessageSerializer.scala:67)
at org.apache.pekko.remote.EndpointWriter.$anonfun$serializeMessage$1(Endpoint.scala:1021)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at org.apache.pekko.remote.EndpointWriter.serializeMessage(Endpoint.scala:1021)
at org.apache.pekko.remote.EndpointWriter.writeSend(Endpoint.scala:887)
at org.apache.pekko.remote.EndpointWriter$$anonfun$4.applyOrElse(Endpoint.scala:859)
at org.apache.pekko.actor.Actor.aroundReceive(Actor.scala:539)
at org.apache.pekko.actor.Actor.aroundReceive$(Actor.scala:537)
at org.apache.pekko.remote.EndpointActor.aroundReceive(Endpoint.scala:536)
at org.apache.pekko.actor.ActorCell.receiveMessage(ActorCell.scala:612)
at org.apache.pekko.actor.ActorCell.invoke(ActorCell.scala:581)
at org.apache.pekko.dispatch.Mailbox.processMailbox(Mailbox.scala:268)
at org.apache.pekko.dispatch.Mailbox.run(Mailbox.scala:229)
at org.apache.pekko.dispatch.Mailbox.exec(Mailbox.scala:241)
at org.apache.pekko.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at org.apache.pekko.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at org.apache.pekko.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at org.apache.pekko.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: pekko.serialization.DisabledJavaSerializer$JavaSerializationException: Attempted to serialize message using Java serialization while `pekko.actor.allow-java-serialization` was disabled. Check WARNING logs for more details.

Pekko 2.5, pekko.actor.allow-java-serialization=on: serialization succeeds, but using Java serialization (which is not a good choice for production). A warning log like the one below will appear on startup:

[WARN] [...] Using the default Java serializer for class [sample.cluster.transformation.TransformationResult] which is not recommended because of performance implications. Use another serializer or disable this warning using the setting 'pekko.actor.warn-about-java-serializer-usage'

Pekko 2.6, pekko.actor.allow-java-serialization=off: serialization will fail with logs like below:

[WARN] [...] - Outgoing message attempted to use Java Serialization even though `pekko.actor.allow-java-serialization = off` was set! Message type was: [class sample.cluster.transformation.Worker$TransformText]

[ERROR] [...] - Failed to serialize message [sample.cluster.transformation.Worker$TransformText]. pekko.serialization.DisabledJavaSerializer$JavaSerializationException: Attempted to serialize message using Java serialization while `pekko.actor.allow-java-serialization` was disabled. Check WARNING logs for more details

Pekko 2.6, pekko.actor.allow-java-serialization=on: serialization succeeds, but using Java serialization (which is not a good choice for production). A warning log like the one below will appear on startup:

[WARN] [...] Using the default Java serializer for class [sample.cluster.transformation.TransformationResult] which is not recommended because of performance implications. Use another serializer or disable this warning using the setting 'pekko.actor.warn-about-java-serializer-usage'