A reimagined scala-pickling in the Scala 3 world

Primary language: Scala. License: Apache-2.0.

Sauerkraut

The library for those cabbage lovers out there who want to send data over the wire.

A revitalization of Pickling in the Scala 3 world.

Usage

When defining over-the-wire messages, do this:

import sauerkraut.core.{Buildable,Writer,given}
case class MyMessage(field: String, data: Int)
  derives Buildable, Writer

Then, when you need to serialize, pick a format and go:

import java.io.StringWriter
import sauerkraut.format.json.{Json,given}
import sauerkraut.{pickle,read,write}

val out = StringWriter()
pickle(Json).to(out).write(MyMessage("test", 1))
println(out.toString())

val msg = pickle(Json).from(out.toString()).read[MyMessage]

Current Formats

Here's a feature matrix for each format:

Format   Reader   Writer   All Types   Evolution Friendly   Notes
Json     Yes      Yes      Yes         Yes                  Uses Jawn for parsing
Protos   Yes      Yes      Yes         Yes                  Evolution-friendly binary format
NBT      Yes      Yes      Yes                              For the kids.
XML      Yes      Yes      Yes                              Inefficient prototype.
Pretty   No       Yes      No                               For pretty-printing strings

See Compliance for more details on what this means.

Json

Everyone's favorite non-YAML web data transfer format! This uses Jawn under the covers for parsing, but can write Json without any dependencies.

Example:

import sauerkraut.{pickle,read,write}
import sauerkraut.core.{Buildable,Writer, given}
import sauerkraut.format.json.Json

case class MyWebData(value: Int, someStuff: Array[String])
    derives Buildable, Writer

def read(in: java.io.InputStream): MyWebData =
  pickle(Json).from(in).read[MyWebData]
def write(out: java.io.OutputStream): Unit = 
  pickle(Json).to(out).write(MyWebData(1214, Array("this", "is", "a", "test")))

sbt build:

libraryDependencies += "com.jsuereth.sauerkraut" %% "json" % "<version>"

See json project for more information.

Protos

A new encoding for protocol buffers within Scala! This supports a subset of all possible protocol buffer messages but allows full definition of the message format within your Scala code.

Example:

import sauerkraut.{pickle,write,read, Field}
import sauerkraut.core.{Writer, Buildable, given}
import sauerkraut.format.pb.{Proto,given}


case class MyMessageData(value: Int @Field(3), someStuff: Array[String] @Field(2))
    derives Writer, Buildable

def write(out: java.io.OutputStream): Unit = 
  pickle(Proto).to(out).write(MyMessageData(1214, Array("this", "is", "a", "test")))

This example serializes to the equivalent of the following protocol buffer message:

message MyMessageData {
  int32 value = 3;
  repeated string someStuff = 2;
}

sbt build:

libraryDependencies += "com.jsuereth.sauerkraut" %% "pb" % "<version>"

See pb project for more information.

NBT

Named-Binary-Tags, a format popularized by Minecraft.

Example:

import sauerkraut.{pickle,read,write}
import sauerkraut.core.{Buildable,Writer, given}
import sauerkraut.format.nbt.Nbt

case class MyGameData(value: Int, someStuff: Array[String])
    derives Buildable, Writer

def read(in: java.io.InputStream): MyGameData =
  pickle(Nbt).from(in).read[MyGameData]
def write(out: java.io.OutputStream): Unit = 
  pickle(Nbt).to(out).write(MyGameData(1214, Array("this", "is", "a", "test")))

sbt build:

libraryDependencies += "com.jsuereth.sauerkraut" %% "nbt" % "<version>"

See nbt project for more information.

XML

Everyone's favorite markup language for data transfer!

Example:

import sauerkraut.{pickle,read,write}
import sauerkraut.core.{Buildable,Writer, given}
import sauerkraut.format.xml.{Xml, given}

case class MySlowWebData(value: Int, someStuff: Array[String])
    derives Buildable, Writer

def read(in: java.io.InputStream): MySlowWebData =
  pickle(Xml).from(in).read[MySlowWebData]
def write(out: java.io.Writer): Unit = 
  pickle(Xml).to(out).write(MySlowWebData(1214, Array("this", "is", "a", "test")))

sbt build:

libraryDependencies += "com.jsuereth.sauerkraut" %% "xml" % "<version>"

See xml project for more information.

Pretty

A format used solely to pretty-print object contents to strings. It has no PickleReader, only a PickleWriter.

Example:

import sauerkraut._, sauerkraut.core.{Writer,given}
case class MyAwesomeData(theBest: Int, theCoolest: String) derives Writer

scala> MyAwesomeData(1, "The Greatest").prettyPrint
val res0: String = Struct(rs$line$2.MyAwesomeData) {
  theBest: 1
  theCoolest: The Greatest
}

Design

We split Serialization into three layers:

  1. The Source layer. Sources are expected to be some kind of stream.
  2. The Format layer. This is responsible for reading a raw source and converting into the component types used in the Shape layer. See PickleReader and PickleWriter.
  3. The Shape layer. This is responsible for turning Primitives, Structs, Choices and Collections into component types.

It's the circle of data:

Source       => format       => shape      => memory => shape     => format       => Destination
[PickleData] => PickleReader => Builder[T] => T      => Writer[T] => PickleWriter => [PickleData]

This, hopefully, means we can reuse a lot of logic between the various formats with little loss of efficiency.

Note: This library is not measuring performance yet.

Shape layer

The Shape layer is responsible for extracting Scala types into known shapes that can be used for serialization. These shapes are currently Collection, Structure and Primitive. Custom shapes can be created in terms of these three shapes.

The Shape layer defines these three classes:

  • sauerkraut.core.Writer[T]: Can translate a value into write* calls of Primitive, Structure or Collection.
  • sauerkraut.core.Builder[T]: Can accept an incoming stream of collections/structures/primitives and build a value of T from them.
  • sauerkraut.core.Buildable[T]: Can provide a Builder[T] when asked.
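
To make the division of labour between these three classes concrete, here is a minimal sketch. ToyWriter, ToyBuilder and ToyBuildable are hypothetical, heavily simplified stand-ins invented for illustration. They are not the real sauerkraut.core signatures, and a real Writer pushes into a PickleWriter rather than a plain callback.

```scala
// Hypothetical, simplified stand-ins for the Shape-layer typeclasses.
// The real sauerkraut.core traits have richer signatures.
trait ToyWriter[T]:
  // Translate a value into a sequence of (fieldName, primitive) writes.
  def write(value: T, putField: (String, Any) => Unit): Unit

trait ToyBuilder[T]:
  def putField(name: String, value: Any): Unit // accept one incoming field
  def result: T                                // build the final value

trait ToyBuildable[T]:
  def newBuilder: ToyBuilder[T] // hand out a fresh builder when asked

case class Point(x: Int, y: Int)

// Hand-written instances; in the library these would normally be derived.
given ToyWriter[Point] with
  def write(value: Point, putField: (String, Any) => Unit): Unit =
    putField("x", value.x)
    putField("y", value.y)

given ToyBuildable[Point] with
  def newBuilder: ToyBuilder[Point] = new ToyBuilder[Point]:
    private var x = 0
    private var y = 0
    def putField(name: String, value: Any): Unit = (name, value) match
      case ("x", v: Int) => x = v
      case ("y", v: Int) => y = v
      case _             => () // ignore unknown fields
    def result: Point = Point(x, y)

// Drive a writer directly into a builder, with no format in between.
def roundTrip[T](value: T)(using w: ToyWriter[T], b: ToyBuildable[T]): T =
  val builder = b.newBuilder
  w.write(value, builder.putField)
  builder.result

@main def shapeDemo(): Unit =
  println(roundTrip(Point(3, 4))) // prints Point(3,4)
```

The builder never sees the Point type being written. It only receives field events, which is what lets a format sit between the two halves.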

Format layer

The format layer is responsible for mapping sauerkraut shapes (Collection, Structure, Primitive, Choice) into the underlying format. Not all shapes in sauerkraut will map exactly to underlying formats, and so each format may need to adjust/tweak incoming data as appropriate.

The format layer has these primary classes:

  • sauerkraut.format.PickleReader: Can load data and push it into a Builder of type T.
  • sauerkraut.format.PickleWriter: Accepts pushed structures/collections/primitives and places them into a Pickle.
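
As a rough illustration of how the same shape calls can land in different underlying formats, the sketch below pushes one structure into two toy writers. ToyPickleWriter, JsonLike and XmlLike are hypothetical names, the contract is far smaller than the real PickleWriter, and no string escaping is attempted.

```scala
// Hypothetical, simplified format-layer contract; not the real
// sauerkraut.format.PickleWriter API.
trait ToyPickleWriter:
  def beginStructure(name: String): Unit
  def putField(name: String, value: Int | String): Unit
  def endStructure(): Unit

// JSON has no place for the structure name, so this format drops it.
final class JsonLike(out: StringBuilder) extends ToyPickleWriter:
  private var first = true
  def beginStructure(name: String): Unit =
    first = true
    out.append("{")
  def putField(name: String, value: Int | String): Unit =
    if !first then out.append(",")
    first = false
    val rendered = value match
      case i: Int    => i.toString
      case s: String => "\"" + s + "\"" // no escaping in this sketch
    out.append("\"" + name + "\":" + rendered)
  def endStructure(): Unit = out.append("}")

// XML needs a root element, so this format uses the structure name.
final class XmlLike(out: StringBuilder) extends ToyPickleWriter:
  private var current = ""
  def beginStructure(name: String): Unit =
    current = name
    out.append(s"<$name>")
  def putField(name: String, value: Int | String): Unit =
    out.append(s"<$name>$value</$name>")
  def endStructure(): Unit = out.append(s"</$current>")

// The "shape" side: identical calls regardless of the target format.
def writeMessage(w: ToyPickleWriter): Unit =
  w.beginStructure("MyMessage")
  w.putField("field", "test")
  w.putField("data", 1)
  w.endStructure()

@main def formatDemo(): Unit =
  val json = new StringBuilder
  writeMessage(JsonLike(json))
  println(json) // {"field":"test","data":1}
  val xml = new StringBuilder
  writeMessage(XmlLike(xml))
  println(xml) // <MyMessage><field>test</field><data>1</data></MyMessage>
```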

Source layer

The source layer is allowed to be any type that a format wishes to support. Inputs and outputs are provided to the API via these two classes:

  • sauerkraut.format.PickleReaderSupport[Input, Format]: A given instance of this type allows a PickleReader to be constructed from a type of input.
  • sauerkraut.format.PickleWriterSupport[Output, Format]: A given instance of this type allows a PickleWriter to be constructed from a type of output.

This layer is designed to support any type of input and output, not just an in-memory store (like a Json Ast) or a streaming input. Formats can define what types of input/output (or execution environment) they allow.
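
The idea of selecting a writer from the pair (output type, format) via a given can be sketched as follows. ToyWriterSupport, ToySink, ToyText and sinkFor are all hypothetical names for illustration, and the contravariant Output parameter is an assumption of this sketch, not a statement about how PickleWriterSupport is declared.

```scala
import java.io.{ByteArrayOutputStream, OutputStream, StringWriter, Writer}

// Hypothetical format marker, playing the role of Json or Proto.
object ToyText

// Hypothetical minimal writer: just accepts text.
trait ToySink:
  def put(text: String): Unit

// Hypothetical support typeclass. Output is contravariant (an assumption)
// so that a given for Writer also serves StringWriter.
trait ToyWriterSupport[-Output, Format]:
  def writerFor(format: Format, output: Output): ToySink

// The format declares which outputs it supports by providing givens.
given ToyWriterSupport[Writer, ToyText.type] with
  def writerFor(format: ToyText.type, output: Writer): ToySink =
    text => output.write(text)

given ToyWriterSupport[OutputStream, ToyText.type] with
  def writerFor(format: ToyText.type, output: OutputStream): ToySink =
    text => output.write(text.getBytes("UTF-8"))

// Entry point: compiles only if a given exists for this Output/Format pair.
def sinkFor[O, F](format: F, output: O)(using s: ToyWriterSupport[O, F]): ToySink =
  s.writerFor(format, output)

@main def sourceDemo(): Unit =
  val chars = StringWriter()
  sinkFor(ToyText, chars).put("hello") // resolved via the Writer given
  val bytes = ByteArrayOutputStream()
  sinkFor(ToyText, bytes).put("hello") // resolved via the OutputStream given
  println(chars.toString == bytes.toString("UTF-8")) // prints true
```

Passing an output type with no matching given, such as an Int, is rejected at compile time, which is how a format can restrict the inputs and outputs it allows.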

Writing a new format

New formats are expected to provide the "format" + "source" layer implementations they require.

TODO - a bit more here.

Differences from Scala Pickling

There are a few major differences from the old scala pickling project.

  • The core library is built for 100% static code generation. While we think that dynamic (i.e. runtime-reflection-based) pickling could be built using this library, it is a non-goal.
    • Users are expected to rely on typeclass derivation to generate Reader/Writers, rather than using macros.
    • The supported types that can be pickled are limited to the same supported by typeclass derivation or that can have hand-written Writer[_]/Builder[_] instances.
  • Readers are no longer driven by the Scala type. Instead we use a new Buildable[A]/Builder[A] design to allow each PickleReader to push values into a Builder[A] that will then construct the Scala class.
  • There have been no runtime performance optimisations around codegen. Those will come as we test the limits of Scala 3 / Dotty.
  • Format implementations are separate libraries.
  • The PickleWriter contract has been split into several types to avoid misuse. This puts a larger number of lambdas in play, but the cost may be offset by optimisations in modern versions of Scala/JVM.
  • The name is more German.
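
For readers unfamiliar with Scala 3 typeclass derivation, which the first point above relies on, here is a minimal sketch of the mechanism using only the standard library. Describe is a hypothetical toy typeclass and MyMessage is redeclared locally. The derivation behind the real Writer and Buildable is more involved.

```scala
import scala.compiletime.{erasedValue, summonInline}
import scala.deriving.Mirror

// Hypothetical toy typeclass, standing in for Writer.
trait Describe[T]:
  def describe(value: T): String

object Describe:
  given Describe[Int] with
    def describe(value: Int): String = value.toString
  given Describe[String] with
    def describe(value: String): String = "\"" + value + "\""

  // Summon an instance for every field type of a case class, at compile time.
  inline def summonAll[Elems <: Tuple]: List[Describe[?]] =
    inline erasedValue[Elems] match
      case _: (head *: tail) => summonInline[Describe[head]] :: summonAll[tail]
      case _: EmptyTuple     => Nil

  // Combine the per-field instances into an instance for the whole product.
  def product[T](instances: List[Describe[?]]): Describe[T] = new Describe[T]:
    def describe(value: T): String =
      val p = value.asInstanceOf[Product]
      val parts = p.productElementNames.zip(p.productIterator).zip(instances.iterator).map {
        case ((name, field), inst) =>
          name + "=" + inst.asInstanceOf[Describe[Any]].describe(field)
      }
      parts.mkString(p.productPrefix + "(", ", ", ")")

  // Invoked by the compiler for `derives Describe`; no macros involved.
  inline def derived[T](using m: Mirror.ProductOf[T]): Describe[T] =
    product[T](summonAll[m.MirroredElemTypes])

case class MyMessage(field: String, data: Int) derives Describe

@main def derivationDemo(): Unit =
  println(summon[Describe[MyMessage]].describe(MyMessage("test", 1)))
  // prints MyMessage(field="test", data=1)
```

A field type with no Describe instance makes the derives clause fail to compile, which mirrors the restriction that only types supported by derivation, or with hand-written instances, can be pickled.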

Benchmarking

Benchmarking is still being built out, and is pending the final design of Choice/Sum-Types within the Format/Shape layer.

You can see benchmark results via: benchmarks/jmh:run -rf csv.

Latest status/analysis can be found in the benchmarks directory.

Benchmarking TODOs

  • Basic comparison of all formats
  • Size-of-Pickle measurement
  • Well-thought-out dataset for reading/writing
  • Isolated read vs. write testing
  • Comparison against other frameworks.
    • Protos vs. protocol buffer java implementation
    • Json Reading vs. raw JAWN to AST (measure overhead)
    • Jackson
    • Kryo
    • Thrift
    • Circe
    • uPickle
  • Automatic well-formatted graph dump in Markdown of results.

Thanks

Thanks to everyone who contributed to the original pickling library for inspiration, with a few callouts.

  • Heather Miller + Philipp Haller for the original idea, innovation and motivation for Scala.
  • Havoc Pennington + Eugene Yokota for helping define what's important when pickling a protocol and evolving that protocol.