The library for those cabbage lovers out there who want to send data over the wire.
A revitalization of Pickling in the Scala 3 world.
When defining over-the-wire messages, do this:
import sauerkraut.core.{Buildable,Writer,given}
case class MyMessage(field: String, data: Int)
  derives Buildable, Writer
Then, when you need to serialize, pick a format and go:
import java.io.StringWriter
import sauerkraut.format.json.{Json,given}
import sauerkraut.{pickle,read,write}
val out = StringWriter()
pickle(Json).to(out).write(MyMessage("test", 1))
println(out.toString())
val msg = pickle(Json).from(out.toString()).read[MyMessage]
Here's a feature matrix for each format:
Format | Reader | Writer | All Types | Evolution Friendly | Notes |
---|---|---|---|---|---|
Json | Yes | Yes | Yes | Yes | Uses Jawn for parsing |
Protos | Yes | Yes | Yes | Yes | Binary, evolution-friendly format |
NBT | Yes | Yes | Yes | | For the kids. |
XML | Yes | Yes | Yes | | Inefficient prototype. |
Pretty | No | Yes | No | | For pretty-printing strings |
See Compliance for more details on what this means.
Everyone's favorite non-YAML web data transfer format! This uses Jawn under the covers for parsing, but can write Json without any dependencies.
Example:
import sauerkraut.{pickle,read,write}
import sauerkraut.core.{Buildable,Writer, given}
import sauerkraut.format.json.Json
case class MyWebData(value: Int, someStuff: Array[String])
  derives Buildable, Writer
def read(in: java.io.InputStream): MyWebData =
  pickle(Json).from(in).read[MyWebData]
def write(out: java.io.OutputStream): Unit =
  pickle(Json).to(out).write(MyWebData(1214, Array("this", "is", "a", "test")))
sbt build:
libraryDependencies += "com.jsuereth.sauerkraut" %% "json" % "<version>"
See json project for more information.
A new encoding for protocol buffers within Scala! This supports a subset of all possible protocol buffer messages but allows full definition of the message format within your Scala code.
Example:
import sauerkraut.{pickle,write,read, Field}
import sauerkraut.core.{Writer, Buildable, given}
import sauerkraut.format.pb.{Proto,given}
case class MyMessageData(value: Int @Field(3), someStuff: Array[String] @Field(2))
  derives Writer, Buildable
def write(out: java.io.OutputStream): Unit =
  pickle(Proto).to(out).write(MyMessageData(1214, Array("this", "is", "a", "test")))
This example serializes to the equivalent of the following protocol buffer message:
message MyMessageData {
  int32 value = 3;
  repeated string someStuff = 2;
}
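To make the role of the `@Field` numbers concrete: in the standard protocol buffer wire format, every field is prefixed with a tag built from its field number and wire type, and integers are written as varints. The sketch below is plain Scala with no sauerkraut dependency. The helper names `varint`, `tag` and `hex` are made up for illustration, and it assumes that the `Proto` format follows the standard wire format for this message, which the source does not spell out.

```scala
// Standard protobuf wire types (from the protobuf encoding rules).
val WireVarint = 0          // int32, int64, bool, enum
val WireLengthDelimited = 2 // string, bytes, nested messages

// Encode a non-negative value as a base-128 varint, least-significant group first.
def varint(value: Long): Vector[Int] =
  require(value >= 0, "this sketch only handles non-negative values")
  if value < 0x80 then Vector(value.toInt)
  else ((value & 0x7f) | 0x80).toInt +: varint(value >>> 7)

// A field's tag is (fieldNumber << 3) | wireType, itself written as a varint.
def tag(fieldNumber: Int, wireType: Int): Vector[Int] =
  varint((fieldNumber.toLong << 3) | wireType)

// Render bytes as hex for easy comparison with a hex dump.
def hex(bytes: Vector[Int]): String =
  bytes.map(b => f"$b%02x").mkString(" ")

object WireDemo:
  def main(args: Array[String]): Unit =
    // `value: Int @Field(3)` holding 1214
    println(hex(tag(3, WireVarint) ++ varint(1214))) // 18 be 09
    // One element of `someStuff: Array[String] @Field(2)`, the string "this"
    val utf8 = "this".getBytes("UTF-8").toVector.map(_ & 0xff)
    println(hex(tag(2, WireLengthDelimited) ++ varint(utf8.length) ++ utf8)) // 12 04 74 68 69 73
```

Changing a field's name leaves these bytes untouched, while changing its number does not, which is why the numbers are pinned explicitly in the case class.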
sbt build:
libraryDependencies += "com.jsuereth.sauerkraut" %% "pb" % "<version>"
See pb project for more information.
Named-Binary-Tags, a format popularized by Minecraft.
Example:
import sauerkraut.{pickle,read,write}
import sauerkraut.core.{Buildable,Writer, given}
import sauerkraut.format.nbt.Nbt
case class MyGameData(value: Int, someStuff: Array[String])
  derives Buildable, Writer
def read(in: java.io.InputStream): MyGameData =
  pickle(Nbt).from(in).read[MyGameData]
def write(out: java.io.OutputStream): Unit =
  pickle(Nbt).to(out).write(MyGameData(1214, Array("this", "is", "a", "test")))
sbt build:
libraryDependencies += "com.jsuereth.sauerkraut" %% "nbt" % "<version>"
See nbt project for more information.
Everyone's favorite markup language for data transfer!
Example:
import sauerkraut.{pickle,read,write}
import sauerkraut.core.{Buildable,Writer, given}
import sauerkraut.format.xml.{Xml, given}
case class MySlowWebData(value: Int, someStuff: Array[String])
  derives Buildable, Writer
def read(in: java.io.InputStream): MySlowWebData =
  pickle(Xml).from(in).read[MySlowWebData]
def write(out: java.io.Writer): Unit =
  pickle(Xml).to(out).write(MySlowWebData(1214, Array("this", "is", "a", "test")))
sbt build:
libraryDependencies += "com.jsuereth.sauerkraut" %% "xml" % "<version>"
See xml project for more information.
A format that is used solely to pretty-print object contents to strings. It has no [PickleReader], only a [PickleWriter].
Example:
import sauerkraut._, sauerkraut.core.{Writer,given}
case class MyAwesomeData(theBest: Int, theCoolest: String) derives Writer
scala> MyAwesomeData(1, "The Greatest").prettyPrint
val res0: String = Struct(rs$line$2.MyAwesomeData) {
  theBest: 1
  theCoolest: The Greatest
}
We split Serialization into three layers:
- The `source` layer. It is expected these are some kind of stream.
- The `Format` layer. This is responsible for reading a raw source and converting into the component types used in the `Shape` layer. See `PickleReader` and `PickleWriter`.
- The `Shape` layer. This is responsible for turning Primitives, Structs, Choices and Collections into component types.
It's the circle of data:
Source => format => shape => memory => shape => format => Destination
[PickleData] => PickleReader => Builder[T] => T => Writer[T] => PickleWriter => [PickleData]
This, hopefully, means we can reuse a lot of logic between the various formats with only a light loss of efficiency.
Note: This library is not measuring performance yet.
The Shape layer is responsible for extracting Scala types into known shapes that can be used for
serialization. These shapes are currently `Collection`, `Structure` and `Primitive`. Custom
shapes can be created in terms of these three shapes.
The Shape layer defines these three classes:
- `sauerkraut.core.Writer[T]`: Can translate a value into write* calls of Primitive, Structure or Collection.
- `sauerkraut.core.Builder[T]`: Can accept an incoming stream of collections/structures/primitives and build a value of T from them.
- `sauerkraut.core.Buildable[T]`: Can provide a `Builder[T]` when asked.
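The relationship between these three classes can be sketched with toy versions. Everything below is a hypothetical, heavily simplified stand-in: the trait names mirror the list above, but the method names and signatures (`intField`, `newBuilder`, `result`, `roundTrip`) are assumptions made for illustration and are not sauerkraut's actual API. Only a structure with named `Int` fields is modelled.

```scala
final case class Point(x: Int, y: Int)

// Shape events a format would see: here only named Int fields of one structure.
trait FieldSink:
  def intField(name: String, value: Int): Unit

// Translates a value into a sequence of shape calls.
trait Writer[T]:
  def write(value: T, sink: FieldSink): Unit

// Accepts pushed shape calls and assembles a value from them.
trait Builder[T] extends FieldSink:
  def result: T

// Hands out a fresh Builder[T] when asked.
trait Buildable[T]:
  def newBuilder: Builder[T]

// Hand-written instances; derivation would normally generate these.
given pointWriter: Writer[Point] = new Writer[Point]:
  def write(value: Point, sink: FieldSink): Unit =
    sink.intField("x", value.x)
    sink.intField("y", value.y)

given pointBuildable: Buildable[Point] = new Buildable[Point]:
  def newBuilder: Builder[Point] = new Builder[Point]:
    private var x = 0
    private var y = 0
    def intField(name: String, value: Int): Unit = name match
      case "x" => x = value
      case "y" => y = value
      case _   => () // unknown fields are ignored in this sketch
    def result: Point = Point(x, y)

// Replays a value's write calls straight into a builder, skipping any format.
def roundTrip[T](value: T)(using w: Writer[T], b: Buildable[T]): T =
  val builder = b.newBuilder
  w.write(value, builder)
  builder.result
```

Calling `roundTrip(Point(3, 4))` yields `Point(3, 4)`. A real format sits between the two halves, turning the write calls into bytes and later replaying them into the builder.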
The format layer is responsible for mapping sauerkraut shapes (`Collection`, `Structure`, `Primitive`, `Choice`) into
the underlying format. Not all shapes in sauerkraut will map exactly to underlying formats, and so each
format may need to adjust/tweak incoming data as appropriate.
The format layer has these primary classes:
- `sauerkraut.format.PickleReader`: Can load data and push it into a Builder of type T.
- `sauerkraut.format.PickleWriter`: Accepts pushed structures/collections/primitives and places them into a Pickle.
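As a rough illustration of the writer half of a format, the sketch below maps pushed shapes onto JSON-like text. `ShapeWriter`, `JsonLikeWriter` and `renderExample` are hypothetical names invented for this example. The real `PickleWriter` contract is split across several types and looks different, and this toy does no string escaping.

```scala
// Hypothetical push-style interface, loosely modelled on the three shapes.
trait ShapeWriter:
  def primitive(value: Int | String): Unit
  def structure(fields: (String, ShapeWriter => Unit)*): Unit
  def collection(elements: (ShapeWriter => Unit)*): Unit

// Maps each shape onto JSON-like text: objects, arrays, numbers and strings.
final class JsonLikeWriter(out: StringBuilder) extends ShapeWriter:
  def primitive(value: Int | String): Unit = value match
    case i: Int    => out.append(i)
    case s: String => out.append('"').append(s).append('"') // no escaping in this sketch

  def structure(fields: (String, ShapeWriter => Unit)*): Unit =
    out.append('{')
    for ((name, writeValue), index) <- fields.zipWithIndex do
      if index > 0 then out.append(',')
      out.append('"').append(name).append("\":")
      writeValue(this)
    out.append('}')

  def collection(elements: (ShapeWriter => Unit)*): Unit =
    out.append('[')
    for (writeElement, index) <- elements.zipWithIndex do
      if index > 0 then out.append(',')
      writeElement(this)
    out.append(']')

// The calls a Writer for a MyWebData-like class might push into the format.
def renderExample(): String =
  val out = new StringBuilder
  val writer = JsonLikeWriter(out)
  writer.structure(
    ("value", (w: ShapeWriter) => w.primitive(1214)),
    ("someStuff", (w: ShapeWriter) => w.collection(
      (e: ShapeWriter) => e.primitive("this"),
      (e: ShapeWriter) => e.primitive("test")
    ))
  )
  out.toString // {"value":1214,"someStuff":["this","test"]}
```

The lambdas are what let the format decide when nested values are written, at the price of some allocation.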
The `source` layer is allowed to be any type that a format wishes to support. Inputs and outputs are
provided to the API via these two classes:
- `sauerkraut.format.PickleReaderSupport[Input, Format]`: A given of this instance will allow the `PickleReader` to be constructed from a type of input.
- `sauerkraut.format.PickleWriterSupport[Output, Format]`: A given of this instance will allow `PickleWriter` to be constructed from a type of output.
This layer is designed to support any type of input and output, not just an in-memory store (like a Json Ast) or a streaming input. Formats can define what types of input/output (or execution environment) they allow.
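One way to picture the support typeclasses is as a compile-time lookup keyed on the pair (output type, format). The sketch below is an assumption-laden toy: `TextSink`, `WriterSupport`, `sinkFor`, `demo` and the `Csv` format marker are all invented here, and the real `PickleWriterSupport` signature may differ.

```scala
import java.io.{ByteArrayOutputStream, OutputStream, OutputStreamWriter, StringWriter, Writer as JWriter}
import java.nio.charset.StandardCharsets

// Plays the role of a PickleWriter in this sketch.
trait TextSink:
  def emit(text: String): Unit
  def flush(): Unit

// A made-up format marker, used the way `Json` is used in the examples.
object Csv

// Plays the role of PickleWriterSupport; contravariant so that a given for
// java.io.Writer also serves subclasses such as StringWriter.
trait WriterSupport[-Output, Format]:
  def writerFor(output: Output): TextSink

given csvToWriter: WriterSupport[JWriter, Csv.type] = new WriterSupport[JWriter, Csv.type]:
  def writerFor(output: JWriter): TextSink = new TextSink:
    def emit(text: String): Unit = output.write(text)
    def flush(): Unit = output.flush()

given csvToStream: WriterSupport[OutputStream, Csv.type] = new WriterSupport[OutputStream, Csv.type]:
  def writerFor(output: OutputStream): TextSink =
    csvToWriter.writerFor(OutputStreamWriter(output, StandardCharsets.UTF_8))

// Picks the support instance from the static types of the format and output.
def sinkFor[O, F](format: F, output: O)(using support: WriterSupport[O, F]): TextSink =
  support.writerFor(output)

// Writes the same text to a character sink and to a byte sink.
def demo(): (String, String) =
  val text = StringWriter()
  val textSink = sinkFor(Csv, text)
  textSink.emit("1214,this")
  textSink.flush()
  val bytes = ByteArrayOutputStream()
  val byteSink = sinkFor(Csv, bytes)
  byteSink.emit("1214,this")
  byteSink.flush()
  (text.toString, new String(bytes.toByteArray, StandardCharsets.UTF_8))
```

A format that only makes sense for bytes would simply not provide a given for character outputs, and misuse would then fail at compile time rather than at runtime.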
New formats are expected to provide the "format" + "source" layer implementations they require.
TODO - a bit more here.
There are a few major differences from the old scala pickling project.
- The core library is built for 100% static code generation. While we think that dynamic (i.e. runtime-reflection-based) pickling could be built using this library, it is a non-goal.
- Users are expected to rely on typeclass derivation to generate Reader/Writers, rather than using macros.
- The supported types that can be pickled are limited to those supported by typeclass derivation or those that can have hand-written `Writer[_]`/`Builder[_]` instances.
- Readers are no longer driven by the Scala type. Instead we use a new `Buildable[A]`/`Builder[A]` design to allow each `PickleReader` to push values into a `Builder[A]` that will then construct the Scala class.
- There have been no runtime performance optimisations around codegen. Those will come as we test the limits of Scala 3 / Dotty.
- Format implementations are separate libraries.
- The `PickleWriter` contract has been split into several types to avoid misuse. This puts more lambdas in play, but the cost may be offset by optimisations in modern versions of Scala/JVM.
- The name is more German.
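For readers new to Scala 3, the typeclass derivation mentioned above works roughly as follows: a `derives` clause asks the companion object of the typeclass for a `derived` instance, which is assembled at compile time from a `Mirror` of the case class. The sketch below uses a made-up `Describe` typeclass to show the mechanism. It is not how sauerkraut's `Writer` or `Buildable` derivation is implemented, only the general standard-library pattern.

```scala
import scala.compiletime.{constValueTuple, erasedValue, summonInline}
import scala.deriving.Mirror

// Hypothetical stand-in for a Writer-like typeclass.
trait Describe[T]:
  def describe(value: T): String

object Describe:
  given intDescribe: Describe[Int] = value => value.toString
  given stringDescribe: Describe[String] = value => value

  // Summon one Describe instance per field type, at compile time.
  inline def fieldInstances[Ts <: Tuple]: List[Describe[Any]] =
    inline erasedValue[Ts] match
      case _: EmptyTuple => Nil
      case _: (t *: ts) =>
        summonInline[Describe[t]].asInstanceOf[Describe[Any]] :: fieldInstances[ts]

  // Invoked by the compiler for `derives Describe` on a case class.
  inline def derived[T](using m: Mirror.ProductOf[T]): Describe[T] =
    val instances = fieldInstances[m.MirroredElemTypes]
    val labels = constValueTuple[m.MirroredElemLabels].toList.map(_.toString)
    describeProduct(labels, instances)

  // Pairs each field label with its value and the instance for its type.
  def describeProduct[T](labels: List[String], instances: List[Describe[Any]]): Describe[T] =
    value =>
      val fields = value.asInstanceOf[Product].productIterator.toList
      labels.zip(fields).zip(instances)
        .map { case ((label, field), instance) => s"$label=${instance.describe(field)}" }
        .mkString("(", ", ", ")")

case class MyMessage(field: String, data: Int) derives Describe

object DeriveDemo:
  def main(args: Array[String]): Unit =
    println(summon[Describe[MyMessage]].describe(MyMessage("test", 1))) // (field=test, data=1)
```

A field type without a `Describe` instance in scope is reported as a compile error at the `derives` site, which is the static-only behaviour the first bullet describes.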
Benchmarking is still being built out, and is pending the final design of Choice/Sum-Types within the Format/Shape layer.
You can see benchmark results via: `benchmarks/jmh:run -rf csv`.
Latest status/analysis can be found in the benchmarks directory.
- Basic comparison of all formats
- Size-of-Pickle measurement
- Well-thought out dataset for reading/writing
- Isolated read vs. write testing
- Comparison against other frameworks.
  - Protos vs. protocol buffer java implementation
  - Json Reading vs. raw JAWN to AST (measure overhead)
  - Jackson
  - Kryo
  - Thrift
  - Circe
  - uPickle
- Automatic well-formatted graph dump in Markdown of results.
Thanks to everyone who contributed to the original pickling library for inspiration, with a few callouts.
- Heather Miller + Philipp Haller for the original idea, innovation and motivation for Scala.
- Havoc Pennington + Eugene Yokota for helping define what's important when pickling a protocol and evolving that protocol.