/pronto

Clojure support for protocol buffers

Primary LanguageClojure

pronto

Coverage Status Clojars Project cljdoc badge

A library for using Protocol Buffers 3 in Clojure.

Rationale

The guiding principles behind pronto are:

  • Idiomatic interaction: Use Protocol Buffer POJOs (protoc generated) as though they were native Clojure data structures, allowing for data-driven programming.
  • Minimalistic: pronto is behavioral only: it is only concerned with making POJOs mimic Clojure collections. Data is still stored in the POJOs, and no kind of reflective/dynamic APIs are used. This also has the benefit that unknown fields are not lost during serialization.
  • Runtime Type Safety: The schema cannot be broken - pronto fails-fast when associng a key not present in the schema or a value of the wrong type. This guarantees that schema errors are detected immediately rather than at some undefined time in the future (perhaps too late) or worse -- dropped and ignored completely.
  • Performant: Present a minimal CPU/memory overhead: pronto compiles very thin wrapper classes and avoids reflection completely.

Installation

Add a dependency to your project.clj file:

       [com.appsflyer/pronto "3.0.0"]

Note that the library comes with no Java protobuf dependencies of its own and they are expected to be provided by consumers of the library, with a minimal version of 3.15.0.

How does it work?

The main abstraction in pronto is the proto-map, a type of map which can be used as a regular Clojure map, but rejects any operations which would break its schema. The library generates a bespoke proto-map class for every protoc-generated Java class (POJO).

Every proto-map

  • Holds an underlying instance of the actual POJO.
  • Can be used as Clojure maps and support most Clojure semantics and abstractions by implementing all the appropriate internal Clojure interfaces (see fine print).
  • Is immutable, i.e, associng creates a new proto-map instance around a new POJO instance.

Quick example

Let's use this example:

(import 'protogen.generated.People$Person)

(require '[pronto.core :as p])

(p/defmapper my-mapper [People$Person])

defmapper is a macro which generates new proto-map classes for the supplied class and for any message type dependency it has. It also defines a var at the call-site, which serves as a handle to interact with the library later on.

Now we can work with protobuf while writing idiomatic Clojure code:

(-> (p/proto-map mapper People$Person) ;; create a Person proto-map
    (assoc :name "Rich" :id 0 :pet_names ["FOO" "BAR"])
    (update :pet_names #(map clojure.string/lower-case %))
    (assoc-in [:address :street] "Broadway"))

Internally, field reads and writes are delegated directly to the underlying POJO. For example, (:name person-map) will call Person.getName and (assoc person-map :name "John") will call Person.Builder.setName.

Schema-breaking operations will fail:

(assoc person-map :no-such-key 12345)
=> Execution error (IllegalArgumentException) at user.People$PersonMap/assoc
No such field :no-such-key

(assoc person-map :name 12345)
=> Execution error:  {:error :invalid-type,
                      :class protogen.generated.People$Person,
                      :field "name",
                      :expected-type java.lang.String,
                      :actual-type java.lang.Long,
                      :value 12345}

Fine print - please read

It is important to realize that while proto-mapss look and feel like Clojure maps for the most part, their semantics are not always identical. Clojure maps are dynamic and open; Protocol-buffers are static and closed. This leads to several design decisions, where we usually preferred to stick to Protocol-buffers' semantics rather than Clojure's. This is done in order to remove ambiguity, and because we assume that a protocol-buffers user would like to ensure the properties for which they decided to use them in the first place are maintained.

The main differences and the reasoning behind them are as follows:

  • A proto-map contains the entire set of keys defined in a schema (as Clojure keywords) -- the schema is the source of truth and it is always present in its entirety.
  • dissoc is unsupported -- for the reason above.
  • Trying to get a key not in the map will throw an error, rather than return nil, for two reasons; First, proto-maps are closed and can't be used as a general-purpose container of key-value pairs. Therefor, this is probably a mistake and we'd like to give the user immediate feedback. Second, returning nil could lead to strange ambiguities -- see below.
  • Associng a key not present in the schema is an error -- maintain schema correctness.
  • Associng a value of the wrong type is an error -- maintain schema correctness.
  • To nil or not to nil: protocol buffers in Java have no notion of scalar nullability. Scalar fields are always initialized and present. When unset, they take on their "zero-value" rather than null. However, for message type fields it is possible to check whether set or not.
    • Scalar fields will never be nil. When unset, their value will be whatever the default value is for the respective type. However, protobuf provides boxed wrappers for primitive types, which pronto automatically recognizes and inlines into the proto-map.
    • The value message type fields will be nil when they are unset. Associng nil to a message type field will clear it.

Usage guide

Creating a new map:

(import 'protogen.generated.People$Person)

(require '[pronto.core :as p])

(p/defmapper my-mapper People$Person)

;; Create a new empty Person proto-map:
(p/proto-map my-mapper People$Person)

;; Serialize a byte array into a proto-map (and accompanying POJO):
(p/proto-map->bytes my-proto-map)

;; Deserialize byte array into a proto-map (and accompanying POJO):
(p/bytes->proto-map my-mapper People$Person (read-person-byte-array-from-kafka))

;; Generate a new proto-map from a Clojure map adhering to the schema:
(p/clj-map->proto-map my-mapper People$Person {:id 0 :name "hello" :address {:city "London"}})

;; Wrap around an existing instance of a POJO:
(let [person (. (People$Person/newBuilder) build)]
  (p/proto->proto-map my-mapper person))
  
;; Get the underlying POJO of a proto-map:
(p/proto-map->proto my-proto-map)

Pro tip: On proto-maps scope

When creating data you can control when exactly you stop working with maps and start working with proto-maps. A proto-map has the advantage of failing fast. Hence associng an invalid field (wrong type, non-existent enum etc.) generates failures at the crime scene. This is a good thing since you want to locate the bug quickly. However, this comes with the cost of creating proto-maps.

(defn person-with-address [city]
  (let [addr (p/clj-map->proto-map my-mapper People$Address {:city city})]
    (p/clj-map->proto-map my-mapper People$Person {:id 0 :name "hello" :address addr})))

is mouthful. While it fails for every mistake at the right place deeply nested structures creation quickly becomes bloated this way.

However, this is also a valid code:

(defn person-with-address [city]
  (->> {:id 0 :name "hello" :address {:city city}}
       (p/clj-map->proto-map my-mapper People$Person))

It has the downside that you might have gotten either Person or Address wrong, but figuring which one is still easy enough. The point to move from plain maps into proto-maps can be chosen freely and should balance this tradeoff.

Protocol Buffers - Clojure interop

Fields

As discussed previously, a proto-map contains the entire set of keys defined in a schema, represented by a keyword of their original field name in the .proto file.

However, you can control the naming strategy of keys. For example, if you'd like to use kebab-case:

(require '[pronto.utils :as u])
(p/defmapper my-mapper People$Person 
    :key-name-fn u/->kebab-case)

Scalar fields

Scalar fields are straight-forward in that that they follow the protobuf Java scalar mappings.

Clojure-specific numeric types such as Ratio and BigInt are supported as well, and when associng them to a map they are converted automatically to the underlying field's type.

It is also important to note that Clojure uses longs to represent natural numbers, and these will be down-casted to int for integer fields.

In any case, handling of overflows is left to the user.

Message types

When calling defmapper, the macro will also find all message types on which the class depends, and generate specialized wrapper types for them as well, so you do not have to call defmapper recursively yourselves.

When reading a field whose type is a message type, a proto-map is returned.

It is possible to assoc both a proto-map into a message type field, or a regular Clojure map -- as long as it adheres to the schema.

Repeated and maps

Values of repeated/map fields are returned as Clojure maps/vectors:

(:pet_names person-map)
=> ["foo" "bar"]
(:relations person-map)
=> {"friend" {:name "Joe" ... } "cousin" {:name "Vinny" ... }}

Enums

Enumerations are also represented by a keyword:

(import 'protogen.generated.People$Like)
(p/defmapper my-mapper People$Like)

(:level (p/proto-map my-mapper People$Like)) ;; either Level/LOW, Level/MEDIUM, Level/HIGH
=> :LOW

It is possible to use kebab-case (or any other case) for enums.

(p/defmapper my-mapper People$Like
    :enum-value-fn u/->kebab-case)
(:level (p/proto-map my-mapper People$Like))
=> :low

Either a keyword or a Java enum value may be assoced:

(assoc (p/proto-map mapper People$Like) :level :HIGH)

(assoc (p/proto-map mapper People$Like) :level People$Level/HIGH)

One-of's

one-of's behave like other fields. This means that even when unset, the optional fields still exist in the schema with their default values or nil in the case of message types.

To check which one-of is set, use which-one-of or one-of.

For example, given this schema:

message Address {
  string city = 1;
  string street = 2;
  int32 house_num = 3;
  oneof home {
    House house = 4;
    Apartment apartment = 5;
  }
}
(p/which-one-of (p/proto-map People$Address) :home)
=> nil

(p/one-of (p/proto-map People$Address) :home)
=> nil

(p/which-one-of (p/clj-map->proto-map People$Address {:house {:num_rooms 3}}) :home)
=> :house

(p/one-of (p/clj-map->proto-map People$Address {:house {:num_rooms 3}}) :home)
=> {:num_rooms 3}

ByteStrings

ByteStrings are not wrapped, and returned raw in order to provide direct access to the byte array.

However, ByteString's are naturally seqable since they implement java.lang.Iterable.

Well-Known-Types

Well known types fields will be inlined into the message. This means that rather than calling (-> my-proto-map :my-string-value :value) you can simply write (:my-string-value my-proto-map). Note that since well-known-types are message types, this may return nil when the field is unset -- allowing us to model schemas which support null scalar fields.

Encoders

While protobuf allows us to describe our domain model, the Java generated types are not always a great programmatic fit. Consider the following schema:

message UUID {
   int64 msb = 1; // most significat bits
   int64 lsb = 2; // least significat bits
}

message Person {
   UUID id = 1;
}

Reading a person's id field would return a {:lsb <lsb> :msb <msb>} proto-map.

Encoders allow us to define an alternative type (rather than the POJO class) that will be used for proto-map fields of that type:

(defmapper mapper [protogen.generated.People$Person]
  :encoders {protogen.generated.People$UUID
             {:from-proto (fn [^protogen.generated.People$UUID proto-uuid]
                            (java.util.UUID. (.getMsb proto-uuid) (.getLsb proto-uuid)))                           
              :to-proto   (fn [^java.util.UUID java-uuid]
                            (let [b (People$UUID/newBuilder)]
                              (.setMsb b (.getMostSignificantBits java-uuid)
                              (.setLsb b (.getLeastSignificantBits java-uuid))
                              (.build b))))}})

(proto-map mapper People$Person :id (java.util.UUID/randomUUID))
=> {:id #uuid "2a1ef325-c7c2-42d4-815d-6bb1b9ed2e63"} 

This encourages DRYer code, since these kinds of proto<->clj conversions can be defined as a single encoder, rather than handled across the codebase.

Interoping proto-maps with Java code

It is sometimes necessary to interop with Java code that expects a POJO instance. For example, consider the following method signature:

public class Utils {
  public static void foo(com.google.protobuf.Duration duration) { ... }   
}  

This method receives a com.google.protobuf.Duration, a generated class that was compiled from the duration schema that is part of the protobuf distribution.

Since proto-maps are thin wrappers, we can always refer back to the underlying POJO and interop successfully:

(require '[pronto.core :as p])
(import 'com.google.protobuf.Duration)

(p/defmapper m [Duration])

(Utils/foo (p/proto-map->proto (p/proto-map m Duration)))

If your Java code operates on the protoc generated interfaces rather than concrete typs, it is also possible to pass the proto-map directly:

public static void foo(com.google.protobuf.DurationOrBuilder duration) { ... }
(Utils/foo (p/proto-map m Duration))

Please read the performance introduction.

Schema utils

To inspect a schema at the REPL use pronto.schema/schema, which returns the (Clojurified) schema as data:

(require '[pronto.schema :refer [schema]])

(schema People$Person)
=> {:diet #{"UNKNOWN_DIET" "OMNIVORE" "VEGETARIAN" "VEGAN"} ;; an enum
    :address People$Address ;; address field
    :address_book {String People$PersonDetails} ;; a map string->PersonDetails
    :age  int
    :friends [People$Person] ;; a repeated Person fields
    :name String}

Drilling-down is also possible:

(p/schema People$Person :address)
=> {:country String :city String :house_num int}

Please note that unlike the rest of the library, schema uses runtime reflection and is meant as a convenience method to be used during development.