
This is a test repo on how to annotate binary schemas (Avro, protobuf) with field-level metadata.

This experimental repo explores:

  1. how to annotate binary schemas (protobuf) at the field or message level [by using custom options]
  2. how to parse protobuf schemas [message- and field-level attributes + options]
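To make "custom options" concrete, here is a minimal sketch of what such an annotation could look like. The file path matches the one used in the commands in this README, but the package name, the option names (`business_term`, `data_owner`), the field numbers and the `Customer` message are all hypothetical and are not taken from the repo's actual proto files.

```proto
// playground/options/v1/business_term_options.proto (contents are illustrative)
syntax = "proto3";

package playground.options.v1;

import "google/protobuf/descriptor.proto";

// Field-level custom option (hypothetical name and number).
extend google.protobuf.FieldOptions {
  string business_term = 50001;
}

// Message-level custom option (hypothetical name and number).
extend google.protobuf.MessageOptions {
  string data_owner = 50002;
}
```

A schema could then attach these options to a message and its fields:

```proto
// playground/v1/message_sample.proto (contents are illustrative)
syntax = "proto3";

package playground.v1;

import "playground/options/v1/business_term_options.proto";

message Customer {
  option (playground.options.v1.data_owner) = "crm-team";

  // Primary contact email.
  string email = 1 [(playground.options.v1.business_term) = "CONTACT_EMAIL"];
}
```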


Prerequisites:

  1. Install protoc or buf (https://docs.buf.build/installation)

Approach 1

Parse the proto source files directly using the wire-schema API


Steps:

  1. Run WireParser.java
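The following is a minimal sketch of this approach, not the repo's WireParser.java. It assumes a Kotlin-based wire-schema release (3.x or later), where the parser is reached from Java through `ProtoParser.Companion`. The class name `WireParseSketch` and the method `describeFields` are hypothetical, and only top-level messages are visited.

```java
import com.squareup.wire.schema.Location;
import com.squareup.wire.schema.internal.parser.FieldElement;
import com.squareup.wire.schema.internal.parser.MessageElement;
import com.squareup.wire.schema.internal.parser.OptionElement;
import com.squareup.wire.schema.internal.parser.ProtoFileElement;
import com.squareup.wire.schema.internal.parser.ProtoParser;
import com.squareup.wire.schema.internal.parser.TypeElement;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: lists field options and comments from proto source text. */
final class WireParseSketch {

  /** Returns one line per field: "Message.field [option=value, ...] // comment". */
  static List<String> describeFields(String path, String protoSource) {
    // ProtoParser works on the text of a single file; imports are not followed.
    ProtoFileElement file = ProtoParser.Companion.parse(Location.get(path), protoSource);

    List<String> lines = new ArrayList<>();
    for (TypeElement type : file.getTypes()) {
      if (!(type instanceof MessageElement)) {
        continue; // skip enums; nested types are not visited in this sketch
      }
      MessageElement message = (MessageElement) type;
      for (FieldElement field : message.getFields()) {
        List<String> options = new ArrayList<>();
        for (OptionElement option : field.getOptions()) {
          options.add(option.getName() + "=" + option.getValue());
        }
        lines.add(message.getName() + "." + field.getName() + " " + options
            + " // " + field.getDocumentation().trim());
      }
    }
    return lines;
  }
}
```

A caller would read the `.proto` file into a string and pass it in together with its path, for example `WireParseSketch.describeFields("playground/v1/message_sample.proto", source)`.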


Pros:

  1. Parses the plain-text proto file directly
  2. Can also extract comments


Cons:

  1. The Wire API's ProtoParser doesn't automatically load or merge imported schemas
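One way a caller could work around this limitation is to collect the imported files itself and parse each one separately. The sketch below is an assumption about how that might be done, not something the repo implements: `ImportCollector` is a hypothetical helper that scans for `import` statements with a regular expression (so it would also pick up imports inside block comments) and returns the files with dependencies first.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: finds the transitive imports of a proto file. */
final class ImportCollector {

  // Matches: import "a.proto";  import public "a.proto";  import weak "a.proto";
  private static final Pattern IMPORT = Pattern.compile(
      "^\\s*import\\s+(?:public\\s+|weak\\s+)?\"([^\"]+)\"\\s*;", Pattern.MULTILINE);

  /** Returns entryFile and everything it imports, dependencies before dependents. */
  static Set<String> collect(Path protoRoot, String entryFile) throws IOException {
    Set<String> ordered = new LinkedHashSet<>();
    visit(protoRoot, entryFile, new HashSet<>(), ordered);
    return ordered;
  }

  private static void visit(Path root, String file, Set<String> seen, Set<String> ordered)
      throws IOException {
    if (!seen.add(file)) {
      return; // already visited; this also guards against import cycles
    }
    Path path = root.resolve(file);
    // Files outside the root (e.g. google/protobuf/descriptor.proto) are recorded but not read.
    if (Files.exists(path)) {
      String source = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
      Matcher matcher = IMPORT.matcher(source);
      while (matcher.find()) {
        visit(root, matcher.group(1), seen, ordered);
      }
    }
    ordered.add(file);
  }
}
```

With the layout used in this README, the root would be `src/main/resources/proto` and the entry file `playground/v1/message_sample.proto`; each returned path could then be handed to the parser in turn.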

Approach 2

Use the protobuf API's FileDescriptorSet to parse a protobuf file and its dependencies


Steps:

  1. Generate Java bindings for the proto file that defines the custom options

    Using protoc:

    protoc --proto_path=src/main/resources/proto --java_out=src/main/java --experimental_allow_proto3_optional src/main/resources/proto/playground/options/v1/business_term_options.proto

    Using buf:

    buf generate --config '{"version":"v1beta1","build":{"roots":["src/main/resources/proto"]}}' --template '{"version":"v1beta1","plugins":[{"name":"java","out":"src/main/java"}]}' --path src/main/resources/proto/playground/options/v1/business_term_options.proto
  2. Compile the protobuf files into a binary descriptor file (it contains a FileDescriptorSet, a protocol buffer message defined in descriptor.proto)

    Using protoc:

    protoc --proto_path=src/main/resources/proto --descriptor_set_out=src/main/resources/protoc-bin/message_sample.desc --include_imports src/main/resources/proto/playground/v1/message_sample.proto

    Using buf:

    buf build --config '{"version":"v1beta1","build":{"roots":["src/main/resources/proto"]}}' --exclude-source-info -o src/main/resources/buf-bin/message_sample.desc --path src/main/resources/proto/playground/v1/message_sample.proto
  3. Run DescriptorFileParser.java
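The following is a minimal sketch of this approach using protobuf-java, not the repo's DescriptorFileParser.java. The class name `DescriptorSetSketch` and the method `describeFields` are hypothetical. It assumes the descriptor set lists each file after its dependencies and that the caller has populated the `ExtensionRegistry` from the generated bindings for the custom options (for example through a generated outer class's `registerAllExtensions(registry)`; that class's name depends on the proto file and is not shown here).

```java
import com.google.protobuf.DescriptorProtos.FileDescriptorProto;
import com.google.protobuf.DescriptorProtos.FileDescriptorSet;
import com.google.protobuf.Descriptors.Descriptor;
import com.google.protobuf.Descriptors.DescriptorValidationException;
import com.google.protobuf.Descriptors.FieldDescriptor;
import com.google.protobuf.Descriptors.FileDescriptor;
import com.google.protobuf.ExtensionRegistry;
import com.google.protobuf.InvalidProtocolBufferException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: lists field options from a compiled descriptor set. */
final class DescriptorSetSketch {

  /** Returns one line per field: "pkg.Message.field [option=value, ...]". */
  static List<String> describeFields(byte[] descriptorSetBytes, ExtensionRegistry registry)
      throws InvalidProtocolBufferException, DescriptorValidationException {
    // Custom options are extensions; without a registry that knows them,
    // they would end up as unknown fields on the options messages.
    FileDescriptorSet set = FileDescriptorSet.parseFrom(descriptorSetBytes, registry);

    Map<String, FileDescriptor> built = new HashMap<>();
    List<String> lines = new ArrayList<>();
    for (FileDescriptorProto proto : set.getFileList()) {
      // Link each file against its already-built dependencies.
      FileDescriptor[] deps = new FileDescriptor[proto.getDependencyCount()];
      for (int i = 0; i < deps.length; i++) {
        deps[i] = built.get(proto.getDependency(i));
        if (deps[i] == null) {
          throw new IllegalStateException("Dependency not seen yet: " + proto.getDependency(i));
        }
      }
      FileDescriptor file = FileDescriptor.buildFrom(proto, deps);
      built.put(proto.getName(), file);

      for (Descriptor message : file.getMessageTypes()) {
        for (FieldDescriptor field : message.getFields()) {
          List<String> options = new ArrayList<>();
          for (Map.Entry<FieldDescriptor, Object> option
              : field.getOptions().getAllFields().entrySet()) {
            options.add(option.getKey().getFullName() + "=" + option.getValue());
          }
          lines.add(message.getFullName() + "." + field.getName() + " " + options);
        }
      }
    }
    return lines;
  }
}
```

A caller would read `message_sample.desc` into a byte array and pass it in with the populated registry. Message-level options can be read the same way through `message.getOptions()`.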


Pros:

  1. The parser automatically loads imported schemas


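Cons:

  1. Extra pre-processing step: the proto files must first be compiled into a descriptor file
  2. Compile-time dependency on the Java bindings for the custom options (needed to populate the ExtensionRegistry)
  3. Unable to parse comments (the descriptor files generated by the commands above do not include source info)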

Other Approaches:

  1. Dynamically load Java bindings
    • [Not considered] as it uses reflection
  2. Parse using Confluent-Kafka Schema Registry API (schema can be read from file or schema registry)
    • [Not considered] as
      1. it doesn't support options
      2. it creates heavy coupling with the Confluent Schema Registry