Add extensibility to SBE XSD for custom tools

Question

alexeq opened this issue 6 years ago · 3 comments

I'd like to discuss some possible ways to add extensibility to SBE schema.

The main purpose of the SBE schema is to define the protocol "on-the-wire". It also contains a few additional features (namely the 'package' attribute, the 'semanticType' attribute, and custom types for identifiers) for tools that use the schema to generate code, documentation, etc. What should a developer do if the provided means are not enough?

I see some possible ways to implement that:

Create a new schema that extends the SBE schema, adding custom elements and attributes to the SBE elements (requires XSD proficiency and might be cumbersome; I haven't researched this approach).
Add an external configuration file that refers by name to the types, messages, and fields defined in the SBE XML file (e.g. "message.NewOrderSingle.javaInterface=com.my.sbe.MyBaseEncoderDecoder", "message.NewOrderSingle.docSection=Order Events"; I've seen some production projects use this approach to extend SBE).
Add wildcard elements to SBE XSD, allowing to extend it "in-place" (see below).
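To make the second option concrete, here is a minimal sketch of how a tool might load such an external file. The key layout `kind.name.property=value` is taken from the two examples above; the function names `parse_overlay` and `find_dangling` are hypothetical and not part of any SBE tool. The sketch also illustrates one likely drawback of this approach: because entries refer to schema items by name only, nothing ties them to the schema unless the tool checks them itself.

```python
from collections import defaultdict


def parse_overlay(text):
    """Parse 'kind.name.property=value' lines into {(kind, name): {property: value}}.

    Blank lines and lines starting with '#' are skipped. A key with fewer
    than three dot-separated parts raises ValueError.
    """
    overlay = defaultdict(dict)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        kind, name, prop = key.strip().split(".", 2)
        overlay[(kind, name)][prop] = value.strip()
    return dict(overlay)


def find_dangling(overlay, schema_items):
    """Return overlay targets that do not exist in the schema (e.g. after a rename)."""
    return sorted(set(overlay) - set(schema_items))


# Hypothetical overlay file; the keys follow the examples in the issue.
OVERLAY = """\
# extra information for code and doc generators
message.NewOrderSingle.javaInterface=com.my.sbe.MyBaseEncoderDecoder
message.NewOrderSingle.docSection=Order Events
"""

overlay = parse_overlay(OVERLAY)

# In a real tool this set would be collected from the SBE XML file.
schema_items = {("message", "NewOrderSingle")}
dangling = find_dangling(overlay, schema_items)
```

If the message were renamed in the SBE XML file, `find_dangling` would report the stale `("message", "NewOrderSingle")` entry; without such a check the extra settings would simply stop applying.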

I suggest that we allow some limited extensibility in the SBE XSD by adding wildcard components ('xs:anyAttribute', 'xs:any') to it. See https://www.ibm.com/developerworks/library/x-xtendschema/index.html, https://www.xml.com/pub/a/2004/10/27/extend.html, and the discussion at https://stackoverflow.com/questions/3347822/validating-xml-with-xsds-but-still-allow-extensibility for additional links and info.

Adding 'xs:anyAttribute' with namespace="##other" would allow developers to extend SBE with custom attributes, e.g.:

<messageSchema xmlns="http://fixprotocol.io/2017/sbe" xmlns:my="http://mysbe.impl.com" ...>
    <!-- skipped -->
    <message name="NewOrderSingle" id="99" blockLength="54" semanticType="D"
        my:javaInterface="com.my.sbe.MyBaseEncoderDecoder" my:docSection="Order Events">
        <!-- skipped -->
    </message>
</messageSchema>
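As a rough illustration of how a tool could consume such attributes, the sketch below separates the standard attributes of a message from those in the custom namespace, using only Python's standard `xml.etree.ElementTree`. The XML mirrors the example above, with the elided attributes and child elements left out; the helper name `split_attributes` is hypothetical. ElementTree does no XSD validation, so this shows only the reading side, not the proposed schema change itself.

```python
import xml.etree.ElementTree as ET

SBE_NS = "http://fixprotocol.io/2017/sbe"
MY_NS = "http://mysbe.impl.com"  # custom namespace from the example above

SCHEMA = f"""\
<messageSchema xmlns="{SBE_NS}" xmlns:my="{MY_NS}">
    <message name="NewOrderSingle" id="99" blockLength="54" semanticType="D"
        my:javaInterface="com.my.sbe.MyBaseEncoderDecoder" my:docSection="Order Events"/>
</messageSchema>
"""


def split_attributes(element, extension_ns):
    """Split an element's attributes into (core, custom) dictionaries.

    ElementTree stores a namespaced attribute under the key '{namespace}local',
    while unprefixed attributes keep their plain name.
    """
    prefix = "{" + extension_ns + "}"
    core, custom = {}, {}
    for key, value in element.attrib.items():
        if key.startswith(prefix):
            custom[key[len(prefix):]] = value
        else:
            core[key] = value
    return core, custom


root = ET.fromstring(SCHEMA)
# <message> inherits the default namespace declared on <messageSchema>.
message = root.find(f"{{{SBE_NS}}}message")
core, custom = split_attributes(message, MY_NS)
```

A tool that knows nothing about `MY_NS` can keep reading the plain `name`, `id`, `blockLength`, and `semanticType` keys and ignore the rest, which is the behavior the proposal relies on.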

More research is needed to explore what 'xs:any' might provide for extensibility (whether it is needed and whether it is backward-compatible with the real-logic SBE implementation).

To summarize, I think it would be great if the SBE XSD moved to what http://www.xfront.com/ExtensibleContentModels.html defines as an 'open content schema': one XML file that provides both the "on-the-wire" SBE format using the SBE XSD and "in-place" information for all other tools (code generators, documentation formatters, etc).

Answer 1 · 2020-05-14T15:05:06.000Z

To determine the best solution, we should consider the possible needs for extensibility. Some that come to mind:

Mapping SBE encodings to application layer semantics and behavior or workflow.
Translations from/to other encoding protocols or data structures.

Other thoughts?

Answer 2 · 2020-05-19T20:52:56.000Z

My main concern for extensibility (as I tried to explain above) was:

The possibility to add information for transformation tools (code/doc generators)

So it is probably covered by your list.

I am not sure how to formulate the need, but let's consider issue #120 about adding metadata to the SBE schema. If SBE allowed extra content, we could just add Dublin Core elements, similar to the example in https://www.dublincore.org/specifications/dublin-core/dc-xml-guidelines/2003-04-02/:

<messageSchema xmlns="http://fixprotocol.io/2017/sbe" 
    xmlns:dc="http://purl.org/dc/elements/1.1/" ...>
  <dc:title>Best SBE Schema</dc:title>
  <dc:description>Ultimate solution to all messaging protocols</dc:description>
  <dc:publisher>Me</dc:publisher>

  <!-- skipped -->
</messageSchema>
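A documentation generator could pick up such metadata with very little code. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the XML repeats the example above without the elided parts, and the helper name `dublin_core` is hypothetical. It assumes the Dublin Core elements are direct children of `messageSchema`, as in the example.

```python
import xml.etree.ElementTree as ET

SBE_NS = "http://fixprotocol.io/2017/sbe"
DC_NS = "http://purl.org/dc/elements/1.1/"

SCHEMA = f"""\
<messageSchema xmlns="{SBE_NS}" xmlns:dc="{DC_NS}">
  <dc:title>Best SBE Schema</dc:title>
  <dc:description>Ultimate solution to all messaging protocols</dc:description>
  <dc:publisher>Me</dc:publisher>
</messageSchema>
"""


def dublin_core(schema_root):
    """Collect direct child elements in the Dublin Core namespace as {local name: text}."""
    prefix = "{" + DC_NS + "}"
    return {
        child.tag[len(prefix):]: (child.text or "").strip()
        for child in schema_root
        if child.tag.startswith(prefix)
    }


root = ET.fromstring(SCHEMA)
metadata = dublin_core(root)

# Elements in the SBE namespace are looked up separately and are unaffected
# by the presence of the foreign-namespace children.
messages = root.findall(f"{{{SBE_NS}}}message")
```

Here `metadata` holds the three Dublin Core values, while `messages` is empty only because the sample omits the message definitions.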

Answer 3 · 2021-04-22T07:26:15.000Z

+1 for extensibility along the lines suggested by @alexeq

I think all three points you brought up are actually the same. The possibility to add payloads/custom fields that build tools process during the build stage for doc or codegen could also be used to define application-level semantics or to help define translations to/from other protocols and data structures.

At the end of the day, it's basically all the same; few assumptions should probably be made about how such data is used anyway...