MeltanoLabs/Singer-Working-Group

Proposal: Layered Model of Singer

dmosorast opened this issue · 5 comments

When we talk about Singer, it's important to be sure that we're talking about the same things. One way to accomplish that is to organize around language. This proposal is to define a known set of "buckets" that we can use to help us as a group organize what is a Spec change, and what is a standard. What follows is my gut feeling of how these divisions could work, and is by no means completely correct or complete. 😄

Without further ado, I'd like to propose...

The Layered Model of Singer

Looking at Singer, there are a lot of design choices baked in around a core value of simplicity. The reasoning for this has always been to give developers the freedom and flexibility to make it what they want, since all data sources are vastly different, and one cannot effectively design for all future cases in the ELT space.

As we discuss evolving Singer as a whole and as a community, it will be important to take care not to lose the core value of simplicity, which has left room for best practices to emerge, like those encoded in the existing frameworks and libraries.

Approaching the stack as a layered model can give us a means of aligning where an idea fits, and a tool to iterate organically and "upgrade" concepts from a framework feature to a codified standard to a spec change if it makes sense.

Layer 1: Specification

This is the current specification as it stands. Some principles of features here:

  • Language agnostic and implementation independent
  • Focus on the std-out portions of using Singer (serialization format, message types, required keys for messages, etc.)
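To make the over-the-wire part of Layer 1 concrete, here is a minimal sketch in Python of a tap writing one message of each of the three core types (SCHEMA, RECORD, STATE) to stdout, one JSON object per line. The `users` stream, its fields, and the bookmark shape inside the STATE value are made up for illustration; only the message types and their top-level keys come from the spec.

```python
import json
import sys


def example_messages():
    """Return one message of each core Singer type for a hypothetical 'users' stream."""
    return [
        {
            "type": "SCHEMA",
            "stream": "users",
            "schema": {
                "type": "object",
                "properties": {
                    "id": {"type": "integer"},
                    "name": {"type": "string"},
                },
            },
            "key_properties": ["id"],
        },
        {"type": "RECORD", "stream": "users", "record": {"id": 1, "name": "Ada"}},
        # The contents of "value" are up to the tap; this bookmark shape is an assumption.
        {"type": "STATE", "value": {"users": {"last_id": 1}}},
    ]


def emit(messages, out=sys.stdout):
    """Write each message as a single line of JSON, as a tap would on stdout."""
    for message in messages:
        out.write(json.dumps(message) + "\n")


emit(example_messages())
```

Nothing here depends on a library or framework, which is the point of keeping this layer language agnostic: any program that can print JSON lines can act as a tap.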

Layer 2: Standards and Best Practices

These are being tracked in #10, but as far as the initial design decisions of Singer go, this conceptually includes things like Command-Line Arguments, Catalog, Metadata Keys/Custom Metadata, Standard State Keys, etc.

Some principles here:

  • These are also language agnostic and implementation independent
  • They help standardize the nitty-gritty to make writing frameworks and libraries easier
  • They strive to make code more portable, readable, and usable for users and devs alike.

Layer 3: Libraries and Frameworks

This is where we get into the language specific stuff. Libraries like singer-python and/or singer-clojure or frameworks like the MeltanoSDK take the standards plus best practices and encode them in a way that makes sense for the patterns of each language. This is also a good place to be a test bed for things that might become standards.

Principles:

  • Language specific
  • Generic use cases
  • These influence the way that code is written for their specific language

Layer 4: Tooling/Orchestration/UX/Infrastructure

I'm not quite sure about this one, but these are things that don't seem to fit in the other layers, and kind of make up an analog to the "Application Layer" of the OSI layered model. This layer is included to be a spot to hold things that are in use in a specific industry, use case, deployment method, etc., but not quite ready to be standardized.

[Aside:] This layer could use the most work, but it seemed worth including here. My gut says that it's likely harder to standardize these kinds of things, since it'll be where our orgs' respective product offerings fall into a lot of the time, and with that comes IP concerns, specifics for our target users (e.g., technical vs. non-technical), a specific slice of the industry, and/or a more narrow set of use cases. That said, tools like singer-discover would also fall here, and fit into a standardization conversation more easily.


That's what I have been kicking around for this idea so far, and I'm excited to get it out there for feedback. I'm very curious about thoughts on the specific categories, as well as whether this approach is a good idea. I'd like to eventually get this defined enough to make it into a SIP to officially propose a model. Thanks for checking it out! I appreciate all feedback 🚀

@dmosorast thanks for putting this together. This would be worth putting into a blog post or formal document somewhere!

I really like this framing and this articulates how we've thought about Singer overall. We've always taken the approach when working on the Meltano SDK that any tap and target written with it should always be able to fall back to the purest form of the spec and should work on the command line with the pipe operator sending data.
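As a rough illustration of that "purest form" fallback, the sketch below is a bare-bones target in plain Python: it reads Singer messages line by line, counts records per stream, and writes the last STATE value it saw back out. It is not how the Meltano SDK or singer-python implement targets; `run_target` and the sample messages are hypothetical names and data used only for this example.

```python
import io
import json


def run_target(lines, out):
    """Consume Singer messages from an iterable of lines.

    Returns a dict of record counts per stream and writes the last
    STATE value seen (if any) to `out` as one line of JSON.
    """
    counts = {}
    last_state = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)
        kind = message["type"]
        if kind == "RECORD":
            stream = message["stream"]
            counts[stream] = counts.get(stream, 0) + 1
        elif kind == "STATE":
            last_state = message["value"]
        # SCHEMA and any other message types are ignored in this sketch.
    if last_state is not None:
        out.write(json.dumps(last_state) + "\n")
    return counts


# In-memory demonstration with made-up messages.
sample = [
    '{"type": "RECORD", "stream": "users", "record": {"id": 1}}',
    '{"type": "RECORD", "stream": "users", "record": {"id": 2}}',
    '{"type": "STATE", "value": {"users": {"last_id": 2}}}',
]
state_out = io.StringIO()
record_counts = run_target(sample, state_out)
```

Passing `sys.stdin` and `sys.stdout` instead of the in-memory objects would let a script like this sit on the right-hand side of a shell pipe, with any spec-conforming tap on the left.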

We've also had a lot of discussions that anything that can flow "downward" to a lower layer absolutely should. So instead of keeping something specific to Meltano or an orchestration framework, if it's appropriate to put in the spec, then that's where it should go.

I'd love for this to exist as a doc on either singer.io or some other neutral website!

@dmosorast - I wonder what you think of something between "Spec" and "Standards and Best Practices" - where taps and targets can advertise certain capabilities and best practices they have implemented. The easy example is for a tap to advertise whether it can run in discovery mode (--discover) - but there are other examples, like the failsafe for data types (#20) and supporting/tolerating ACTIVATE_VERSION messages in #9. These are not strictly required by the spec, but we want to allow a type of "capability discovery" (#8) so that we can programmatically depend on those behaviors at runtime.

I feel like there's a subset of best practice behaviors we want to promote for the Singer community, and then also importantly we want to let taps and targets declare that they adhere to them so that orchestrators and their paired tap or target can rely on that behavior. These could be declared in repo metadata, in the repo's README.md, or in something like the proposed --about as discussed in #8. By codifying those best practices or optional behaviors, even if they are not strictly "required" to meet the spec, they then live somewhere more official than "best practices" and not entirely "required by spec". I think they become "optional extensions" or "optional capabilities".
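To sketch what relying on declared capabilities could look like from an orchestrator's side, the Python below checks a tap's self-description against the capabilities a pipeline needs. Everything about the format is an assumption made for illustration: no `--about` output shape has been agreed on here, and the `capabilities` key and the capability names `discover` and `activate-version` are placeholders.

```python
import json


def missing_capabilities(about_json, required):
    """Return the required capabilities that a plugin does not declare.

    `about_json` is a JSON string in a hypothetical shape:
    {"name": "...", "capabilities": ["...", ...]}
    """
    about = json.loads(about_json)
    declared = set(about.get("capabilities", []))
    return set(required) - declared


# Hypothetical self-description, as a tap might print it for --about.
tap_about = '{"name": "tap-example", "capabilities": ["discover"]}'

# Capabilities this particular pipeline wants to depend on at runtime.
missing = missing_capabilities(tap_about, {"discover", "activate-version"})
if missing:
    print("tap-example is missing:", ", ".join(sorted(missing)))
```

A plugin that declares nothing still parses fine and simply reports every requirement as missing, so taps that only implement the required spec would keep working with orchestrators that do not ask for anything optional.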

What do you think of us breaking "Spec" into two tiers - "Required" and "Optional" behaviors - and then discuss further about declaring/detecting the optional behaviors in #8? And perhaps also, we as a Working Group may occasionally make proposals to promote "best practices" (recommended but not part of spec) to "optional capabilities" (part of spec, but not strictly required).

Thoughts?

We'll continue to iterate here, but I've added a "which layers" prompt to the SIP template: 4b5e835

Which layer(s) of the Singer ecosystem does this proposal directly touch?

Select all that apply:

  • Singer Specification - required capabilities and behaviors
  • Singer Specification - optional capabilities and behaviors
  • Singer best practices and other guidance
  • Singer Working Group - practices and procedures
  • Singer documentation (Other)

I've left off "Libraries and Frameworks" (probably external to working group) as well as "Tooling/Orchestration/UX/Infrastructure". When it applies to community guidance and/or best practices overall, I think general documentation for both of these layers could be bucketed under "Singer documentation (Other)" or "Singer best practices and other guidance".

That makes sense. I have some reservations about the topic of advertising capabilities, since that implies that there will be enough of them that layers need to be built on top to abstract over it and wrangle the whole thing. As I stated above, simplicity has been a pretty big part of Singer in general, and some of these capabilities definitely feel more related to orchestration than to a standard.

That said, we have done a lot of work to make --discover a required standard on our end for tap submission to Stitch, so the concept of a split makes sense to me. I would put the split into Tier 2, however, since spec should be reserved for only the over-the-wire specification. Things like catalog, discoverable/non-discoverable metadata, and --discover would be part of the required standards.

That way it'd be like:
[ ] Singer Specification (Over-The-Wire protocol)
[ ] Singer Standards (Required)
[ ] Singer Best Practices (Optional)
[ ] Singer Working Group - practices and procedures
[ ] Singer documentation (Other)

I wonder about the "other" and/or whether it should just be documentation, but that's a different topic