serilog/serilog-sinks-opentelemetry

Discussion: features required for initial production release

loomis opened this issue ยท 14 comments

The company I work for has been using the development version of the OpenTelemetrySink in selected micro-services to test functionality and performance of the sink in production settings. To date, no blocking issues have been found in recent versions of the sink. Consequently, the company would like to make a larger, general rollout of the sink to most of our micro-services. To do this, a production release is needed with a stable API.

To judge the remaining work and have an idea of the possible release date, I'd like to discuss what features are required for the initial, MVP release, including any modifications needed to support the Serilog release process.

Input from everyone is welcome, but I'd like to hear specifically from @nblumhardt what gates/controls need to be passed before a production release can be made.

I'm proposing to include #58. The company's general rollout will require modifying an internal Serilog-wrapping library, which would be simplified with this change.

Hi @loomis, thanks for the ping! I think the sink's coming together really well so I can't see anything really blocking a 1.0. Just a process to work through, at this stage.

Here are the things I think we should consider:

  • Completeness - there are still a few things to flush out around capturing of different data types, I think this PR covers some of them, but there's more to go. Opened #60 for this one, keen to know if there are others. I don't think we should stretch this goal to more complex feature additions (such as environment variable support) as the sink's usable without them, and they seem orthogonal/fair game for a 2.0.
  • Correctness - I'm not sure how to gauge this except with some wider use; perhaps the best thing to do here is mark a version as "RC" and put the word out that people should try it and report bugs/omissions?
  • API stability & ease of non-breaking updates/improvements - it's really hard to get the complete set of configuration options nailed down in the first cut; adding new options when they're binary breaking changes can lead to a lot of cruft. Added #61. Interested in more thoughts on this too.
  • Data model stability - the OTLP Logs spec is very young; I assume we've covered the most important semantic attributes with things like exception.* but worth a quick check over - anything missing here or subject to change?
  • Compatibility with multiple back-ends - one commenter mentioned that they were using this sink with Splunk; I've been testing it against an internal pre-release of Seq; which back-end are you targeting, Cal? Would be good to have some confirmation that the sink works reasonably well with the most likely/popular back-ends. An "RC" would help cover this, too, I think.
  • Maintainability - Serilog sinks can end up in massively wide use, which poses some challenges
      1. for some technologies the proportion of users vs maintainers is massive and it can be hard to keep up. Are we good with supporting a 1.0 for this (yet?) - probably no good way to answer this but with some measure of engagement/enthusiasm, which an RC again might help to gauge.
      1. scope and breaking changes - the most successful sinks over the long haul have easily-maintained/minimal surface area, and narrow scope. I think we should set out to review the public API surface and set up tests around this. Raised #62.

Just my quick Saturday morning brain dump, thanks for kicking off the discussion @loomis!

Added #64 for the data model stability. Both the binary and JSON protobuf definitions are marked as stable in v0.19.0.

I'll review the current semantic conventions for attributes. I think that we're following the primary ones. However, there's going to be a merger/evolution of the semantic conventions with the Elastic Common Schema. I'll look to see if there are changes needed here. Given that these don't affect binary compatibility and that we know there will be a long transition here, I wouldn't want to hold up a 1.0 release for clarity here. Created #65 for this.

The company I work for sends its log telemetry to a custom OpenTelemetry Collector via the gRPC and HTTP protobuf protocols. These data eventually flow to New Relic, via an OTLP/gRPC endpoint. Our use has not uncovered any performance or compatibility issues. (Compatibility issues are unlikely given that protobuf binding on both the client and server are generated from the same protobuf definition.)

Definitely interesting to watch the ECS merger; I suspect that whatever we choose here it'll be "wrong" on some level within a couple of years; that seems well within the kind of evolution that a 2.0 (or 3.0...) release can accommodate at the right time.

Hoping to keep the ball rolling - perhaps we aim to hit our RC by mid-May (and let whoever might be interested know about it), and allocate one month from there to bake in feedback for a 1.0 in mid-June?

@nblumhardt I'm happy with 1.0 RC in mid-May and 1.0 Final in mid-June. So far, there are only a few things to change, so I think that it achievable.

There's an open question on what to do with the exception attributes. I'll tag you on that in the schema ticket.

There is a blocking problem with the command line example not working (#69), but all other changes that have been discussed have been made.

Are there any other features/API changes that are required before a 1.0-RC release?

I left a comment on #68 that it'd be great to get your thoughts on.

Also raised #70 .

I think with those two considered and the collector example sorted, we should call it RC ๐Ÿ‘

I'll send a PR with the two items mentioned above, now :)

Changes are OK for me. Merged #70. I'll look at #69 to try to understand what's going on; at the moment, I'm rather stumped.

@nblumhardt Is the pre-release version of Seq with OpenTelemetry support available? If so, can it be run within a container? That would be a good alternate test/example for validation.

I can also validate directly with New Relic, since it supports a native OTLP endpoint.

@loomis the Seq pre-release is still "in the workshop" so not available on Docker Hub just yet, though we might get as far as a preview ahead of the 1.0 for this package (fingers crossed). I'm keeping a close eye on it so should be able to spot any issues pretty quickly.

I think we've hit the RC goal - perhaps we now close this, open a draft 1.0 release PR with tentative release date stamped into it. I'll get onto it :-) ๐ŸŽ‰