trivago/gollum

Remove routing from formatters

arnecls opened this issue · 4 comments

Currently formatters are able to modify the stream of a message.
With the changes imposed by #80 and #82 we could remove that functionality again.

Thoughts so far:

  • If a formatter currently does "routing and stripping" we could put the stripped part into the key and then route based on the key in a router
  • Possible problem: Preserving the key?
  • It makes sense that routers do routing, much less so that formatters do it
  • In that regard the different components take on the following purposes
    • consumer: converting messages into a common exchange format
    • router: change origin based on payload / metrics
    • producer: convert message into a specific service format

Let's consider a few pipelines we use at trivago.

  1. PHP writes logs to a socket consumer, adding a stream prefix that denotes the log name.
    So a message might be "fancyErrorLog:My fancy error message".
    • With 0.4.x a formatter extracts the first part, sets the stream and removes it from the message.
    • With 0.5.x a formatter could extract the first part and set it as metadata. A router could then use the metadata for routing. This is IMO a lot cleaner.
  2. A kafka producer listens to "*" and applies a rate limiter. After rate limiting the stream name is stored as a kafka key and all messages are routed to the same stream. That stream is mapped to a single topic.
    • With 0.5.x the stream name would already be a metadata key which can directly be used as a kafka key. The merging to a single stream is not necessary as you can map "*" to a single topic. If multiple maps are required, multiple kafka producers can be used.
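
To make the 0.5.x variant of the first pipeline concrete, here is a minimal sketch of the split between formatter and router. Everything in it is an assumption for illustration: the `Message` struct, `ExtractPrefix`, `RouteByMetadata` and the `logName` key are hypothetical and not gollum's actual types or API.

```go
package main

import (
	"fmt"
	"strings"
)

// Message is a hypothetical stand-in for a pipeline message.
// It is NOT gollum's real message type.
type Message struct {
	Stream   string
	Payload  string
	Metadata map[string]string
}

// ExtractPrefix plays the formatter role: it strips "<prefix>:" from the
// payload and stores the prefix under the given metadata key.
// It never modifies msg.Stream.
func ExtractPrefix(msg *Message, key string) {
	prefix, rest, found := strings.Cut(msg.Payload, ":")
	if !found {
		return // no prefix present, leave the message untouched
	}
	if msg.Metadata == nil {
		msg.Metadata = make(map[string]string)
	}
	msg.Metadata[key] = prefix
	msg.Payload = rest
}

// RouteByMetadata plays the router role: it is the only place where the
// stream changes, based on a metadata value set earlier in the pipeline.
func RouteByMetadata(msg *Message, key string) {
	if target, ok := msg.Metadata[key]; ok && target != "" {
		msg.Stream = target
	}
}

func main() {
	msg := &Message{Stream: "php", Payload: "fancyErrorLog:My fancy error message"}
	ExtractPrefix(msg, "logName")   // formatter step
	RouteByMetadata(msg, "logName") // router step
	fmt.Printf("stream=%q payload=%q\n", msg.Stream, msg.Payload)
}
```

The point of the split is that the formatter only ever touches payload and metadata, so the stream can only change in one place.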

When removing the stream-changing capabilities from the formatter we lose the possibility to change the stream on the producer level. In terms of consistency this is actually a good thing. In terms of flexibility it forces us to use metadata, which is fine, too, but might introduce more changes in the future.

If we have a "fan out" scenario as in example 2, moving all routing capabilities to the routers will introduce more memory overhead as we need a router object per stream. The computational overhead should be roughly the same (assuming that cache access is not an issue).
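
The memory argument can be illustrated with a small sketch, assuming routers are created lazily, one per stream name. The `Router` and `Registry` types below are hypothetical and only model the bookkeeping, not gollum's actual router implementation.

```go
package main

import "fmt"

// Router is a hypothetical per-stream router object.
type Router struct {
	Stream string
	Routed int // number of messages seen by this router
}

// Registry lazily creates one Router per distinct stream name, so the
// number of live router objects grows with the number of streams.
type Registry struct {
	routers map[string]*Router
}

// NewRegistry returns an empty registry.
func NewRegistry() *Registry {
	return &Registry{routers: make(map[string]*Router)}
}

// Route returns the router for the given stream, creating it on first use,
// and counts the message against that router.
func (r *Registry) Route(stream string) *Router {
	router, ok := r.routers[stream]
	if !ok {
		router = &Router{Stream: stream}
		r.routers[stream] = router
	}
	router.Routed++
	return router
}

// Len reports how many router objects currently exist.
func (r *Registry) Len() int {
	return len(r.routers)
}

func main() {
	reg := NewRegistry()
	for _, s := range []string{"accessLog", "errorLog", "accessLog", "debugLog"} {
		reg.Route(s)
	}
	fmt.Println("router objects:", reg.Len())
}
```

Each message still costs a single map lookup, but every new stream name adds one more long-lived object, which is the overhead that would need profiling.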

-> we need to profile the changes. From an architectural perspective this change makes a lot of sense and IMO is worth it.

#121 and this task are probably coupled as the ability to change the stream could be a separate interface.

Won't do. Same reasoning as #121.
While this is a good idea from an architectural perspective, the performance implications and structural complexity increase are just too big.