opendistro-for-elasticsearch/data-prepper

Question: Why is data-prepper implemented as a separate component within ES?


Any reason why this is not implemented as an exporter in https://github.com/open-telemetry/opentelemetry-collector-contrib?

/cc @alolita

Hey @BuddySpike, great question. Data Prepper was originally created for a few use cases that the OTel collector did not cover at the time:

  1. Stateful processing - DP batches groups of spans by trace ID, in memory or on disk, so that the data can be transformed/enriched before it reaches ES. Most of the current transforms denormalize the data for use in ES queries.
  2. Horizontal scaling - because some of the transforms require all spans for a given trace to be processed by the same host, DP can route spans to other DP hosts in a cluster. A fleet of OTel collectors behind a load balancer, by contrast, would scatter a trace's spans across hosts.
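To illustrate how the two points fit together, here is a minimal Python sketch. It is not Data Prepper's actual code (DP is a Java project). It assumes one common technique, a consistent-hash ring keyed on trace ID, to decide which host owns a trace. Every host computes the same owner for a given trace ID, so all of that trace's spans converge on one host and can be grouped there before any transform runs. The names `HashRing`, `TraceGroupBuffer`, `route`, and the host names are hypothetical.

```python
import bisect
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    trace_id: str
    span_id: str
    name: str


def _hash(key: str) -> int:
    # Stable 64-bit hash. Python's built-in hash() is salted per process,
    # so different hosts would disagree on it.
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring that maps a trace ID to one peer host."""

    def __init__(self, hosts, vnodes=100):
        # Each host gets several virtual nodes to spread load more evenly.
        self._ring = sorted(
            (_hash(f"{host}#{i}"), host) for host in hosts for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    def owner(self, trace_id: str) -> str:
        # First virtual node clockwise from the trace's hash, wrapping around.
        idx = bisect.bisect(self._keys, _hash(trace_id)) % len(self._ring)
        return self._ring[idx][1]


class TraceGroupBuffer:
    """Hypothetical in-memory buffer that collects all spans of a trace."""

    def __init__(self):
        self._traces = defaultdict(list)

    def add(self, span: Span) -> None:
        self._traces[span.trace_id].append(span)

    def flush(self, trace_id: str) -> list:
        # Returns the grouped spans for a trace (ready for a transform step)
        # and drops them from the buffer.
        return self._traces.pop(trace_id, [])


def route(spans, ring, local_host, buffer):
    """Buffer spans owned by this host; return the rest grouped by owning peer."""
    forward = defaultdict(list)
    for span in spans:
        owner = ring.owner(span.trace_id)
        if owner == local_host:
            buffer.add(span)
        else:
            forward[owner].append(span)
    return forward
```

With this design, a plain round-robin load balancer in front of the cluster is fine: whichever host first receives a span forwards it to the trace's owner, which is the behavior point 2 says a fleet of independent collectors lacks. Virtual nodes also mean that adding or removing a host only remaps a fraction of trace IDs.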

I do see that there is now a GroupByTraceProcessor and a LoadBalancingExporter in the OTel contrib package, so maybe using the collector is something we can re-evaluate at some point. Unfortunately, these add-ons weren't available when the DP project was started. Hopefully this background info helps!

Thanks @wrijeff 🙏🏾. It makes sense now.