delta-io/connectors

[Flink connector][Bug] Delta Sink - Metric name collision while creating DeltaWriterBucket metrics

kristoffSC opened this issue · 1 comments

Problem description
The DeltaWriterBucket created by the DeltaWriter::getOrCreateBucketForBucketId method reports two metrics, DeltaSinkRecordsWritten and DeltaSinkBytesWritten.

In the common scenario where one DeltaWriter creates many DeltaWriterBucket instances (many partitions assigned to the same writer), registering those metrics causes a metric name collision. Due to this collision, every DeltaWriterBucket in the scope of a DeltaWriter will use the same metric object.

11:05:59,261 WARN o.a.f.r.m.g.AbstractMetricGroup [] - Name collision: Group already contains a Metric with the name 'DeltaSinkRecordsWritten'. Metric will not be reported.[, taskmanager, 82522c6c-ab64-4fce-8a57-186ccd869e5f, Flink Streaming Job, Sink: Writer, 0]
11:06:12,016 WARN o.a.f.r.m.g.AbstractMetricGroup [] - Name collision: Group already contains a Metric with the name 'DeltaSinkBytesWritten'. Metric will not be reported.[, taskmanager, 82522c6c-ab64-4fce-8a57-186ccd869e5f, Flink Streaming Job, Sink: Writer, 0]

What are the implications
There are no negative implications of this issue other than the warnings in the logs.
Even though DeltaWriterBucket instances created by the same DeltaWriter share the same 'SimpleCounter()' to track the metric value,
which is not a thread-safe object, this DOES NOT cause any issues, since every DeltaWriterBucket in the scope of a DeltaWriter is executed by the same thread.

What is the reason
Metrics are registered in the MetricGroup passed from DeltaWriter, which is a common object shared across all DeltaWriterBucket instances created from that DeltaWriter instance. Since the metric names are constant fields defined in the DeltaWriterBucket class, every subsequent DeltaWriterBucket created by a particular DeltaWriter instance will try to register a metric with the same name.
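To make the mechanism concrete, here is a minimal, self-contained sketch of how a constant metric name plus a shared group leads to the warning. It does not use the Flink API: SketchMetricGroup and SketchBucket are hypothetical stand-ins that only model "one flat namespace of names per group", and the warning text is shortened from the log above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MetricCollisionSketch {

    /** Hypothetical stand-in for a metric group: one flat namespace of metric names. */
    static class SketchMetricGroup {
        private final Map<String, long[]> metrics = new HashMap<>();
        final List<String> warnings = new ArrayList<>();

        /** Registers a counter; a second registration under the same name is rejected with a warning. */
        boolean register(String name, long[] counter) {
            if (metrics.containsKey(name)) {
                warnings.add("Name collision: Group already contains a Metric with the name '" + name + "'.");
                return false;
            }
            metrics.put(name, counter);
            return true;
        }
    }

    /** Hypothetical stand-in for a bucket: the metric name is a constant, as in the issue. */
    static class SketchBucket {
        static final String RECORDS_WRITTEN = "DeltaSinkRecordsWritten";
        final boolean registered;

        SketchBucket(SketchMetricGroup sharedGroup) {
            // Every bucket registers under the same constant name in the same shared group.
            this.registered = sharedGroup.register(RECORDS_WRITTEN, new long[1]);
        }
    }

    /** Creates bucketCount buckets against one shared group and returns the collected warnings. */
    static List<String> simulate(int bucketCount) {
        SketchMetricGroup group = new SketchMetricGroup();
        for (int i = 0; i < bucketCount; i++) {
            new SketchBucket(group);
        }
        return group.warnings;
    }

    public static void main(String[] args) {
        simulate(3).forEach(System.out::println);
    }
}
```

In this model only the first bucket registers successfully; every later bucket created against the same group triggers one warning per metric name.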

What we can do
If we want to keep one metric for all DeltaWriterBucket instances per DeltaWriter, then to avoid the name collision we should keep the metric instance in DeltaWriter and pass it to each DeltaWriterBucket.
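A rough sketch of this first option, again with hypothetical stand-in classes rather than the real connector or Flink types: the writer registers the counter once and hands the same instance to every bucket it creates, so the name is only ever registered one time. As in the issue, this assumes all buckets of a writer run on the same thread, so the plain counter needs no synchronization.

```java
import java.util.HashMap;
import java.util.Map;

class SharedCounterSketch {

    /** Hypothetical stand-in for a simple, non-thread-safe counter. */
    static class SketchCounter {
        private long count;
        void inc(long n) { count += n; }
        long getCount() { return count; }
    }

    /** Hypothetical stand-in for a metric group that counts rejected (duplicate) registrations. */
    static class SketchMetricGroup {
        private final Map<String, SketchCounter> metrics = new HashMap<>();
        int collisions;

        SketchCounter counter(String name) {
            SketchCounter counter = new SketchCounter();
            if (metrics.putIfAbsent(name, counter) != null) {
                collisions++; // name already taken: this counter is not registered
            }
            return counter;
        }
    }

    /** The bucket no longer registers anything; it only uses the counter it is given. */
    static class SketchBucket {
        private final SketchCounter recordsWritten;

        SketchBucket(SketchCounter recordsWritten) {
            this.recordsWritten = recordsWritten;
        }

        void write(String record) {
            recordsWritten.inc(1);
        }
    }

    /** The writer owns the metric: registered once, passed to every bucket. */
    static class SketchWriter {
        private final SketchCounter recordsWritten;
        private final Map<String, SketchBucket> buckets = new HashMap<>();

        SketchWriter(SketchMetricGroup group) {
            this.recordsWritten = group.counter("DeltaSinkRecordsWritten");
        }

        void write(String bucketId, String record) {
            buckets.computeIfAbsent(bucketId, id -> new SketchBucket(recordsWritten)).write(record);
        }

        long recordsWritten() {
            return recordsWritten.getCount();
        }
    }

    public static void main(String[] args) {
        SketchMetricGroup group = new SketchMetricGroup();
        SketchWriter writer = new SketchWriter(group);
        writer.write("date=2022-01-01", "a");
        writer.write("date=2022-01-02", "b");
        writer.write("date=2022-01-01", "c");
        System.out.println("records=" + writer.recordsWritten() + ", collisions=" + group.collisions);
    }
}
```

The same pattern would apply to the bytes-written counter; it is left out here to keep the sketch short.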

If we would like to have separate metrics per DeltaWriterBucket, then we need to pass some unique id that can be appended to the metric name in DeltaWriterBucket. If we would like to keep the postfix constant between restarts, we would need to add it to DeltaWriterBucketState.
Having separate metrics per DeltaWriterBucket might not be the best idea, since their number would equal the number of unique partition values, which can be very large.
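For comparison, the second option could look roughly like the sketch below. Everything here is an assumption made for illustration: SketchBucketState is a hypothetical slice of bucket state, the metricSuffix field does not exist in the real DeltaWriterBucketState, and a random UUID is just one possible choice of unique id.

```java
import java.util.UUID;

class PerBucketMetricNameSketch {

    /** Hypothetical slice of bucket state holding only the fields relevant to metric naming. */
    static final class SketchBucketState {
        final String bucketId;
        final String metricSuffix; // assumed new field, persisted with the state

        SketchBucketState(String bucketId, String metricSuffix) {
            this.bucketId = bucketId;
            this.metricSuffix = metricSuffix;
        }
    }

    /** State for a brand-new bucket: a fresh unique suffix is generated once. */
    static SketchBucketState newState(String bucketId) {
        return new SketchBucketState(bucketId, UUID.randomUUID().toString());
    }

    /** State restored after a restart: the stored suffix is reused, so the metric name is stable. */
    static SketchBucketState restore(SketchBucketState snapshot) {
        return new SketchBucketState(snapshot.bucketId, snapshot.metricSuffix);
    }

    /** Builds the per-bucket metric name from the constant base name and the bucket's suffix. */
    static String metricName(String baseName, SketchBucketState state) {
        return baseName + "_" + state.metricSuffix;
    }

    public static void main(String[] args) {
        SketchBucketState first = newState("date=2022-01-01");
        SketchBucketState second = newState("date=2022-01-02");
        System.out.println(metricName("DeltaSinkRecordsWritten", first));
        System.out.println(metricName("DeltaSinkRecordsWritten", second));
        System.out.println(metricName("DeltaSinkRecordsWritten", restore(first)));
    }
}
```

Note that this produces one metric name per bucket, which is exactly the cardinality concern raised above.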

This repo has been deprecated and the code is moved under connectors module in https://github.com/delta-io/delta repository. Please create the issue in repository https://github.com/delta-io/delta. See #556 for details.