apache/accumulo

Add Additional Compactor metrics

Closed this issue · 15 comments

Is your feature request related to a problem? Please describe.
With the addition of external compactors, there isn't a great way to identify whether a compactor is efficient or slow.

Describe the solution you'd like
Add metrics for bytes read and bytes written on a per-compactor basis.
This might initially need to be added at the per-iterator level and then aggregated up to be reported at the compactor level.

These metrics are expected to be emitted while data is being read and written, not only when the compaction completes.

Tracking K/V entries read and written will be easier than tracking bytes, I think. The FileCompactor already keeps track of entries read and written; they just aren't emitted as metrics. Tracking bytes read and written at this level will likely be inaccurate because of run-length encoding and compression.
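
As a rough illustration of counting entries while they stream through (so the totals can be reported mid-compaction rather than only at the end), here is a minimal Java sketch. The CountingIterator wrapper, the String keys/values, and the "drop empty values" filter are hypothetical stand-ins; this is not Accumulo's SortedKeyValueIterator API or the actual FileCompactor code.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

public class EntryCountingSketch {

  // Hypothetical wrapper (not Accumulo's iterator API): counts each entry as it
  // is consumed, so a reporter can sample the running total mid-compaction.
  static final class CountingIterator<K, V> implements Iterator<Map.Entry<K, V>> {
    private final Iterator<Map.Entry<K, V>> source;
    private final LongAdder counter;

    CountingIterator(Iterator<Map.Entry<K, V>> source, LongAdder counter) {
      this.source = source;
      this.counter = counter;
    }

    @Override
    public boolean hasNext() {
      return source.hasNext();
    }

    @Override
    public Map.Entry<K, V> next() {
      Map.Entry<K, V> entry = source.next();
      counter.increment(); // counted when read, not when the compaction finishes
      return entry;
    }
  }

  // Returns {entriesRead, entriesWritten}. Dropping empty values stands in for an
  // iterator that filters data, which is why read and written can differ.
  static long[] compact(List<Map.Entry<String, String>> input) {
    LongAdder entriesRead = new LongAdder();
    LongAdder entriesWritten = new LongAdder();
    Iterator<Map.Entry<String, String>> it =
        new CountingIterator<>(input.iterator(), entriesRead);
    while (it.hasNext()) {
      if (!it.next().getValue().isEmpty()) {
        entriesWritten.increment();
      }
    }
    return new long[] {entriesRead.sum(), entriesWritten.sum()};
  }

  public static void main(String[] args) {
    long[] counts = compact(List.of(
        Map.entry("row1", "a"), Map.entry("row2", ""), Map.entry("row3", "c")));
    System.out.println("read=" + counts[0] + " written=" + counts[1]); // read=3 written=2
  }
}
```

Counting entries this way sidesteps the encoding and compression issue entirely, at the cost of not saying anything about data volume.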

METRICS_COMPACTOR_MAJC_STUCK is incremented in CompactionWatcher.run, which appears to be used in both internal and external compactions.

I can take a look at this unless someone else was planning to.

I think we should solidify whether we are trying to emit bytes or k/v entries before starting on anything. The answer to that question is going to directly impact the amount of work that needs to be done.

These metrics should help answer whether a compaction strategy change and/or an iterator change is positive or negative.
If we can get that information from K/V counts, then I think that's fine. It's at least a step in the right direction.

Do we track the total entries in the tablet, or in the files being compacted? We use the bytes read/written in the listcompactions output to get a feel for the percent complete: I can look at the file list, add up the file sizes, and compare that to the size read.

I know the monitor shows a percentage for external compactions, but it would be good to know not just how many entries a compaction has written, but also 1) that progress is being made and 2) at what speed.
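
One way to get at both 1) and 2) from entry counts alone is to compare two samples of the cumulative entries-read value taken some time apart. A minimal sketch, assuming a hypothetical Snapshot record holding a timestamp and a cumulative count (not an Accumulo type):

```java
public class ProgressRateSketch {

  // Hypothetical sample of one running compaction: cumulative entries read at a time.
  record Snapshot(long timeMillis, long entriesRead) {}

  // 1) Progress is being made if the cumulative count moved between samples.
  static boolean madeProgress(Snapshot earlier, Snapshot later) {
    return later.entriesRead() > earlier.entriesRead();
  }

  // 2) Speed: entries read per second between samples; 0 if no time has elapsed.
  static double entriesPerSecond(Snapshot earlier, Snapshot later) {
    long elapsedMillis = later.timeMillis() - earlier.timeMillis();
    if (elapsedMillis <= 0) {
      return 0.0;
    }
    return (later.entriesRead() - earlier.entriesRead()) * 1000.0 / elapsedMillis;
  }

  public static void main(String[] args) {
    Snapshot t0 = new Snapshot(0L, 1_000L);
    Snapshot t1 = new Snapshot(30_000L, 61_000L);
    System.out.println(madeProgress(t0, t1));     // true
    System.out.println(entriesPerSecond(t0, t1)); // 2000.0
  }
}
```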

How does listcompactions get the data in bytes?

From what I'm seeing in listcompactions, the READ and WROTE columns are in K/V entries.

https://github.com/apache/accumulo/blob/2.1/shell/src/main/java/org/apache/accumulo/shell/commands/ActiveCompactionHelper.java#L96-L100

I think I see the point of confusion. The shortenCount method uses suffixes (K, M, B) that look very similar to byte-size units (KB, MB, ...), but here they abbreviate thousands, millions, and billions of entries.
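
A formatter along these lines (a hypothetical reimplementation for illustration, assuming decimal thresholds; not copied from ActiveCompactionHelper) shows how an entry count ends up looking like a byte size:

```java
public class ShortenCountSketch {

  // Hypothetical formatter: abbreviates an entry count with K (thousand),
  // M (million) or B (billion). These are counts, not KB/MB/GB of data.
  static String shortenCount(long count) {
    if (count < 1_000L) {
      return Long.toString(count);
    } else if (count < 1_000_000L) {
      return count / 1_000L + "K";
    } else if (count < 1_000_000_000L) {
      return count / 1_000_000L + "M";
    } else {
      return count / 1_000_000_000L + "B";
    }
  }

  public static void main(String[] args) {
    System.out.println(shortenCount(950));            // 950
    System.out.println(shortenCount(12_345));         // 12K
    System.out.println(shortenCount(3_200_000));      // 3M
    System.out.println(shortenCount(7_000_000_000L)); // 7B
  }
}
```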

@DomGarguilo - I think CompactionWatcher.run is the right place to increment the metrics for K/V read and written. It is called, directly or indirectly, from both the TabletServer and the Compactor. I haven't looked at how best to wire them up; I'll leave that to you.
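
Since a periodic check like CompactionWatcher.run sees cumulative totals (e.g. getEntriesRead()), one option is to increment a counter by the delta since the previous check, so repeated checks of the same compaction don't double count. A minimal sketch; the ProgressTracker class, the string compaction ids, and the LongAdder standing in for a metrics counter are all assumptions, and it assumes observe() is only called from a single watcher thread:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

public class DeltaCounterSketch {

  // Hypothetical tracker: remembers the last cumulative value seen per compaction
  // and adds only the increase to a shared counter. The HashMap is not
  // thread-safe, so observe() and completed() assume a single watcher thread.
  static final class ProgressTracker {
    private final Map<String, Long> lastSeen = new HashMap<>();
    private final LongAdder entriesReadCounter = new LongAdder();

    void observe(String compactionId, long cumulativeEntriesRead) {
      long previous = lastSeen.getOrDefault(compactionId, 0L);
      long delta = cumulativeEntriesRead - previous;
      if (delta > 0) {
        entriesReadCounter.add(delta);
      }
      lastSeen.put(compactionId, cumulativeEntriesRead);
    }

    // Forget a finished compaction so its last value does not linger.
    void completed(String compactionId) {
      lastSeen.remove(compactionId);
    }

    long total() {
      return entriesReadCounter.sum();
    }
  }

  public static void main(String[] args) {
    ProgressTracker tracker = new ProgressTracker();
    tracker.observe("ecid-1", 100); // first check: +100
    tracker.observe("ecid-1", 250); // second check: +150
    tracker.observe("ecid-2", 40);  // another compaction: +40
    tracker.observe("ecid-1", 250); // no change: +0
    System.out.println(tracker.total()); // 290
  }
}
```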

If we have 10 compactors in total and only 7 of them are currently working on something, it would be nice if aggregations of the metrics only considered the 7 that are actually doing something. I'm not sure how best to do this, though. Some Micrometer Meters seem not to emit any data when their value is zero, but I don't have a good understanding of this. That may be helpful for this case.
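
To make the 10-vs-7 example concrete, here is a small sketch of averaging throughput over only the compactors that reported work in an interval. The CompactorRate record is hypothetical, and the aggregation is done in plain Java for illustration; it doesn't assume anything about which Micrometer meters skip zero values or how a given backend aggregates.

```java
import java.util.ArrayList;
import java.util.List;

public class ActiveCompactorAverageSketch {

  // Hypothetical per-compactor sample: entries read per second over the last interval.
  record CompactorRate(String compactor, double entriesPerSecond) {}

  // Average across all compactors, idle ones included.
  static double averageOfAll(List<CompactorRate> rates) {
    return rates.stream().mapToDouble(CompactorRate::entriesPerSecond).average().orElse(0.0);
  }

  // Average across compactors that did work; idle ones (rate 0) are skipped
  // so they do not drag the average down.
  static double averageOfActive(List<CompactorRate> rates) {
    return rates.stream()
        .mapToDouble(CompactorRate::entriesPerSecond)
        .filter(r -> r > 0.0)
        .average()
        .orElse(0.0);
  }

  public static void main(String[] args) {
    // 10 compactors: 7 busy at 700 entries/s each, 3 idle.
    List<CompactorRate> rates = new ArrayList<>();
    for (int i = 0; i < 10; i++) {
      rates.add(new CompactorRate("compactor-" + i, i < 7 ? 700.0 : 0.0));
    }
    System.out.println(averageOfAll(rates));    // 490.0
    System.out.println(averageOfActive(rates)); // 700.0
  }
}
```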

I think it depends on whether Micrometer is reporting absolute values or rates. The value that gets output depends on the system the metric is being sent to.

I have started on some changes for this and wanted to make sure I'm on the right track.

I have added metrics that track the same read and write values that are logged in CompactionWatcher.run:

String message = String.format(
    "Compaction in progress, read %d of %d input entries ( %s %s ), written %d entries, paused %d times",
    info.getEntriesRead(), inputEntries, percentComplete, "%",
    info.getEntriesWritten(), info.getTimesPaused());

These values have always been zero, though, both in the logs and in the metrics that use them.
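
For experimenting with that log message outside of Accumulo, here is a self-contained version of the quoted String.format call. The ProgressInfo record stands in for the real info object, and the two-decimal percent formatting is an assumption; only the format string itself is copied from the snippet.

```java
import java.util.Locale;

public class ProgressMessageSketch {

  // Hypothetical stand-in for the progress info object used in the quoted snippet.
  record ProgressInfo(long entriesRead, long entriesWritten, int timesPaused) {}

  // Percent of input entries read so far, as a string; "0.00" when the input size is unknown.
  static String percentComplete(long entriesRead, long inputEntries) {
    double pct = inputEntries > 0 ? 100.0 * entriesRead / inputEntries : 0.0;
    return String.format(Locale.ROOT, "%.2f", pct);
  }

  static String message(ProgressInfo info, long inputEntries) {
    return String.format(Locale.ROOT,
        "Compaction in progress, read %d of %d input entries ( %s %s ), written %d entries, paused %d times",
        info.entriesRead(), inputEntries, percentComplete(info.entriesRead(), inputEntries), "%",
        info.entriesWritten(), info.timesPaused());
  }

  public static void main(String[] args) {
    // Compaction in progress, read 250 of 1000 input entries ( 25.00 % ), written 240 entries, paused 0 times
    System.out.println(message(new ProgressInfo(250, 240, 0), 1000));
  }
}
```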

How are you testing this? Are you running a compaction on a table via uno or something? Is the compaction lasting several minutes?

I made the following change and saw non-zero output.

diff --git a/test/src/main/java/org/apache/accumulo/test/compaction/ExternalCompactionProgressIT.java b/test/src/main/java/org/apache/accumulo/test/compaction/ExternalCompactionProgressIT.java
index abdf66b727..59f6e784bd 100644
--- a/test/src/main/java/org/apache/accumulo/test/compaction/ExternalCompactionProgressIT.java
+++ b/test/src/main/java/org/apache/accumulo/test/compaction/ExternalCompactionProgressIT.java
@@ -129,6 +129,7 @@ public class ExternalCompactionProgressIT extends AccumuloClusterHarness {
     var ecMap = ecList.getCompactions();
     if (ecMap != null) {
       ecMap.forEach((ecid, ec) -> {
+        ec.getUpdates().forEach((timestamp, update) -> System.out.println(update.toString()));
         // returns null if it's a new mapping
         RunningCompactionInfo rci = new RunningCompactionInfo(ec);
         RunningCompactionInfo previousRci = runningMap.put(ecid, rci);

Yes, I'm testing with a compaction that runs for several minutes via uno.

I also see the -1 values printed when I add that line to the test.

Addressed in #4572