bf2fc6cc711aee1a0c2a/kas-fleet-manager

Misleading metric name kafka_topic:kafka_topic_partitions:count

k-wall opened this issue · 9 comments

"kafka_topic:kafka_topic_partitions:count": {

		"kafka_topic:kafka_topic_partitions:count": {
			Name:           "kafka_topic:kafka_topic_partitions:count",
			Help:           KafkaTopicPartitionsCountDesc,
			Type:           prometheus.GaugeValue,
			TypeName:       "GAUGE",
			VariableLabels: []string{},
		},

The metric name kafka_topic:kafka_topic_partitions:count is wrong. What is being exposed is a count of the number of topics in the instance, not the number of topic_partitions. Sure, underneath we are deriving this value from a expression using the metric kafka_topic_partitions, but that's implementation detail. It does not belong in the API.

I'd also argue that kafka_topic:kafka_topic_partitions:sum would be more naturally called kafka_topic:kafka_topic_partitions:count because it is the count of the number of partitions. Again the fact that it is derived from a sum expression it irrelevant to the API.

The metric descriptions are correct.

pb82 commented

@k-wall I'm trying to understand this metric better. The result when querying kafka_topic_partitions{topic!~"^__redhat_.*"} is e.g.

kafka_topic_partitions{topic="__consumer_offsets"} 50

(most labels omitted for brevity) Do i read this as the topic __consumer_offsets has 50 partitions?

In that case yes, kafka_topic:kafka_topic_partitions:count is actually the count of topics, not the partitions.

@k-wall did @pb82 description above answer to your question? If yes, can issue be closed?

@k-wall I'm trying to understand this metric better. The result when querying kafka_topic_partitions{topic!~"^__redhat_.*"} is e.g.

kafka_topic_partitions{topic="__consumer_offsets"} 50

(most labels omitted for brevity) Do i read this as the topic __consumer_offsets has 50 partitions?

It is saying that topic __consumer_offsets has 50 partitions.

In that case yes, kafka_topic:kafka_topic_partitions:count is actually the count of topics, not the partitions.

Yeah, that is what it is. It you renamed the recording rule to kafka_topic:kafka_topic:count it would be logical.

To my second point, kafka_topic:kafka_topic_partitions:sum is a actually count of topic_partitions. So following your naming convention it should be called kafka_topic:kafka_topic_partitions:count . I think the word sum just happens to refer to the method by which you are calculating the value. It doesn't belong in the name.

@akoserwal you were looking at metrics name epic, is that correct? if so, will what you were looking at deal with cases like the one presented in this issue? I guess there is an opportunity of centralizing the metrics naming effort under that epic.

@machi1990, yes, thanks. I'll follow up

I suppose this is covered by bf2fc6cc711aee1a0c2a/architecture#64.
Is this correct @JameelB @k-wall ? If so, let's close the issue once that PR is merged.

yes that's correct, but that's only the proposal, it won't be fully resolved yet. I think we should only close this once we actually have the new metric place. wdyt?

Okay, that sounds good as well. Thanks @JameelB