Metrics are not reset after consumer restart
aikoven opened this issue · 4 comments
In my app each consumer gets a unique generated client_id. When I restart all consumers, Kafka assigns partitions to their new client_ids, but kafka_consumergroup_group_lag metrics for old client_ids remain in the output until I restart kafka-lag-exporter.
The graph below shows stacked values of kafka_consumergroup_group_lag for each client_id on the first start, after restart of consumers, and then after restart of kafka-lag-exporter:
The Prometheus query is:
sum(kafka_consumergroup_group_lag) by (client_id)
Thanks for trying out the project @aikoven. The client_id is generated by Kafka clients (by default) and is used to distinguish one Kafka client from another. You probably want to aggregate on the group label, which is the same group.id you specify in your Kafka consumer properties configuration.
In this case, I would get a single series for the whole group. But I'd like to monitor the lag for each consumer in the group separately.
The problem is that even if a consumer is no longer assigned a partition, its metrics still show up.
I see. Yes, this is also a problem for old consumer groups that no longer exist. When a metric is set for a particular set of labels, it will remain on the Prometheus endpoint until it's explicitly unset. The fix would be for the exporter to either reset all metrics each reporting interval, or determine how to selectively unset metrics that are no longer valid according to information retrieved from the consumer group coordinator.
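To make the "selectively unset" option concrete, here is a rough sketch in Python. It is not kafka-lag-exporter's code. LagGauge is a toy stand-in for a Prometheus gauge, report is a hypothetical helper, and the (group, client_id, partition) label layout is assumed purely for illustration. On each reporting interval, any label set missing from the latest poll is removed before the fresh values are written.

```python
from typing import Dict, Set, Tuple

# Hypothetical label layout for this sketch: (group, client_id, partition).
LabelSet = Tuple[str, str, str]


class LagGauge:
    """Toy stand-in for a Prometheus gauge: one value per label set."""

    def __init__(self) -> None:
        self._series: Dict[LabelSet, float] = {}

    def set(self, labels: LabelSet, value: float) -> None:
        self._series[labels] = value

    def remove(self, labels: LabelSet) -> None:
        # Removing an unknown label set is a no-op.
        self._series.pop(labels, None)

    def label_sets(self) -> Set[LabelSet]:
        return set(self._series)

    def collect(self) -> Dict[LabelSet, float]:
        # Snapshot of what a scrape would see.
        return dict(self._series)


def report(gauge: LagGauge, current_lag: Dict[LabelSet, float]) -> Set[LabelSet]:
    """Publish one interval's lag and drop series absent from the latest poll.

    Returns the label sets that were removed as stale.
    """
    stale = gauge.label_sets() - set(current_lag)
    for labels in stale:
        gauge.remove(labels)
    for labels, lag in current_lag.items():
        gauge.set(labels, lag)
    return stale


if __name__ == "__main__":
    gauge = LagGauge()
    # First start: client-a owns partition 0.
    report(gauge, {("orders", "client-a", "0"): 12.0})
    # After the consumer restart, partition 0 belongs to a new client_id.
    removed = report(gauge, {("orders", "client-b", "0"): 3.0})
    print(removed)
    print(gauge.collect())
```

The other option mentioned above, resetting all metrics each interval, would amount to clearing every series before writing the new values. That is simpler, but a scrape landing between the clear and the rewrite could briefly see no series at all. For comparison, the Python prometheus_client library exposes remove and clear on labelled metrics, which play the same role as the toy methods here.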