streamnative/pulsar-spark

[FEATURE] Expose Pulsar-Client Metrics with Prometheus

nlu90 opened this issue · 11 comments

nlu90 commented


@nlu90, checking in on this; when we last spoke about this, you had indicated June as a potential month for this release. Do you still feel confident that is the case? Please let me know if there is anything we can do to help.

I spoke with Neng and the work on this will be delayed.

@mboyanna-sn @nlu90 can you please advise if this is still something that will be done given the change we talked about?

@martijngonlag for any feature requests like this, going forward please create an issue in the feature-request repository instead.

@martijngonlag Ah, I see this is an engineering-led feature, which is why it's here: for engineering to track. I think the right thing would have been for it to have a counterpart in feature-request (@sara-hannigan, could you help us get organized here to ensure there is a counterpart in the feature-request repo?).

@nlu90 given this is in the pulsar-spark repo, did you refer to this feature in the engineering OKRs as the Spark metrics tracking?

@martijngonlag You can answer yes to the customer: this is part of Q3. I just confirmed with Neng.

@mboyanna-sn Checking in to see if this is still scheduled for Q3.

Key metrics I am interested in seeing (some relate directly to Pulsar and some are more application-oriented):

  1. How big a backlog there is (delta between the current "pointer" and the "head" of the topic)
  2. Any systemic failures, e.g. errors connecting to Pulsar, or the pointer no longer referring to a valid position in the topic (e.g. because the retention policy was too short and the data had already been garbage collected when we recreated the subscription)
  3. Error counts / retries when processing
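
As a rough illustration of the three metrics above, here is a minimal sketch of how they could be rendered in the Prometheus text exposition format. This is not the connector's API: `ConsumerStats`, its fields, and the `pulsar_spark_*` metric names are all hypothetical, and positions are simplified to plain integers (real Pulsar positions are message IDs, not single numbers).

```python
from dataclasses import dataclass


@dataclass
class ConsumerStats:
    """Hypothetical snapshot of one subscription; every field name is illustrative."""
    topic: str
    head_position: int      # simplified index of the newest message in the topic
    current_position: int   # simplified index of the consumer's "pointer"
    systemic_failures: int  # e.g. connection errors or a pointer that is no longer valid
    processing_errors: int  # errors raised while processing messages
    retries: int            # processing retries


def backlog(stats: ConsumerStats) -> int:
    """Backlog as the delta between the topic head and the current pointer."""
    return max(stats.head_position - stats.current_position, 0)


def render_prometheus(stats: ConsumerStats) -> str:
    """Render the snapshot as Prometheus text exposition lines."""
    label = f'{{topic="{stats.topic}"}}'
    # (metric name, metric type, help text, value); names are assumptions.
    metrics = [
        ("pulsar_spark_backlog_messages", "gauge",
         "Messages between the current pointer and the topic head", backlog(stats)),
        ("pulsar_spark_systemic_failures_total", "counter",
         "Connection errors and invalid-pointer failures", stats.systemic_failures),
        ("pulsar_spark_processing_errors_total", "counter",
         "Errors raised while processing messages", stats.processing_errors),
        ("pulsar_spark_processing_retries_total", "counter",
         "Retries performed while processing messages", stats.retries),
    ]
    lines = []
    for name, kind, help_text, value in metrics:
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {kind}")
        lines.append(f"{name}{label} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    snapshot = ConsumerStats("persistent://public/default/events", 120, 100, 1, 3, 5)
    print(render_prometheus(snapshot), end="")
```

The backlog is modelled as a gauge because it can go up and down, while failures, errors, and retries are counters that only increase.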

@frankjkelly Thanks for your comments. I was looking for a way to expose these metrics.

FYI, we have ended our usage of Spark with Pulsar, so this is no longer a priority for us. Thanks!