akoutmos/prom_ex

[BUG] Polling of metrics in custom plugins stops if an error is raised inside the mfa function for the metric

Closed this issue · 3 comments

Describe the bug
Polling of metrics in custom plugins stops if an error is raised inside the mfa function for the metric.

To Reproduce
Steps to reproduce the behavior:

  1. Clone this example repository: https://github.com/fedme/prom_ex_issue
    The sample application defines a custom PromEx plugin here: https://github.com/fedme/prom_ex_issue/blob/main/lib/prom_ex_issue/custom_prom_ex_plugin.ex

  2. Start the sample application with mix phx.server and look at the logs in the terminal

  3. Observe the logs showing that the mfa function for the metric is called at every polling interval. You should see the following output:

######################################################################
MFA execute_ping_metrics called for the 1 time.
######################################################################

######################################################################
MFA execute_ping_metrics called for the 2 time.
######################################################################

[...]

  4. The plugin is written so that the mfa function raises an error the 6th time it is polled. You should see something like the following output in the console:

[...]

######################################################################
MFA execute_ping_metrics called for the 4 time.
######################################################################

######################################################################
MFA execute_ping_metrics called for the 5 time.
######################################################################

[error] Error when calling MFA defined by measurement: PromExIssue.CustomPromExPlugin :execute_ping_metrics [#PID<0.676.0>]
Class=:error
Reason=%RuntimeError{
  message: "Something is not working correctly, I can't return the metrics right now!"
}
Stacktrace=[
  {PromExIssue.CustomPromExPlugin, :execute_ping_metrics, 1,
   [
     file: ~c"lib/prom_ex_issue/custom_prom_ex_plugin.ex",
     line: 48,
     error_info: %{module: Exception}
   ]},
  {:telemetry_poller, :make_measurement, 1,
   [
     file: ~c"/Users/fedme/code/prom_ex_issue/deps/telemetry_poller/src/telemetry_poller.erl",
     line: 336
   ]},
  {:telemetry_poller, :"-make_measurements_and_filter_misbehaving/1-lc$^0/1-0-",
   1,
   [
     file: ~c"/Users/fedme/code/prom_ex_issue/deps/telemetry_poller/src/telemetry_poller.erl",
     line: 332
   ]},
  {:telemetry_poller, :handle_info, 2,
   [
     file: ~c"/Users/fedme/code/prom_ex_issue/deps/telemetry_poller/src/telemetry_poller.erl",
     line: 354
   ]},
  {:gen_server, :try_handle_info, 3, [file: ~c"gen_server.erl", line: 1095]},
  {:gen_server, :handle_msg, 6, [file: ~c"gen_server.erl", line: 1183]},
  {:proc_lib, :init_p_do_apply, 3, [file: ~c"proc_lib.erl", line: 241]}
]
  5. Notice that the metric is no longer polled after the exception: no further log lines appear in the console to show that the function is being called.
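
The stack trace above passes through a telemetry_poller function named make_measurements_and_filter_misbehaving, which suggests that a measurement that fails is dropped from the list polled on later ticks. The following is only a rough sketch of that filtering idea in plain Elixir, not the library's actual code; the module name PollerSketch and the function poll_once/1 are hypothetical.

```elixir
defmodule PollerSketch do
  # Hypothetical illustration: run each {module, function, args} measurement
  # once and keep only the ones that completed without raising, throwing,
  # or exiting. A measurement that fails is not returned, so a caller that
  # feeds the result into the next tick would never invoke it again.
  def poll_once(measurements) do
    Enum.filter(measurements, fn {mod, fun, args} ->
      try do
        apply(mod, fun, args)
        true
      catch
        _kind, _reason -> false
      end
    end)
  end
end
```

For instance, calling PollerSketch.poll_once/1 with one measurement that succeeds and one that raises returns a list containing only the first, which matches the symptom described in step 5.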

Expected behavior
Even if the metric function raises an error on a given poll, polling should continue so that future values of the metric can still be collected once the error (hopefully) goes away.
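
Until the polling behavior itself changes, one possible workaround is to make sure the measurement function never lets an exception escape. The sketch below assumes a hypothetical helper module, SafeMeasurement, that is not part of PromEx or of the sample repository; it wraps any zero-arity function, logs the failure, and returns an atom instead of crashing.

```elixir
defmodule SafeMeasurement do
  require Logger

  # Hypothetical helper: runs `fun` and returns :ok on success.
  # Raised exceptions, throws, and exits are logged and turned into :error,
  # so the caller (for example a polled MFA) never raises.
  def run(fun) when is_function(fun, 0) do
    fun.()
    :ok
  rescue
    error ->
      Logger.warning("measurement raised: " <> Exception.message(error))
      :error
  catch
    kind, reason ->
      Logger.warning("measurement failed: " <> inspect({kind, reason}))
      :error
  end
end
```

In a plugin, the body of a function such as execute_ping_metrics/1 could be passed to SafeMeasurement.run/1 as an anonymous function. The trade-off is that no value is emitted for the failing poll, but the function still returns normally and can be called again on the next interval.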

Environment

Erlang/OTP 26 [erts-14.2.1] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [jit]

Elixir 1.16.0 (compiled with Erlang/OTP 26)

Additional context
First raised on Slack.

Thanks for the detailed issue! I should be able to knock this out over the weekend. I'm currently working on another open source library, so I should be able to tackle this as well given I am in open source mode 😄

Thanks for the repro project. I was able to incorporate my additions to the Polling metric type into your repo and keep the MFA from being detached (specifically via the detach_on_error: false option):

defmodule PromExIssue.CustomPromExPlugin do
  ...

  @impl true
  def polling_metrics(opts) do
    poll_rate = Keyword.get(opts, :poll_rate, @default_poll_rate)
    debug_agent = opts[:debug_agent]

    Polling.build(
      :custom_prom_ex_plugin_ping_metrics,
      poll_rate,
      {__MODULE__, :execute_ping_metrics, [debug_agent]},
      [
        last_value(
          [:custom, :prom_ex, :plugin, :metrics],
          event_name: @ping_event_name,
          measurement: :count,
          description: "Ping for debugging",
          tags: [:state]
        )
      ],
      detach_on_error: false
    )
  end

  ...
end

Should be cutting a release soon with these changes!

Closing this ticket for now as a release will be cut soon.