google/cloudprober

Surfacing metrics faster than 10s

Closed this issue · 2 comments

Describe the feature you'd like and the problem it will solve
I am diagnosing intermittent packet loss on an Ethernet point-to-point link, and came across cloudprober, which can send UDP and ICMP echo probes at a reasonably high rate. However, the Prometheus surfacer currently updates only once every 10s.
I saw sysvars_interval_msec, which does increase the rate at which sysvars are exported, but it does not make the Prometheus surfacer update any faster.

Description of what you want to happen and what problem will it solve for you.
I am looking to export /metrics at 1Hz so I can better diagnose this Ethernet link going down intermittently. Using a link-state protocol called BFD, I know that each loss of service on the link lasts longer than the BFD timeout, which is currently 2.0s. I would like to present the Ethernet link provider with a clear graph that shows latency and loss at per-second granularity.

Hello @pimvanpelt!

There is a probe-level config option that controls how often metrics are exported:

optional int32 stats_export_interval_msec = 13;

So your config will look like the following:

probe {
  name: ...
  type: .. 
  
  # Run probe every 200 ms, with 100 ms timeout, and export results every 500 ms.
  interval_msec: 200
  timeout_msec: 100 
  stats_export_interval_msec: 500 

  ping_probe {
  }
}

I think it should do what you're looking for.
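
For the 1Hz export asked about above, a minimal sketch along the same lines could look like the following. The probe name and target address are placeholders rather than values from this thread, and the interval and timeout are simply carried over from the example above:

probe {
  # Hypothetical name and target, for illustration only.
  name: "p2p_link_ping"
  type: PING
  targets {
    host_names: "192.0.2.1"
  }

  # Probe every 200 ms with a 100 ms timeout, and export results
  # once per second (1Hz).
  interval_msec: 200
  timeout_msec: 100
  stats_export_interval_msec: 1000

  ping_probe {
  }
}

With these values, each export should cover roughly five probe runs. Prometheus would also need to scrape /metrics about once per second for the graph to show per-second granularity.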

Thanks @manugarg, that's exactly what I was looking to achieve. Fixed a tiny broken window along the way, #512.